# Running AI Model Inference

Once you have loaded an AI model and obtained a model handle, you can start running inferences. The [degirum.model.Model](https://docs.degirum.com/pysdk/api-ref/model#degirum.model.model) class provides two methods for performing AI inference:

* [degirum.model.Model.predict()](https://docs.degirum.com/pysdk/api-ref/model#degirum.model.model.predict): Runs prediction on a single data frame.
* [degirum.model.Model.predict\_batch()](https://docs.degirum.com/pysdk/api-ref/model#degirum.model.model.predict_batch): Runs prediction on a batch of frames.

### Model.predict()

The `predict()` method takes a single input data frame and returns an inference result object. You can also invoke it by calling the model object directly, as in `model(data)`: `degirum.model.Model.__call__` is an alias for `predict()`. See [Single Frame Inference](#single-frame-inference) for more information.

{% hint style="info" %}
Avoid running `predict()` frame by frame on videos, webcam feeds, and other streams. Instead, use [predict\_batch()](#model.predict_batch).
{% endhint %}

{% code overflow="wrap" %}

```python
# Method Signature: Model.predict()
degirum.model.Model.predict(data)
```

{% endcode %}

**Example:**

{% code overflow="wrap" %}

```python
import degirum as dg

# Declaring variables
# Set your model, inference host address, model zoo, and token in these variables.
your_model_name = "model-name"
your_host_address = "@cloud" # Can be "@cloud", host:port, or "@local"
your_model_zoo = "degirum/public"
your_token = "<token>"

# Specify the image you will run inference on
your_image = "path/image.jpg"

# Loading a model
model = dg.load_model(
    model_name = your_model_name, 
    inference_host_address = your_host_address, 
    zoo_url = your_model_zoo, 
    token = your_token 
    # optional parameters, such as overlay_show_probabilities = True
)

# Run a prediction and assign it to result
result = model(your_image)

# Print the prediction result
print(result)
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```
- bbox: [240.37627136118888, 101.09044216232718, 898.4129315085123, 698.4668477562271]
  category_id: 15
  label: cat
  score: 0.86873459815979
```

{% endcode %}

### Model.predict\_batch()

The `predict_batch()` method accepts an iterable of data frames, such as a list or a generator that reads from a stream, and returns a generator of results. It processes the frames in a pipeline to maximize throughput, making it more efficient than calling `predict()` repeatedly in a loop. This approach is ideal for processing a list of images or a video stream. See the [Batch Inference](#batch-inference) section for more information.

{% code overflow="wrap" %}

```python
# Method Signature: Model.predict_batch()
degirum.model.Model.predict_batch(data)
```

{% endcode %}

## Supported Input Data Types

PySDK models can handle images, raw tensors, and audio as input data types.

The input you pass to `predict()` depends on the number of inputs the model has. If the model has one input, then you pass only one object to `predict()`.

{% hint style="warning" %}
The model may have multiple inputs. In this case, the data you pass to `predict()` is a list of objects: one object per corresponding input.
{% endhint %}

#### Check Input Data Type of Your Model

You can check what input type your model expects by inspecting the `model.model_info.InputType` property.

{% code overflow="wrap" %}

```python
import degirum as dg

# Declaring variables
# Set your model, inference host address, model zoo, and token in these variables.
your_model_name = "model-name"
your_host_address = "@cloud" # Can be "@cloud", host:port, or "@local"
your_model_zoo = "degirum/public"
your_token = "<token>"

# Loading a model
model = dg.load_model(
    model_name = your_model_name, 
    inference_host_address = your_host_address, 
    zoo_url = your_model_zoo, 
    token = your_token 
    # optional parameters, such as overlay_show_probabilities = True
)

# Print input data supported by your model.
print(model.model_info.InputType)
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```
['Image']
```

{% endcode %}

In this example, the model reports a single input of type `Image`.

The `InputType` property of the `ModelParams` class returned by the [degirum.model.Model.model\_info](https://docs.degirum.com/pysdk/api-ref/model#degirum.model.model.model_info) property describes the number and the type of inputs of the model (see [Model Info](#model-info) section for details about model info properties).

The `model_info.InputType` property returns a list of input types (one entry per model input). The length of this list tells you how many separate inputs the model expects. For instance, a model that takes two images will have two entries in this list.
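
The following is a minimal sketch of how the length of that list can drive the way you package data for `predict()`. The helper `pack_inputs` is hypothetical and not part of PySDK; it takes a list such as the one read from `model.model_info.InputType` plus one data object per input, and returns a single object for single-input models or a list for multi-input models.

{% code overflow="wrap" %}

```python
def pack_inputs(input_types, frames):
    """Package frames for predict() based on the model's InputType list.

    input_types: list such as model.model_info.InputType, e.g. ["Image"]
    frames: one data object per model input, in the same order
    """
    if len(frames) != len(input_types):
        raise ValueError(
            f"model expects {len(input_types)} input(s), got {len(frames)}"
        )
    # Single-input models take the object itself; multi-input models take a list
    return frames[0] if len(input_types) == 1 else list(frames)


# Stand-in values for illustration; in real code use model.model_info.InputType
single = pack_inputs(["Image"], ["cat.jpg"])
double = pack_inputs(["Image", "Image"], ["left.jpg", "right.jpg"])
print(single)
print(double)
```

{% endcode %}

You would then pass the returned value to `model.predict()`. Checking the count up front gives a clearer error than a failed inference call.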

### Images

If your model expects image inputs (`InputType == "Image"`), you can supply the input frame in any of the following formats:

* Path to an image file.
* HTTP URL to an image.
* NumPy array.
* PIL `Image` object.
* Raw `bytes` of image data.

PySDK automatically converts these inputs into the format required by the model’s neural network according to the model’s preprocessor settings in its JSON configuration. For more details, see the [preprocessor parameters](https://docs.degirum.com/pysdk/model-json-structure#preprocessing-parameters).

### Tensors

If your model expects raw tensor inputs (`InputType == "Tensor"`), you should provide a multi-dimensional NumPy array with the appropriate shape and data type.

The array’s dimensions must match the model’s expected input shape, which you can find in the model info (`model.model_info.InputShape`). The data type of the array’s elements should match the model’s expected raw data type (`model.model_info.InputRawDataType`).
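
As a rough sketch, you can validate an array against the expected shape and data type before sending it to the model. The helper `check_tensor` is hypothetical, and the shape and dtype values below are made-up placeholders: read the real ones from `model.model_info.InputShape` and `model.model_info.InputRawDataType`, and translate the raw data type name to a NumPy dtype yourself.

{% code overflow="wrap" %}

```python
import numpy as np


def check_tensor(array, expected_shape, expected_dtype):
    """Return array unchanged, or raise ValueError on a shape or dtype mismatch."""
    if tuple(array.shape) != tuple(expected_shape):
        raise ValueError(f"shape {array.shape} != expected {tuple(expected_shape)}")
    if array.dtype != np.dtype(expected_dtype):
        raise ValueError(f"dtype {array.dtype} != expected {np.dtype(expected_dtype)}")
    return array


# Assumed values for illustration only
expected_shape = (1, 224, 224, 3)
expected_dtype = np.float32

tensor = np.zeros(expected_shape, dtype=expected_dtype)
checked = check_tensor(tensor, expected_shape, expected_dtype)
# result = model.predict(checked)  # pass the validated array to the model
```

{% endcode %}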

### Audio

If your model expects audio inputs (`InputType == "Audio"`), provide a one-dimensional NumPy array containing audio waveform samples. The waveform length and sampling rate must match `model.model_info.InputWaveformSize` and `model.model_info.InputSamplingRate`.
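
One way to meet the waveform length requirement is to trim or zero-pad each clip, sketched below. The helper `fit_waveform` is hypothetical, the size of 8 samples is a placeholder for `model.model_info.InputWaveformSize`, and padding with silence is an assumption that may not suit every model. The sketch does not resample, so the clip must already be at the sampling rate the model expects.

{% code overflow="wrap" %}

```python
import numpy as np


def fit_waveform(samples, waveform_size):
    """Return a 1-D array of exactly waveform_size samples (trim or zero-pad)."""
    samples = np.asarray(samples).reshape(-1)
    if samples.size >= waveform_size:
        return samples[:waveform_size]  # drop samples past the expected length
    # Pad the tail with zeros (silence); the input dtype is preserved
    return np.pad(samples, (0, waveform_size - samples.size))


waveform_size = 8  # placeholder; use model.model_info.InputWaveformSize
short_clip = np.ones(5, dtype=np.float32)
long_clip = np.ones(12, dtype=np.float32)

padded = fit_waveform(short_clip, waveform_size)
trimmed = fit_waveform(long_clip, waveform_size)
print(padded.shape, trimmed.shape)
```

{% endcode %}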

{% hint style="info" %}
Whisper encoder models in PySDK support both `NHWC` and `NCHW` feature layouts. The audio preprocessor inspects `InputTensorLayout` and the non-unit dimensions of `model.model_info.InputShape` so it can reorder mel bins and time frames automatically for the layout your model requires.
{% endhint %}

## Single Frame Inference

When you want to process one frame, use `predict()`.

{% code overflow="wrap" %}

```python
import degirum as dg

# Declaring variables
# Set your model, inference host address, model zoo, and token in these variables.
your_model_name = "model-name"
your_host_address = "@cloud" # Can be "@cloud", host:port, or "@local"
your_model_zoo = "degirum/public"
your_token = "<your-token>"

# Specify the image you will run inference on
your_image = "path/image.jpg"

# Loading a model
model = dg.load_model(
    model_name = your_model_name, 
    inference_host_address = your_host_address, 
    zoo_url = your_model_zoo, 
    token = your_token 
    # optional parameters, such as overlay_show_probabilities = True
)

# Run a prediction and assign it to result
result = model(your_image)

# Print the prediction result
print(result)
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```
- bbox: [242.46119793154423, 110.32875074982861, 898.4762465007998, 698.5679996357975]
  category_id: 15
  label: cat
  score: 0.86873459815979
```

{% endcode %}

## Batch Inference

When you have multiple frames to process, use `predict_batch()`. It runs predictions on an iterable of frames in a pipeline to maximize throughput, which makes it more efficient than calling `predict()` in a loop.

The `predict_batch()` method accepts a single parameter: an iterable, such as a list or a generator. Populate it with the same types of data you pass to `predict()`, such as image paths, image URLs, NumPy arrays, or PIL `Image` objects.

`predict_batch()` returns a generator of results. You can loop over these results just as you would iterate through successive `predict()` calls.

In addition to raw frames, the iterator can yield two-element tuples. The first element must be the frame, and the second element can contain any metadata for that frame. This metadata is passed through the inference pipeline and becomes available via the `info` property of the corresponding result. This makes it easy to attach per-frame context such as timestamps or frame numbers.

Because `predict_batch()` returns a generator, simply calling the method does not immediately run inference. Frames are processed only when you iterate over the returned generator (for example, in a `for` loop).
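
The lazy behavior described above can be illustrated with a pure-Python stand-in. `fake_predict_batch` is a hypothetical generator used only to show when work happens; it is not PySDK code, and the real pipeline may read ahead of the frame you are currently consuming.

{% code overflow="wrap" %}

```python
def fake_predict_batch(frames, log):
    """Stand-in for predict_batch(): records each frame when it is processed."""
    for frame in frames:
        log.append(frame)  # the "inference" runs only while iterating
        yield f"result for {frame}"


processed = []
results = fake_predict_batch(["image1.jpg", "image2.jpg"], processed)
after_call = list(processed)   # snapshot: nothing has been processed yet

first = next(results)          # pulls one frame through the generator
after_first = list(processed)

remaining = list(results)      # drains the rest of the frames
print(after_call, after_first, processed)
```

{% endcode %}

If you only need side effects and not the results, you still have to consume the generator, for example with a `for` loop that ignores each result.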

#### Example: Iterating over predict\_batch results

{% code overflow="wrap" %}

```python
for result in model.predict_batch(['image1.jpg','image2.jpg']):
    print(result)
```

{% endcode %}

#### Example: Attaching frame metadata

{% code overflow="wrap" %}

```python
def frame_source():
    for idx, path in enumerate(['image1.jpg', 'image2.jpg']):
        yield path, {'frame_index': idx}

for result in model.predict_batch(frame_source()):
    print(result.info)
```

{% endcode %}

#### Example: Using `predict_batch()` on a video file

This example uses `predict_batch()` to process a video file. The `frame_source` generator yields frames from the video, and the model produces predictions for each frame. The raw results for each frame are printed to the terminal.

{% code overflow="wrap" %}

```python
import degirum as dg
import cv2

# Declaring variables
# Set your model, inference host address, model zoo, and token in these variables.
your_model_name = "model-name"
your_host_address = "@cloud" # Can be "@cloud", host:port, or "@local"
your_model_zoo = "degirum/public"
your_token = "<your-token>"

# Specify the video you will run inference on
your_video = "path/video.mp4"

# Loading a model
model = dg.load_model(
    model_name = your_model_name, 
    inference_host_address = your_host_address, 
    zoo_url = your_model_zoo, 
    token = your_token 
    # optional parameters, such as overlay_show_probabilities = True
)

# Open the video file
stream = cv2.VideoCapture(your_video)

# Define generator function to produce video frames
def frame_source(stream):
    while True:
        ret, frame = stream.read()
        if not ret:
            break # end of file
        yield frame

# Run predict_batch() on frames from the video file
for result in model.predict_batch(frame_source(stream)):
    # Print raw results for each frame
    print(result)

# Release stream
stream.release()
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```
- bbox: [293.7637429237366, 230.0834002494812, 474.48908519744873, 621.3431906700134]
  category_id: 0
  label: person
  score: 0.7896590232849121
```

{% endcode %}

In the example above, the results are continuously printed to the terminal until the video is complete.

## Inference Results

When you call `predict()`, you receive an inference result object derived from the [degirum.postprocessor.InferenceResults](https://docs.degirum.com/pysdk/api-ref/postprocessor#degirum.postprocessor.inferenceresults) class. Likewise, `predict_batch()` returns a generator that yields inference result objects. These result classes, known as postprocessors, vary by AI model type—classification, object detection, pose detection, segmentation, and so on. From your perspective, they provide the same functionality.

`InferenceResults` objects contain the following data:

* [degirum.postprocessor.InferenceResults.image](https://docs.degirum.com/pysdk/api-ref/postprocessor#degirum.postprocessor.inferenceresults.image): Original input image as a NumPy array or PIL image.
* [degirum.postprocessor.InferenceResults.image\_overlay](https://docs.degirum.com/pysdk/api-ref/postprocessor#degirum.postprocessor.inferenceresults.image_overlay): Original image with inference results drawn on top. The overlay is model-dependent:
  * Classification models: class labels with probabilities are *printed below* the original image.
  * Object detection models: bounding boxes are *drawn on* the original image.
  * Hand and pose detection models: keypoints and keypoint connections are *drawn on* the original image.
  * Segmentation models: segments are *drawn on* the original image.
* [degirum.postprocessor.InferenceResults.results](https://docs.degirum.com/pysdk/api-ref/postprocessor#degirum.postprocessor.inferenceresults.results): Keeps a list of inference results in dictionary form. Follow the property link for detailed explanation of all result formats.
* [degirum.postprocessor.InferenceResults.image\_model](https://docs.degirum.com/pysdk/api-ref/postprocessor#degirum.postprocessor.inferenceresults.image_model): Preprocessed image tensor that was fed into the model (in binary form). Populated only if you enable [Model.save\_model\_image](https://docs.degirum.com/pysdk/api-ref/model#degirum.model.model.save_model_image) before performing predictions.

The `results` property is what you will typically use in your code, because it contains the core prediction data. If the model outputs coordinates (for example, bounding boxes), they are already converted back to the coordinate system of the original image.
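
As a small sketch of working with that list of dictionaries, the hypothetical helper below filters detection entries by score and label. It assumes the keys seen in the example outputs on this page (`bbox`, `category_id`, `label`, `score`); other model types use different keys, and the sample data is invented for illustration.

{% code overflow="wrap" %}

```python
def filter_detections(results, min_score=0.5, labels=None):
    """Keep entries with score >= min_score and, optionally, a label in labels."""
    kept = []
    for item in results:
        if item.get("score", 0.0) < min_score:
            continue
        if labels is not None and item.get("label") not in labels:
            continue
        kept.append(item)
    return kept


# Invented sample shaped like the detection output above;
# in real code pass result.results instead
sample = [
    {"bbox": [240.0, 101.0, 898.0, 698.0], "category_id": 15, "label": "cat", "score": 0.87},
    {"bbox": [10.0, 20.0, 50.0, 80.0], "category_id": 0, "label": "person", "score": 0.31},
]

confident = filter_detections(sample, min_score=0.5)
people = filter_detections(sample, min_score=0.0, labels={"person"})
for det in confident:
    print(det["label"], det["score"], det["bbox"])
```

{% endcode %}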

#### Example: Combine predict\_batch() with image\_overlay to show prediction results on original video

This example opens your video, runs inference on it, and displays the video with the inference results drawn over each frame.

{% code overflow="wrap" %}

```python
import degirum as dg
import cv2

# Declaring variables
# Set your model, inference host address, model zoo, and token in these variables.
your_model_name = "model-name"
your_host_address = "@cloud" # Can be "@cloud", host:port, or "@local"
your_model_zoo = "degirum/public"
your_token = "<token>"

# Specify the video you will run inference on
your_video = "path/video.mp4"

# Loading a model
model = dg.load_model(
    model_name = your_model_name, 
    inference_host_address = your_host_address, 
    zoo_url = your_model_zoo, 
    token = your_token 
    # optional parameters, such as overlay_show_probabilities = True
)

# Open the video file
stream = cv2.VideoCapture(your_video)

# Define generator function to produce video frames
def frame_source(stream):
    while True:
        ret, frame = stream.read()
        if not ret:
            break # end of file
        yield frame

# Process the video frames in a batch and display the overlay
for result in model.predict_batch(frame_source(stream)):
    # Retrieve the overlay; if it's callable, call it
    overlay = result.image_overlay() if callable(result.image_overlay) else result.image_overlay

    # Display the overlay image in a window
    cv2.imshow("Inference Overlay", overlay)

    # Wait 1ms; press 'q' to quit early
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video stream and close all OpenCV windows
stream.release()
cv2.destroyAllWindows()
```

{% endcode %}
