Running AI Model Inference

Once you have loaded an AI model and obtained a model handling object, you can start running AI inferences on that model. The following methods of the degirum.model.Model class perform AI inference: predict(), __call__(), predict_batch(), and predict_dir().

The predict() and __call__() methods behave exactly the same way (in fact, __call__() simply calls predict()). They accept a single argument, the input data frame, perform AI inference on that frame, and return the inference result: an object derived from the degirum.postprocessor.InferenceResults superclass.

The batch prediction methods, predict_batch() and predict_dir(), perform predictions of multiple frames in a pipelined manner, which is more efficient than calling the predict() method in a loop. These methods are described in detail in the Batch Inferences section.

Cloud Server Connection Issues

For cloud inferences, the connection to the cloud server is established at the beginning of each predict-style method call and closed at the end of that call. This greatly reduces performance, since connecting to and disconnecting from the cloud server are relatively slow operations. To overcome this problem, you may use the model object inside a with block. When a predict-style method is called inside the with block, the connection is not closed at the end of the call, so consecutive predict calls do not reconnect, which saves execution time.

The following code demonstrates the approach:

   # here model_name variable stores the model name
   # and data_list variable stores the list of input data frames to process

   # wrap the load_model() call in with block to avoid disconnections
   with zoo.load_model(model_name) as model:
      # perform prediction loop
      for data in data_list:
         result = model.predict(data)

Input Data Handling

PySDK model prediction methods support different types of input data. An exact input type depends on the model to be used. The following input data types are supported:

  • image input data
  • audio input data
  • raw tensor input data

The input data object you supply to the model predict methods also depends on the number of inputs the model has. If the model has a single data input, the data objects you pass to the model predict methods are single objects. In rare cases the model may have multiple data inputs; in this case the data objects you pass to the model predict methods are lists of objects, one object per corresponding input.

The number and the type of inputs of the model are described by the InputType property of the ModelParams class returned by degirum.model.Model.model_info property (see Model Info section for details about model info properties). The InputType property returns the list of input data types, one type per model input. So the number of model inputs can be deduced by evaluating the length of the list returned by the InputType property.
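As a minimal sketch of the rule above, the number of model inputs can be read from the length of the InputType list. The helper name count_model_inputs is hypothetical, and the stand-in object only mimics the model_info.InputType attribute of a loaded model so that the snippet runs without a device or cloud connection.

```python
from types import SimpleNamespace


def count_model_inputs(model):
    """Return the number of inputs of a model (hypothetical helper).

    Relies only on model.model_info.InputType, which holds one entry per model input.
    """
    return len(model.model_info.InputType)


# Stand-in for a loaded model (assumption: a real model object exposes the same attribute)
fake_model = SimpleNamespace(model_info=SimpleNamespace(InputType=["Image", "Tensor"]))
print(count_model_inputs(fake_model))
```

With a real model object, the same helper tells you whether to pass a single object or a list of objects to the predict methods.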

The following sections describe details of input data handling for various model input types.

Image Input Data Handling

When dealing with model inputs of image type (InputType is equal to "Image"), the PySDK model prediction methods accept a wide variety of input data frame types:

  • the input frame can be the name of a file with frame data;
  • it can be the HTTP URL pointing to a file with frame data;
  • it can be a numpy array with frame data;
  • it can be a PIL Image object;
  • it can be a bytes object containing raw frame data.

An AI model requires particular input tensor dimensions and data type which, in most cases, do not match the dimensions and data type of the input frame. In this case, PySDK automatically converts the input frame to the format compatible with the AI model input tensor, performing all the necessary conversions such as resizing, padding, colorspace conversion, and data type conversion.

PySDK performs input frame transformations using one of the two graphical packages (called backends): PIL or OpenCV. The backend is selected by degirum.model.Model.image_backend property. By default it is set to auto, meaning that OpenCV backend will be used first, and if it is not installed, then PIL backend will be used. You may explicitly select which backend to use by assigning either "pil" or "opencv" to degirum.model.Model.image_backend property.
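The "auto" selection rule described above can be sketched with the standard library alone. This is an illustration of the described behavior, not PySDK's actual implementation; the function name resolve_image_backend is hypothetical, and the module names cv2 and PIL are the usual import names of OpenCV and Pillow.

```python
import importlib.util


def resolve_image_backend(requested="auto"):
    """Sketch of the "auto" rule: prefer OpenCV, fall back to PIL (hypothetical helper)."""
    if requested != "auto":
        return requested  # an explicit "pil" or "opencv" choice is used as given
    if importlib.util.find_spec("cv2") is not None:
        return "opencv"  # OpenCV is installed, so it is tried first
    if importlib.util.find_spec("PIL") is not None:
        return "pil"  # OpenCV is missing, PIL is available
    raise RuntimeError("neither OpenCV nor PIL is installed")
```

Knowing which backend is in effect matters because, with the OpenCV backend, PIL Image objects cannot be passed to the predict methods.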

Note: In case of OpenCV backend, you cannot pass PIL Image objects to model predict methods.

If your input frame is in a file on a local filesystem, or is accessible over HTTP, pass the filename string or URL string directly to the model predict methods: PySDK will (down-)load the file, decode it, and convert it to the model input tensor format. The set of supported graphical file formats is defined solely by the graphical backend library you selected, PIL or OpenCV: PySDK does not perform any decoding of its own.

Sometimes, image conversion to AI model input tensor format requires image resizing. This resizing can be done in two possible ways:

  • preserving the aspect ratio;
  • not preserving the aspect ratio.

In addition, the image can be cropped or not cropped.

You can control the way of image resizing by degirum.model.Model.input_pad_method property, which can have one of the following values: "stretch", "letterbox", "crop-first", and "crop-last".

When you select the "stretch" method, the input image is resized exactly to the AI model input tensor dimensions, possibly changing the aspect ratio.

When you select the "letterbox" method (the default), the image is resized to fit the AI model input tensor dimensions while preserving the aspect ratio. The voids which may appear on the image sides are filled with the color specified by the degirum.model.Model.input_letterbox_fill_color property (black by default).
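The letterbox geometry can be worked out with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and splitting the padding evenly between opposite sides is an assumption, since the text above does not say where the voids are placed.

```python
def letterbox_geometry(img_w, img_h, model_w, model_h):
    """Return ((new_w, new_h), (left, top, right, bottom)) for letterbox resizing."""
    # Use the smaller scale factor so the whole image fits inside the model input
    scale = min(model_w / img_w, model_h / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = model_w - new_w, model_h - new_h
    # Assumption: the image is centered, so padding is split between both sides
    left, top = pad_x // 2, pad_y // 2
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)


print(letterbox_geometry(640, 480, 224, 224))
```

For a 640x480 image and a 224x224 model input, the image is scaled to 224x168 and the remaining 56 rows are filled with the letterbox fill color.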

When you select the "crop-first" method, the image is first cropped to match the AI model input tensor aspect ratio with respect to the degirum.model.Model.input_crop_percentage and then resized to match the AI model input tensor dimensions. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions 224x224 with crop percentage of 0.875 will first be center cropped to 420x420 (420 = min(640, 480) * 0.875) and then resized to 224x224.
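The "crop-first" arithmetic from the example above can be reproduced as follows. This is a sketch, not PySDK code: the function name is hypothetical, and the handling of non-square model inputs (taking the largest centered rectangle with the model aspect ratio before applying the crop percentage) is an assumption that generalizes the square example.

```python
def crop_first_geometry(img_w, img_h, model_w, model_h, crop_pct):
    """Return ((left, top, crop_w, crop_h), (final_w, final_h)) for "crop-first"."""
    target_aspect = model_w / model_h
    # Largest rectangle with the model aspect ratio that fits inside the image
    if img_w / img_h > target_aspect:
        fit_w, fit_h = img_h * target_aspect, img_h
    else:
        fit_w, fit_h = img_w, img_w / target_aspect
    # Shrink that rectangle by the crop percentage
    crop_w, crop_h = round(fit_w * crop_pct), round(fit_h * crop_pct)
    # Center the crop window; the cropped region is then resized to the model size
    left, top = (img_w - crop_w) // 2, (img_h - crop_h) // 2
    return (left, top, crop_w, crop_h), (model_w, model_h)


print(crop_first_geometry(640, 480, 224, 224, 0.875))
```

The result matches the worked example: a centered 420x420 crop that is then resized to 224x224.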

When you select "crop-last" method, if the AI model input tensor dimensions are equal (square), the image is resized with its smaller side equal to the model dimension with respect to degirum.model.Model.input_crop_percentage. If the AI model input tensor dimensions are not equal (rectangle), the image is resized with stretching to the input tensor dimensions with respect to degirum.model.Model.input_crop_percentage. The image is then cropped to fit the AI model input tensor dimensions and aspect ratio. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions of 224x224 with crop percentage of 0.875 will first be resized to 341x256 (256 = 224 / 0.875 and 341 = 256 * 640 / 480) and then center cropped to 224x224. Alternatively an input image with dimensions 640x480 and a model with input tensor dimensions 280x224 will be resized to 320x256 (320 = 280 / 0.875 and 256 = 224 / 0.875) and then center cropped to 280x224.
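Both "crop-last" examples above follow from the short calculation below. The function name is hypothetical and rounding to the nearest integer is an assumption; the sketch only computes the intermediate resize dimensions, after which the image is center cropped to the model input size.

```python
def crop_last_resize_dims(img_w, img_h, model_w, model_h, crop_pct):
    """Return the (width, height) the image is resized to before the final center crop."""
    if model_w == model_h:
        # Square model input: the smaller image side becomes model_dim / crop_pct,
        # and the other side follows from the original aspect ratio
        scale = (model_w / crop_pct) / min(img_w, img_h)
        return round(img_w * scale), round(img_h * scale)
    # Rectangular model input: stretch to the model dimensions divided by crop_pct
    return round(model_w / crop_pct), round(model_h / crop_pct)


print(crop_last_resize_dims(640, 480, 224, 224, 0.875))
print(crop_last_resize_dims(640, 480, 280, 224, 0.875))
```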

You can specify the resize algorithm in the degirum.model.Model.input_resize_method property, which may have the following values: "nearest", "bilinear", "area", "bicubic", or "lanczos". These values specify various interpolation algorithms used for resizing.

If your input frames are stored in numpy arrays, you may need to tell PySDK the order of colors in those arrays: RGB or BGR. This order is called the colorspace. By default, PySDK treats numpy arrays as having the BGR colorspace for the OpenCV graphical backend and the RGB colorspace for the PIL graphical backend. If your numpy arrays follow these defaults, no additional action is needed. If they have the opposite color order, change the degirum.model.Model.input_numpy_colorspace property accordingly.

Note: If a model has multiple image inputs, PySDK applies the same input_*** image properties discussed above to every image input of the model.

Audio Input Data Handling

When dealing with model inputs of audio type (InputType is equal to "Audio"), PySDK does not perform any conversions of the input data: it expects a 1-D numpy array with audio waveform samples of the proper size and with the proper sampling rate. The waveform size should be equal to the InputWaveformSize model info property. The waveform sampling rate should be equal to the InputSamplingRate model info property. Finally, the data element type should be equal to the data type specified by the InputRawDataType model info property. All these model info properties are properties of the ModelParams class returned by the degirum.model.Model.model_info property (see the Model Info section for details).
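Because PySDK does no audio conversion, it can help to check a waveform against the model info values before running inference. The sketch below is hypothetical: the helper name, the plain dictionary standing in for the ModelParams object, and the sample numbers are illustrative assumptions. The dtype names follow the InputRawDataType descriptions (8-bit unsigned, 16-bit signed, 32-bit floating point).

```python
# Assumed mapping from InputRawDataType strings to numpy dtype names
RAW_TYPE_TO_DTYPE = {"DG_UINT8": "uint8", "DG_INT16": "int16", "DG_FLT": "float32"}


def waveform_problems(num_samples, sampling_rate, dtype_name, info, input_index=0):
    """List mismatches between a waveform and the model info values (hypothetical helper)."""
    expected_size = info["InputWaveformSize"][input_index]
    expected_rate = info["InputSamplingRate"][input_index]
    expected_dtype = RAW_TYPE_TO_DTYPE[info["InputRawDataType"][input_index]]
    problems = []
    if num_samples != expected_size:
        problems.append(f"waveform size {num_samples}, expected {expected_size}")
    if sampling_rate != expected_rate:
        problems.append(f"sampling rate {sampling_rate}, expected {expected_rate}")
    if dtype_name != expected_dtype:
        problems.append(f"element type {dtype_name}, expected {expected_dtype}")
    return problems


# Stand-in for values read from model.model_info (made-up numbers, one entry per input)
info = {
    "InputWaveformSize": [15600],
    "InputSamplingRate": [16000.0],
    "InputRawDataType": ["DG_INT16"],
}
print(waveform_problems(15600, 16000.0, "int16", info))
```

With a numpy waveform you would pass len(waveform) and str(waveform.dtype) as the first and third arguments.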

Tensor Input Data Handling

When dealing with model inputs of raw tensor type (InputType is equal to "Tensor"), PySDK expects you to provide a multi-dimensional numpy array of proper dimensions. The dimensions of that array should match the model input dimensions as specified either by the InputShape model info property (new-style models) or by the InputN, InputH, InputW, and InputC model info properties (old-style models), slowest dimension first. The InputShape model info property supersedes the other model info properties, so if it is defined, all others are ignored. If it is not defined, the tensor shape is specified by the set of defined and non-zero InputN, InputH, InputW, and InputC model info properties in that order: InputN specifies the slowest dimension, InputC specifies the fastest dimension.
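The precedence rule above can be sketched as a small function. The function name is hypothetical, and the dictionary is a stand-in holding the values of a single model input; it is not the ModelParams object itself.

```python
def expected_tensor_shape(info):
    """Resolve the expected tensor shape for one model input (illustrative sketch)."""
    shape = info.get("InputShape")
    if shape:
        # InputShape supersedes the old-style properties when it is defined
        return list(shape)
    # Otherwise use the defined, non-zero N, H, W, C values, slowest dimension first
    dims = [info.get(name) for name in ("InputN", "InputH", "InputW", "InputC")]
    return [d for d in dims if d]


print(expected_tensor_shape({"InputShape": [1, 3, 8, 8], "InputN": 1, "InputH": 224}))
print(expected_tensor_shape({"InputN": 1, "InputH": 0, "InputW": 100, "InputC": 3}))
```

The numpy array you pass to the predict methods should then have exactly the returned shape.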

The numpy array data element type should be equal to the data type specified by the InputRawDataType model info property (see Model Info section for details).

Dynamic Inputs Handling

Certain AI models have so-called dynamic inputs, which are supported by some inference runtimes, for example, the OpenVINO runtime (refer to that runtime's documentation for more details).

To adjust the size of the input data accepted by the PySDK preprocessor, assign the actual input data size/shape to be used for subsequent inferences before performing the inference.

If your model has image input type (InputType == "Image"), then you assign InputN, InputH, InputW, and InputC model info properties to match the size of images to be used for the inference. The PySDK preprocessor will resize the input images to assigned size. If input images already have that size, resizing step will be skipped. In any case, the inference runtime will receive the image of that size.

If your model has tensor input type (InputType == "Tensor"), then you assign InputShape model info property to match the shape of tensors to be used for the inference. Since PySDK does not do any resizing for tensor inputs, all tensors you pass for inferences must have the specified shape, so the inference runtime will receive the tensors of that shape.

To simplify input shape assignments, you may use degirum.model.Model.input_shape model property. This property allows unified access to model input size/shape model info properties: InputN, InputH, InputW, InputC, and InputShape regardless of the input type (image or tensor).

The getter returns and the setter accepts the list of input shapes, one shape per each model input. Each element of that list (which defines a shape for particular input) is another list containing input dimensions, slowest dimension first.

For each input, the getter returns InputShape value if InputShape model parameter is specified for the input, otherwise it returns [InputN, InputH, InputW, InputC].

The setter works symmetrically: it assigns the provided list to InputShape model info property, if it was specified for the model input, otherwise it assigns provided list to InputN, InputH, InputW, and InputC model info properties in that order (i.e. zero index to InputN and so forth).

Example: you have a dynamic-input model with a single image-type input, and you want to run inference on video stream frames whose resolution is given by the width and height variables, so that the dynamic-input model itself handles the images and PySDK does no resizing. In that case, perform the following assignment before running any inference:

model.input_shape = [[1, height, width, 3]]

Inference Results

All model predict methods return result objects derived from degirum.postprocessor.InferenceResults class. These result classes are called post-processors. Particular post-processor class types depend on the AI model type: classification, object detection, pose detection, segmentation etc. But from the user point of view, they deliver identical functionality.

Result object contains the following data:

The results property is what you typically use for programmatic access to inference results. The type of results is always a list of dictionaries, but the format of those dictionaries is model-dependent. Also, if the result contains coordinates of objects, all such coordinates are recalculated from the model coordinates back to coordinates on the original image, so you can use them directly.
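Since results is a plain list of dictionaries, ordinary Python tools work on it. The sketch below counts results per class label; the helper name is hypothetical, and it assumes only that the dictionaries may carry a "label" key, which holds for some but not all model types. The sample list is made up for illustration.

```python
from collections import Counter


def count_labels(results):
    """Count how many result dictionaries carry each class label (hypothetical helper)."""
    return Counter(r["label"] for r in results if "label" in r)


# Made-up stand-in for the list returned by the results property
sample_results = [{"label": "car"}, {"label": "truck"}, {"label": "car"}]
print(count_labels(sample_results))
```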

The image_overlay property is very handy for debugging and troubleshooting. It allows you to quickly assess the correctness of the inference results in graphical form.

The following result properties affect how the overlay image is drawn: overlay_alpha, overlay_color, overlay_fill_color, overlay_font_scale, overlay_line_width, overlay_show_labels, and overlay_show_probabilities.

When each individual result object is created, all these overlay properties (except overlay_fill_color) are assigned the values of the similarly named properties of the model object (see the Model Parameters section for the list of model properties). This lets you assign overlay property values only once and have them applied to all subsequent results. If you want to experiment with an individual result, you may reassign any of its overlay properties and then re-read the image_overlay property. Each time you read image_overlay, it returns a new image object freshly drawn according to the current values of the overlay properties.

The overlay_color property is used to define the color to draw overlay details. In the case of a single RGB tuple, the corresponding color is used to draw all the overlay data: points, boxes, labels, segments, etc. In the case of a list of RGB tuples the behavior depends on the model type.

  • For classification models different colors from the list are used to draw labels of different classes.
  • For detection models different colors are used to draw labels and boxes of different classes.
  • For pose detection models different colors are used to draw keypoints of different persons.
  • For segmentation models different colors are used to highlight segments of different classes.

If the list size is less than the number of classes of the model, then overlay_color values are used cyclically, for example, for three-element list it will be overlay_color[0], then overlay_color[1], overlay_color[2], and again overlay_color[0].
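The cyclic color assignment amounts to a modulo operation on the class index. A minimal sketch, with a hypothetical helper name:

```python
def color_for_class(overlay_color, class_index):
    """Pick the overlay color for a class, cycling through a list of RGB tuples."""
    if isinstance(overlay_color, tuple):
        return overlay_color  # a single RGB tuple is used for everything
    return overlay_color[class_index % len(overlay_color)]


palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
print([color_for_class(palette, i) for i in range(4)])
```

With the three-element palette above, class 3 wraps around to the first color again.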

The default value of overlay_color is a single RGB tuple of yellow color for all model types except detection and segmentation models. For detection and segmentation models it is a list of RGB tuples with the list size equal to the number of model classes. You can use the Model.label_dictionary property to obtain the list of model classes. Each color is assigned automatically so that it is visually distinct from the other colors in the list.

Note: overlay_fill_color is assigned the value of degirum.model.Model.input_letterbox_fill_color.

In some cases the set of existing PySDK post-processor classes is not enough. This happens, for example, when you want to work with a new AI model that PySDK does not support yet. In this case you may implement your own custom post-processor and use it instead of the standard PySDK post-processors. The Custom Post Processors for Inference Results section describes this in detail.

Results Filtering

By default, the model predict methods report all results. However, you may want to include only results that belong to certain categories, identified either by class label or by category ID. To do so, specify a set of class labels (or, alternatively, category IDs): only inference results whose class labels (or category IDs) are found in that set are reported, and all other results are discarded. Assign this set to the degirum.model.Model.output_class_set property.

For example, you may want to include only results with class labels "car" and "truck":

# allow only results with "car" and "truck" class labels
model.output_class_set = {"car", "truck"}

Or you may want to include only results with category IDs 1 and 3:

# allow only results with 1 and 3 category IDs
model.output_class_set = {1, 3}

This category filtering is applicable only to models which have "label" (or "category_id") keys in their result dictionaries. For all other models this category filter will be ignored.
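The filtering semantics can be illustrated on plain dictionaries. This sketch is not PySDK's implementation: the function name is hypothetical, and treating an empty set as "no filtering" is an assumption based on the default behavior of reporting all results.

```python
def filter_results(results, class_set):
    """Keep only results whose "label" or "category_id" is in class_set (sketch)."""
    if not class_set:
        return list(results)  # assumption: an empty set disables filtering
    kept = []
    for r in results:
        if "label" not in r and "category_id" not in r:
            kept.append(r)  # the filter does not apply to such result dictionaries
        elif r.get("label") in class_set or r.get("category_id") in class_set:
            kept.append(r)
    return kept


detections = [
    {"label": "car", "category_id": 2},
    {"label": "person", "category_id": 0},
    {"label": "truck", "category_id": 7},
]
print(filter_results(detections, {"car", "truck"}))
```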

Note On degirum.postprocessor.DetectionResults Segmentation Mask Format

The degirum.postprocessor.DetectionResults class expects the value at the optional "mask" key in an object detection model's result dictionary to hold a dictionary representing the run-length encoded (RLE) object segmentation mask array. This dictionary contains the following keys:

  • "height": height of segmentation mask array
  • "width": width of segmentation mask array
  • "data": string representation of a buffer of unsigned 32-bit integers carrying the RLE segmentation mask array.

The "data" field is obtained using the following algorithm:

  1. The initial mask array, flattened in row-major order, is broken up into sub-arrays, each of which has elements of only one value.
  2. The value in each sub-array is stored in one array and the length of each sub-array is stored in another array.
  3. The resulting two arrays are concatenated - values, then lengths.
  4. The final array is cast as an unsigned 32-bit integer array, interpreted as a buffer, encoded using base64, and decoded as an ASCII string.

The algorithm to convert the dictionary with the RLE segmentation mask array to the original array is the following:

  1. The string is decoded using base64 and interpreted as an array of unsigned 32-bit integers.
  2. The resulting array is split into two equal-sized arrays: the first array holds the value present in each sub-array of the original flattened array, and the second array holds the length of each sub-array.
  3. The original flattened array is reconstructed using the two arrays.
  4. The resulting array is reshaped according to the values at the "height" and "width" keys in the dictionary.
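The two algorithms above can be sketched with the Python standard library. The function names are hypothetical, the mask is represented as a list of rows rather than a numpy array, and little-endian byte order for the 32-bit integers is an assumption (the text only says the array is interpreted as a buffer).

```python
import base64
import struct
from itertools import groupby


def rle_encode(mask):
    """Encode a 2-D mask (list of rows of non-negative ints) into the RLE dictionary."""
    height, width = len(mask), len(mask[0])
    flat = [v for row in mask for v in row]  # flatten in row-major order
    values, lengths = [], []
    for value, run in groupby(flat):  # runs of identical consecutive values
        values.append(value)
        lengths.append(sum(1 for _ in run))
    # Values first, then lengths, packed as unsigned 32-bit integers
    packed = struct.pack(f"<{2 * len(values)}I", *values, *lengths)
    data = base64.b64encode(packed).decode("ascii")
    return {"height": height, "width": width, "data": data}


def rle_decode(rle):
    """Rebuild the 2-D mask from the RLE dictionary."""
    raw = base64.b64decode(rle["data"])
    numbers = struct.unpack(f"<{len(raw) // 4}I", raw)
    half = len(numbers) // 2
    values, lengths = numbers[:half], numbers[half:]
    flat = [v for v, n in zip(values, lengths) for _ in range(n)]
    width = rle["width"]
    return [flat[r * width:(r + 1) * width] for r in range(rle["height"])]


mask = [[0, 0, 1], [1, 1, 0]]
encoded = rle_encode(mask)
print(rle_decode(encoded) == mask)
```

The round trip restores the original mask; with numpy, the same steps map onto array operations and a final reshape to (height, width).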

Batch Inferences

If you need to process multiple frames using the same model and the same settings, the most efficient way is to use the batch prediction methods of the degirum.model.Model class: predict_batch() and predict_dir().

Both methods perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling predict() method in a loop.

Both methods return the generator object, so you can iterate over inference results. This allows you to directly use the result of batch prediction methods in for-loops, for example:

for result in model.predict_batch(['image1.jpg','image2.jpg']):
   print(result)

Note: Since batch prediction methods return a generator object, simply assigning the result of a batch prediction method to a variable does not start any inference. Only iterating over that generator object does.

The predict_batch method accepts a single parameter: an iterable object, for example, a list. You populate the iterable with the same types of data you pass to the regular predict() method, i.e. input image path strings, input image URL strings, numpy arrays, or PIL Image objects (in the case of the PIL image backend).
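The lazy behavior of generators can be demonstrated without any model. In the sketch below, fake_predict_batch is a hypothetical stand-in that only records which frames it was asked to process; it does not call PySDK.

```python
calls = []  # records every frame the stand-in actually processes


def fake_predict_batch(frames):
    """Stand-in generator that mimics the lazy behavior of a batch prediction method."""
    for frame in frames:
        calls.append(frame)
        yield f"result for {frame}"


pending = fake_predict_batch(["image1.jpg", "image2.jpg"])
calls_before_iteration = list(calls)  # nothing has been processed yet
results = list(pending)  # iterating is what triggers the work
print(calls_before_iteration, calls)
```

The first list is empty and the second contains both file names, which shows that work starts only when the generator is iterated.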

The predict_dir method accepts a filepath to a directory containing graphical files for inference. You may supply optional extensions parameter passing the list of file extensions to process.

The following minimal example demonstrates how to use batch predict to perform AI inference on a video file:

import degirum as dg
import cv2

# connect to cloud model zoo, load model, set model properties needed for OpenCV
zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token="<your cloud API access token>")
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")
model.image_backend = "opencv"
stream = cv2.VideoCapture("images/Traffic.mp4") # open video file

# define generator function to produce video frames
def frame_source(stream):
   while True:
      ret, frame = stream.read()
      if not ret:
         break # end of file
      yield frame

# run batch predict on stream of frames from video file
for result in model.predict_batch(frame_source(stream)):
   # show annotated frames
   cv2.imshow("Demo", result.image_overlay)
   # let OpenCV redraw the window; press 'q' to stop early
   if cv2.waitKey(1) & 0xFF == ord("q"):
      break

stream.release()
cv2.destroyAllWindows()

Model Parameters

The model behavior can be controlled with various Model class properties, which define model parameters. They can be divided into the following categories:

  • parameters, which control how to handle input frames;
  • parameters, which control the inference;
  • parameters, which control how to display inference results;
  • parameters, which control model run-time behavior and provide access to model information

The following table provides complete summary of Model class properties arranged by categories.

Property Name | Description | Possible Values | Default Value

Input Handling Parameters
image_backend | package to be used for image processing | "auto", "pil", or "opencv"; "auto" tries OpenCV first | "auto"
input_letterbox_fill_color | image fill color in case of 'letterbox' padding | 3-element tuple of RGB color | (0,0,0)
input_numpy_colorspace | colorspace for numpy arrays | "auto", "RGB" or "BGR" | "auto"
input_pad_method | how input image will be padded when resized | "stretch", "letterbox", "crop-first", or "crop-last" | "letterbox"
input_crop_percentage | a percentage of input image dimension to retain when "input_pad_method" is set to "crop-first" or "crop-last" | Float value in [0..1] range | 1.0
input_resize_method | interpolation algorithm for image resizing | "nearest", "bilinear", "area", "bicubic", "lanczos" | "bilinear"
input_shape | input shape | list of input shapes, one shape per each model input; each element is another list containing input dimensions, slowest dimension first | [[]]
save_model_image | flag to enable/disable saving of model input image in inference results | Boolean value | False

Inference Parameters
output_class_set | List of class labels or category IDs to be included in inference results | Set of strings or set of integers | {}
output_confidence_threshold | confidence threshold to reject results with low scores | Float value in [0..1] range | 0.1
output_max_detections | maximum number of objects to report for detection models | Integer value | 20
output_max_detections_per_class | maximum number of objects to report for each class for detection models | Integer value | 100
output_max_classes_per_detection | maximum number of classes to report for detection models | Integer value | 30
output_nms_threshold | rejection threshold for non-max suppression | Float value in [0..1] range | 0.6
output_pose_threshold | rejection threshold for pose detection models | Float value in [0..1] range | 0.8
output_postprocess_type | inference result post-processing type. You may set it to 'None' to bypass post-processing. | String | Model-dependent
output_top_k | Number of classes with biggest scores to report for classification models. If 0, report all classes above confidence threshold | Integer value | 0
output_use_regular_nms | use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value | False

Display Parameters
overlay_alpha | transparency value (alpha-blend weight) for all overlay details | Float value in [0..1] range or "auto"; 1 means no transparency; "auto" to select optimal transparency for the current model | "auto"
overlay_color | color for drawing all overlay details | 3-element tuple of RGB color or list of 3-element tuples of RGB color | (255,255,0)
overlay_font_scale | font scaling factor for overlay text | Positive float value | 1.0
overlay_line_width | line width in pixels for overlay lines | Positive integer value | 3
overlay_show_labels | flag to enable drawing class labels of detected objects | Boolean value | True
overlay_show_probabilities | flag to enable drawing probabilities of detected objects | Boolean value | False

Control and Information Parameters
supported_device_types | list of supported device types in format <runtime>/<device> for this model (read-only) | List of strings | N/A
device_type | device type to be used for AI inference of this model in a format <runtime>/<device> | String | N/A
devices_available | list of inference device indexes which can be used for model inference (read-only) | List of integer values | N/A
devices_selected | list of inference device indexes selected for model inference | List of integer values | Equal to devices_available
label_dictionary | model class label dictionary (read-only) | Dictionary | N/A
measure_time | flag to enable measuring and collecting inference time statistics | Boolean value | False
model_info | model information object to provide read-only access to model parameters (read-only) | ModelParams object | N/A
non_blocking_batch_predict | flag to control the blocking behavior of predict_batch() method | Boolean value | False
eager_batch_size | The size of the batch to be used by device scheduler when inferencing this model. The batch is the number of consecutive frames before this model is switched to another model during batch predict. | Integer value in [1..80] range | 8
frame_queue_depth | The depth of the model prediction queue. When the queue size reaches this value, the next prediction call will block until there is space in the queue. | Integer value in [1..160] range | 80 for cloud inference, 8 for other cases

Note: For segmentation models, the default value of overlay_color is a list of unique colors (RGB tuples). The size of the list is equal to the number of model classes. Use the label_dictionary property to get the list of model classes.

Model Info

AI models have a lot of static attributes defining various model features and characteristics. Unlike model properties, these attributes in most cases cannot be changed: they come with the model.

To access all model attributes, you may query read-only model property degirum.model.Model.model_info.

Note: A new deep copy of the model info class is created each time you read this property, so any changes made to this copy will not affect model behavior.

Model attributes are divided into the following categories:

  • Device-related attributes
  • Pre-processing-related attributes
  • Inference-related attributes
  • Post-processing-related attributes

The following table provides a complete summary of model attributes arranged by categories. The Attribute Name column contains the name of the ModelParams class member returned by the model_info property.

Note: Each attribute in the Pre-Processing-Related Attributes group is a list of values, one per model input.

Attribute Name | Description | Possible Values

Device-Related Attributes
DeviceType | Device type to be used for AI inference of this model | "ORCA": DeGirum Orca, "EDGETPU": Google EdgeTPU, "GPU": host GPU, "CPU": host CPU, "NPU": Intel NPU, "DLA": Nvidia DLA, "RK3588": Rockchip NPU 3588, "RK3568": Rockchip NPU 3568, "RK3566": Rockchip NPU 3566, "NXP_VX": NXP VX, "NXP_ETHOSU": NXP Ethos-U, "ARMNN": ArmNN
RuntimeAgent | Type of runtime to be used for AI inference of this model | "N2X": DeGirum NNExpress runtime, "TFLITE": Google TFLite runtime, "OPENVINO": Intel OpenVINO runtime, "ONNX": Microsoft ONNX runtime, "TENSORRT": Nvidia TensorRT runtime, "RKNN": Rockchip RKNN Runtime
SupportedDeviceTypes | Comma-separated list of runtime agent/device type combinations, supported by the model | Example: "OPENVINO/CPU,ONNX/CPU"
EagerBatchSize | The size of the batch to be used by device scheduler when inferencing this model. The batch is the number of consecutive frames before this model is switched to another model during batch predict. | Integer number

Pre-Processing-Related Attributes
InputType | Model input type | List of the following strings: "Image": image input type, "Audio": audio input type, "Tensor": raw tensor input type
InputShape | Input tensor shape; applicable to "Tensor" input type | List of integers
InputN | Input batch dimension size | 1 (other sizes to be supported)
InputH | Input height dimension size | Integer number
InputW | Input width dimension size | Integer number
InputC | Input color dimension size | Integer number
InputQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value
InputRawDataType | Data element type for audio or tensor inputs | List of the following strings: "DG_UINT8": 8-bit unsigned integer, "DG_INT16": 16-bit signed integer, "DG_FLT": 32-bit floating point
InputTensorLayout | Input tensor shape and layout | List of the following strings: "auto": deduce tensor layout automatically, "NHWC": 4-D tensor frame-height-width-color, "NCHW": 4-D tensor frame-color-height-width
InputColorSpace | Input image colorspace (sequence of colors in C dimension) | List of the following strings: "RGB", "BGR"
InputScaleEn | Enable global scaling of input data flag | List of boolean values
InputScaleCoeff | Scaling factor for input data global scaling; applied when InputScaleEn is enabled | List of float values
InputNormMean | Mean value for per-channel input data normalization; applied when both InputNormMean and InputNormStd are not empty | List of 3-element arrays of float values
InputNormStd | StDev value for per-channel input data normalization; applied when both InputNormMean and InputNormStd are not empty | List of 3-element arrays of float values
InputQuantOffset | Quantization offset for input image quantization | List of float values
InputQuantScale | Quantization scale for input image quantization | List of float values
InputWaveformSize | Input waveform size in samples for audio input types | List of positive integer values
InputSamplingRate | Input waveform sampling rate in Hz for audio input types | List of positive float values
InputResizeMethod | Interpolation algorithm used for image resizing during model training | List of the following strings: "nearest", "bilinear", "area", "bicubic", "lanczos"
InputPadMethod | How input image was padded when resized during model training | List of the following strings: "stretch", "letterbox"
InputCropPercentage | How much input image was cropped during model training | Float value in [0..1] range
ImageBackend | Graphical package used for image processing during model training | List of the following strings: "pil", "opencv"

Inference-Related Attributes
ModelPath | Path to the model JSON file | String with filepath
ModelInputN | Model frame dimension size | 1 (other sizes to be supported)
ModelInputH | Model height dimension size | Integer number
ModelInputW | Model width dimension size | Integer number
ModelInputC | Model color dimension size | Integer number
ModelQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value

Post-Processing-Related Attributes
OutputNumClasses | Number of classes model detects | Integer value
OutputSoftmaxEn | Enable softmax step in post-processing flag | Boolean value
OutputClassIDAdjustment | Class ID adjustment: number subtracted from the class ID reported by the model | Integer value
OutputPostprocessType | Post-processing type | See table below
OutputConfThreshold | Confidence threshold to reject results with low scores | Float value in [0..1] range
OutputNMSThreshold | Rejection threshold for non-max suppression | Float value in [0..1] range
OutputTopK | Number of classes with biggest scores to report for classification models | Integer number
MaxDetections | Maximum number of objects to report for detection models | Integer number
MaxDetectionsPerClass | Maximum number of objects to report for each class for detection models | Integer number
MaxClassesPerDetection | Maximum number of classes to report for detection models | Integer number
UseRegularNMS | Use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value

The following table lists the supported post-processing types, their descriptions, and the JSON result format reference. Please refer to degirum.postprocessor.InferenceResults.results for a detailed description of the JSON result format of each post-processor type, as mentioned in the last column.

Label Description Applicable Result JSON Format
"Classification" Classification post-processor For classification models
"MultiLabelClassification" Multi-classifier classification post-processor For multi-label classification models
"Detection" MobilenetV2-style object detection post-processor For object detection models
"DetectionYolo" YOLOV5-style object detection post-processor For object detection models
"DetectionYoloV8" YOLOV8-style object detection post-processor For object detection models
"DetectionYoloV10" YOLOV10-style object detection post-processor For object detection models
"DetectionYoloPlates" YOLOV5-style license plate detection post-processor For classification models
"PoseDetection" Pose detection post-processor For object detection models (with landmarks)
"FaceDetect" Face detection post-processor For object detection models (with landmarks)
"HandDetection" Hand palm detection post-processor For hand palm detection models
"Segmentation" Semantic segmentation post-processor For segmentation models

Inference Advanced Topics

Selecting Device Types for Inference

Every AI model in a model zoo is designed to work with a particular AI inference runtime (such as DeGirum N2X, Intel OpenVINO, Google TFLite etc.) and on a particular AI inference device, either on AI accelerator hardware or on host computer CPU.

The runtime and device type used for model inference are defined by the combination of the RuntimeAgent and DeviceType model attributes of the ModelParams class, as returned by the degirum.model.Model.model_info property. You may query those attributes directly, or you may query the degirum.model.Model.device_type property to get the runtime/device type pair in the form "RUNTIME/DEVICE", for example "OPENVINO/CPU" (the second way is more convenient).

Some models may support multiple runtime/device combinations - such models are called multi-device models. For such models the list of supported device types can be obtained by querying SupportedDeviceTypes read-only model attribute of ModelParams class. This attribute is a string containing comma-separated list of runtime/device type combinations supported by the model. For example, the string "OPENVINO/CPU,ONNX/CPU" means that the model can be run on both Intel OpenVINO and Microsoft ONNX runtimes using CPU as a hardware device. The SupportedDeviceTypes attribute is defined only for multi-device models.

When you load a model from a model zoo, you also implicitly specify the AI inference engine to be used for inference of this model. It can be the cloud platform, an AI server, or a local PySDK installation. That inference engine has its own capabilities in terms of supported runtimes and devices, which may differ from the model capabilities. To obtain the list of runtime/device combinations supported by both the model and the inference engine the model is loaded for, query the degirum.model.Model.supported_device_types read-only property. This property is a list containing the intersection of two sets: the runtime/device combinations supported by the model itself, and the runtime/device combinations supported by the inference engine. Each element of this list is a runtime/device pair in the form "RUNTIME/DEVICE".

If you wish to obtain the list of runtime/device combinations supported by the inference engine alone, you may call the degirum.get_supported_devices function, which accepts the inference engine designator as its first argument. It also returns the list of supported device type strings in the form "RUNTIME/DEVICE".
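
The relationship between these lists can be illustrated in plain Python, without PySDK. The sketch below parses a SupportedDeviceTypes-style string and intersects it with a list of engine capabilities. The helper names and the engine list are hypothetical, chosen only for illustration; in real code the supported_device_types property gives you the intersection directly.

```python
def parse_supported_device_types(attr):
    """Split a comma-separated "RUNTIME/DEVICE" string into a list of pairs."""
    return [item.strip() for item in attr.split(",") if item.strip()]


def common_device_types(model_types, engine_types):
    """Keep the model's device types that the engine also supports, in model order."""
    engine_set = set(engine_types)
    return [t for t in model_types if t in engine_set]


# value taken from the SupportedDeviceTypes example in the text
model_types = parse_supported_device_types("OPENVINO/CPU,ONNX/CPU")

# assumed engine capabilities, invented for this illustration
engine_types = ["OPENVINO/CPU", "OPENVINO/GPU"]

print(common_device_types(model_types, engine_types))
```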

For multi-device models you may change the desired runtime/device combination to be used for the inference at any time. You do it by assigning the desired combination to the degirum.model.Model.device_type property. You may assign only combinations which occur in the degirum.model.Model.supported_device_types list.

It is also possible to assign a list of device types. In this case the first supported device type from that list will be set. This simplifies inference device assignment for multi-device models on a variety of systems with different sets of inference devices.

For example, suppose you have a model that supports all devices of the OpenVINO runtime (NPU, GPU, and CPU), and you want to run it on NPU when it is available, otherwise on GPU when it is available, falling back to CPU if neither NPU nor GPU is available. In this case you may do the following assignment:

model.device_type = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]

Reading device_type property back after list assignment will give you the actual device type assigned for the inference.
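
The selection rule for list assignment can be sketched as follows. This is not the PySDK implementation, just a plain-Python illustration of "the first supported device type wins". pick_device_type is a hypothetical helper, and raising an error when nothing matches is an assumption of this sketch.

```python
def pick_device_type(preferred, supported):
    """Return the first entry of `preferred` that also occurs in `supported`."""
    for candidate in preferred:
        if candidate in supported:
            return candidate
    # assumption of this sketch: no match is treated as an error
    raise ValueError(f"none of {preferred} is supported; supported types: {supported}")


preferred = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]

# hypothetical system without an NPU
supported = ["OPENVINO/CPU", "OPENVINO/GPU"]

print(pick_device_type(preferred, supported))
```

On the hypothetical system above the NPU entry is skipped and the GPU entry is selected, because the order of the preference list, not the order of the supported list, decides the outcome.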

Selecting Particular Device Instances for Inference

Every AI model in a model zoo is designed to work on particular hardware, either AI accelerator hardware such as DeGirum Orca, or the host computer CPU. Imagine that the host computer is equipped with multiple hardware devices of a given type, and you run multiple inferences of a model designed for this device type. In this case, by default, all available hardware devices of this type will be used for inferences of this model. This guarantees top inference performance when a single model runs on all available devices.

In certain cases you may want to limit the model inference to a particular subset of available devices. For example, you have two devices and you want to run concurrent inferences of two models. By default, both devices would be used for both models, causing the models to be reloaded to the devices each time you run the inference of the other model. Even though model loading for DeGirum Orca devices is extremely fast, it still may cause performance degradation. In this case you may want to run the first model only on the first device, and the second model only on the second device. To do so, assign to the degirum.model.Model.devices_selected property of each model object the list of device indexes you want that model to run on. In our example you need to assign the list [0] to the devices_selected property of the first model object, and the list [1] to that of the second model object.

To get information about available devices, query the degirum.model.Model.devices_available property. It returns the list of device indexes of all available devices of the type this model is designed for. Those indexes are zero-based, so if your host computer has a single device of a given type, the returned list contains a single zero element: [0]. In the case of two devices it will be [0, 1], and so on.

Note: since the inference device assignment for cloud inferences is performed dynamically, and actual AI farm node configuration is unknown until the inference starts, the devices_available property for such use case always returns the full list of available devices.

In general, the list you assign to the devices_selected property should contain only indexes that occur in the list returned by the devices_available property.
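
A minimal sketch of how device indexes could be split between models and checked before assignment is shown below. It uses plain lists so it runs without PySDK. assign_devices and validate_selection are hypothetical helpers, and the round-robin split is only one possible policy.

```python
def assign_devices(available, model_count):
    """Distribute device indexes round-robin among `model_count` models."""
    if model_count < 1:
        raise ValueError("model_count must be at least 1")
    groups = [[] for _ in range(model_count)]
    for position, index in enumerate(available):
        groups[position % model_count].append(index)
    return groups


def validate_selection(selected, available):
    """Return `selected` as a list, rejecting indexes that are not available."""
    unknown = [index for index in selected if index not in available]
    if unknown:
        raise ValueError(f"device indexes {unknown} are not in {available}")
    return list(selected)


available = [0, 1]  # what devices_available would return for two devices
first, second = assign_devices(available, 2)
print(first, second)

# with real model objects the lists would be assigned like this:
# model1.devices_selected = validate_selection(first, model1.devices_available)
# model2.devices_selected = validate_selection(second, model2.devices_available)
```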

Handling Multiple Streams of Frames

The Model class interface has a method, degirum.model.Model.predict_batch, which can run multiple predictions on a sequence of frames. In order to deliver the sequence of frames to predict_batch, you implement an iterable object that returns your frames one by one. One example of an iterable object is a regular Python list; another is a generator function, which yields frame data using the yield statement. You then pass this iterable object as an argument to the predict_batch method. In turn, predict_batch returns a generator object, which yields prediction results.

All the inference magic with pipelining sequential inferences, asynchronously retrieving inference results, supporting various inference devices, and AI server vs. local operation modes happens inside the implementation of predict_batch method. All you need to do is to wrap your sequence of frame data in an iterable object, pass this object to predict_batch, and iterate over the generator object returned by predict_batch using either for-loop or by repeatedly calling next() built-in function on this generator object.

The following example runs the inference on an infinite sequence of frames captured from the camera:

import cv2 # OpenCV
stream = cv2.VideoCapture(0) # open video stream from local camera #0

def source(): # define generator function, which yields frames from camera
   while True:
      ret, frame = stream.read()
      if not ret: # stop when the camera fails to deliver a frame
         break
      yield frame

for result in model.predict_batch(source()): # iterate over inference results
   cv2.imshow("AI camera", result.image_overlay) # process result
   cv2.waitKey(1) # let OpenCV refresh the window

But what if you need to run multiple concurrent inferences of multiple asynchronous data streams with different frame rates? The simple approach when you combine two generators in one loop either using zip() built-in function or by manually calling next() built-in function for every generator in a loop body will not work effectively.

Non-working example 1. Using zip() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
for result1, result2 in zip(batch1, batch2):
   # process result1 and result2
   pass

Non-working example 2. Using next() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
while True:
   result1 = next(batch1)
   result2 = next(batch2)
   # process result1 and result2

The reason is that the Python runtime has Global Interpreter Lock (GIL), which allows running only one thread at a time blocking the execution of other threads. So if the currently running thread is itself blocked by waiting for the next frame or waiting for the next inference result, all other threads are blocked as well.

For example, if the frame rate of source1() is lower than the frame rate of source2(), and the model inference frame rates are higher than the corresponding source frame rates, then the code above will spend most of its time waiting for the next frame from source1(), preventing frames from source2() from being retrieved. As a result, model2 will not get enough frames and will idle, losing performance.

Another example is when the inference latency of model1 is higher than the inference queue depth expressed in time (this is the product of the inference queue depth expressed in frames and the single frame inference time). In this case when the model1 inference queue is full, but inference result is not ready yet, the code above will block on waiting for that inference result inside next(batch1) preventing any operations with model2.

To get around such blocking, a special non-blocking mode of batch prediction is implemented. You turn on this mode by assigning True to the degirum.model.Model.non_blocking_batch_predict property.

When non-blocking mode is enabled, the generator object returned by predict_batch() method accepts None from the input iterable object. This allows you to design non-blocking frame data source iterators: when no data is available, such iterator just yields None without waiting for the next frame. If None is returned from the input iterator, the model predict step is simply skipped for this iteration.

Also, in non-blocking mode, when no inference results are available in the result queue at some iteration, the generator yields a None result. This allows the code that operates on another model to continue executing.

In order to operate in non-blocking mode you need to modify your code the following way:

  1. Modify frame data source iterator to return None if no frame is available yet, instead of waiting for the next frame.
  2. Modify inference loop body to deal with None results by simply skipping them.
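
The two modifications above might look like the following sketch. It is written against plain Python iterables so it runs without PySDK. STOP, non_blocking_source, and poll_round_robin are hypothetical names introduced here, and feeding frames through queue.Queue objects (filled, for example, by capture threads) is an assumption of the sketch, not something PySDK prescribes.

```python
import queue

STOP = object()  # hypothetical end-of-stream marker put into the queue by the producer


def non_blocking_source(frame_queue):
    """Yield frames from a queue without waiting; yield None when it is empty."""
    while True:
        try:
            frame = frame_queue.get_nowait()
        except queue.Empty:
            yield None  # no frame yet: let the caller serve other streams
            continue
        if frame is STOP:
            return  # producer signalled the end of the stream
        yield frame


def poll_round_robin(batches, handle):
    """Poll several result generators in turn, skipping None results."""
    active = dict(enumerate(batches))
    while active:
        for index in list(active):
            try:
                result = next(active[index])
            except StopIteration:
                del active[index]  # this stream is finished
                continue
            if result is not None:
                handle(index, result)
```

With real models you would set non_blocking_batch_predict to True on each model object and pass model.predict_batch(non_blocking_source(frame_queue)) for every stream to poll_round_robin. Since the loop never blocks, it spins while all queues are empty; a short sleep in the loop is one way to reduce CPU load if that matters.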

Measure Inference Timing

The degirum.model.Model class has a facility to measure and collect model inference time information. To enable inference time collection assign True to degirum.model.Model.measure_time property.

When inference timing collection is enabled, the durations of individual steps for each frame prediction are accumulated in internal statistic accumulators.

To reset the time statistic accumulators, use the degirum.model.Model.reset_time_stats method.

To retrieve the time statistic accumulators, use the degirum.model.Model.time_stats method. This method returns a dictionary of time statistic objects. Each time statistic object accumulates time statistics for a particular inference step over all frame predictions that happened since the timing collection was enabled or reset. The statistics include minimum, maximum, average, and count. Inference steps correspond to dictionary keys. The following dictionary keys are supported:

Key Description
FrameTotalDuration_ms Frame total inference duration from the moment when you invoke predict method to the moment when inference results are returned
PythonPreprocessDuration_ms Duration of client-side pre-processing step including data loading time and data conversion time
CorePreprocessDuration_ms Duration of server-side pre-processing step
CoreInferenceDuration_ms Duration of server-side AI inference step
CoreLoadResultDuration_ms Duration of server-side data movement step
CorePostprocessDuration_ms Duration of server-side post-processing step
CoreInputFrameSize_bytes The size of received input frame

For DeGirum Orca AI accelerator hardware additional dictionary keys are supported:

Key Description
DeviceInferenceDuration_ms Duration of AI inference computations on AI accelerator IC excluding data transfers
DeviceTemperature_C Internal temperature of AI accelerator IC in C
DeviceFrequency_MHz Working frequency of AI accelerator IC in MHz
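
To make the idea of a statistic accumulator concrete, here is a minimal sketch of an object that tracks minimum, maximum, average, and count. It is an illustration only, not the PySDK class: the name TimeStat and the attribute names min, cnt, and sum are assumptions, while avg and max mirror the attributes used in the example further down.

```python
class TimeStat:
    """Accumulates min/max/average/count over a series of durations."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")
        self.sum = 0.0
        self.cnt = 0

    def add(self, duration_ms):
        """Record the duration of one more frame."""
        self.min = min(self.min, duration_ms)
        self.max = max(self.max, duration_ms)
        self.sum += duration_ms
        self.cnt += 1

    @property
    def avg(self):
        """Average duration, or 0.0 when nothing was recorded yet."""
        return self.sum / self.cnt if self.cnt else 0.0

    def reset(self):
        """Forget everything recorded so far."""
        self.__init__()


stat = TimeStat()
for duration in (6.0, 8.0, 10.0):
    stat.add(duration)
print(stat.min, stat.avg, stat.max, stat.cnt)
```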

The time statistics object supports pretty-printing so you can directly print it using regular print() statement. For example, the output of the following statement:

print(model.time_stats()["PythonPreprocessDuration_ms"])

... will look like this:

PythonPreprocessDuration_ms   ,    6.00,    8.27,   10.82,     25

It consists of the inference step name (PythonPreprocessDuration_ms in this case) followed by four statistic values in the order minimum, average, maximum, count.

You may print the whole table of statistics using the following code:

print(model.time_stats())

The output will look like this:

Statistic                     ,     Min,     Avg,     Max,    Cnt
PythonPreprocessDuration_ms   ,    7.52,   10.09,   17.07,     25
CoreInferenceDuration_ms      ,   12.81,   13.86,   19.06,     25
CoreLoadResultDuration_ms     ,    0.20,    0.27,    0.58,     25
CorePostprocessDuration_ms    ,    0.97,    1.09,    1.74,     25
CorePreprocessDuration_ms     ,   10.91,   12.08,   18.07,     25
DeviceInferenceDuration_ms    ,    6.32,    6.36,    6.39,     25
FrameTotalDuration_ms         ,   40.70,   54.68,  234.25,     25

Note: In batch prediction mode many inference phases are pipelined so the pre- and post-processing steps of one frame may be executed in parallel with the AI inference step of another frame. Therefore actual frame rate may be higher than the frame rate calculated by FrameTotalDuration_ms statistic.
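
As a rough illustration of this note, the averages from the table above can be turned into frame rates. The sketch assumes ideal pipelining, where throughput is limited only by the slowest stage. Real throughput depends on queue depths and hardware, so treat the second number as an optimistic bound under that assumption, not as a prediction.

```python
# average durations in milliseconds, copied from the sample table above
avg_ms = {
    "PythonPreprocessDuration_ms": 10.09,
    "CorePreprocessDuration_ms": 12.08,
    "CoreInferenceDuration_ms": 13.86,
    "CoreLoadResultDuration_ms": 0.27,
    "CorePostprocessDuration_ms": 1.09,
    "FrameTotalDuration_ms": 54.68,
}


def fps(duration_ms):
    """Convert a per-frame duration in milliseconds to frames per second."""
    return 1000.0 / duration_ms


# frame rate if every frame were processed strictly one after another
sequential_fps = fps(avg_ms["FrameTotalDuration_ms"])

# assumption: with perfect overlap, the slowest single stage sets the pace
stages = {key: value for key, value in avg_ms.items() if key != "FrameTotalDuration_ms"}
slowest_stage = max(stages, key=stages.get)
pipelined_fps_bound = fps(stages[slowest_stage])

print(f"sequential: {sequential_fps:.1f} FPS")
print(f"pipelined bound ({slowest_stage}): {pipelined_fps_bound:.1f} FPS")
```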

Note: PythonPreprocessDuration_ms statistic includes data loading time and data conversion time. This can give very different results for different ways of loading input frame data. For example, if you provide image URLs for inference, then the PythonPreprocessDuration_ms will include image downloading time, which can be much higher compared with the case when you provide the image as numpy array, which does not require any downloading.

The following example shows how to use time statistics collection interface. It assumes that the model variable is the model created by load_model().

model.measure_time = True # enable accumulation of time statistics

# perform batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

stats = model.time_stats() # query time statistics dictionary

# pretty-print frame total inference duration statistics
print(stats["FrameTotalDuration_ms"])

# print average duration of AI inference step
print(stats["CoreInferenceDuration_ms"].avg)

model.reset_time_stats() # reset time statistics accumulators

# perform one more batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

# re-query statistics after the reset and print maximum duration of Python pre-processing step
print(model.time_stats()["PythonPreprocessDuration_ms"].max)

Custom Post Processors for Inference Results

When you want to work with a new AI model and PySDK does not yet provide a post-processor class to interpret its results, you may want to implement that post-processing code yourself.

Such code typically takes the AI model output tensor data and interprets that raw tensor data to produce meaningful results like bounding boxes, probabilities, etc. Then it renders these results on top of the original image to produce the so-called image overlay.

PySDK provides a way to seamlessly integrate such custom post-processing code so it will behave exactly like built-in post-processors. To do so, you need to complete the following two steps:

  1. Implement your own custom post-processor class.
  2. Instruct AI model object to use your custom post-processing class instead of built-in post-processor.

To better understand how post-processing is organized in PySDK, we need to disclose some implementation details. First of all, the built-in post-processing is actually split into two parts, two distinct pieces of code, which are executed in different places:

  1. Low-level post-processor, which performs raw tensor data conversion into a JSON array. The format of this array is model-specific, and all supported formats are described in degirum.postprocessor.InferenceResults.results. This post-processor is invoked on the AI server side.
  2. PySDK-level post-processor, which performs AI result rendering on top of the original image, generating the image overlay. This post-processor is invoked on the client side.

Your custom post-processor class should actually implement both parts: raw tensor conversion and image overlay generation. It must inherit degirum.postprocessor.InferenceResults base class or any class derived from degirum.postprocessor.InferenceResults. If you want to reuse some image overlay generation functionality of built-in post-processor classes, you may inherit one of them.

The following is the list of PySDK-level post-processor classes, which you may inherit in order to reuse AI result rendering code, or, in other words, the implementation of the image_overlay property.

Post-Processor Class Description
degirum.postprocessor.ClassificationResults Post-processor, which renders classification results
degirum.postprocessor.DetectionResults Post-processor, which renders object detection results, including pose detection and face detection
degirum.postprocessor.Hand_DetectionResults Post-processor, which renders hand detection results
degirum.postprocessor.SegmentationResults Post-processor, which renders segmentation results

Unfortunately, you need to develop the code that transforms raw tensor data (the low-level post-processor) yourself, since the format of the raw tensors for new models is usually unique.

Your custom post-processor class may override the following methods of degirum.postprocessor.InferenceResults base class:

Method to Override Description
__init__() Constructor.
__str__() Conversion of inference results to string.
image_overlay Rendering image overlay.

Typically, you do not need to override any other methods/properties of the base class: their default implementations are generic enough.

The code that transforms raw tensor data into human-friendly results should be implemented in the constructor. A typical implementation of such a constructor is the following:

import degirum as dg

class MyResultProcessor(dg.postprocessor.InferenceResults):

   def __init__(self, *args, **kwargs):
      super().__init__(*args, **kwargs) # call base class constructor first

      # at this point self._inference_results contains the list of raw output tensors

      new_results = [] # you define empty list for new human-friendly results

      # you iterate over tensors
      for tensor in self._inference_results:
         # and convert them into a list of dictionaries, where each dictionary represents one detected entity
         detected_entity = {}
         new_results.append(detected_entity) # append detected entity to the list of human-friendly results

      # finally you replace self._inference_results with new list of human-friendly results
      self._inference_results = new_results

Basically, you need to process all raw tensors contained in the self._inference_results list and convert them into a list of human-friendly results. Then you replace self._inference_results with this new list. Each element of the list should be a dictionary. You may choose the format of this dictionary as you see fit. But if you choose one of the existing formats, you may reuse one of the existing image overlay generation implementations. In this case you inherit your custom post-processor class from the PySDK post-processor class whose result format you decided to reuse, and do not override the image_overlay property.

Each raw tensor in the self._inference_results list is represented by a dictionary. The format of the raw tensor dictionary is the following:

Key Name Description Data Type
"id" Tensor numeric ID as specified in the model integer
"name" Tensor name as specified in the model string
"shape" Tensor shape: sizes of each dimension integer list
"quantization" Tensor quantization parameters dictionary, see below
"type" Tensor element type string, see below
"data" Tensor data buffer contents multi-dimensional numpy array

The following tensor data types are supported:

Type String Type Description
"DG_FLT" 32-bit floating point
"DG_UINT8" 8-bit unsigned integer

The following is the structure of quantization dictionary:

Json Field Name Description Data Type
"axis" Quantization axis or -1 for global quantization integer
"scale" Quantization scale array floating point list
"zero" Quantization zero offset integer list

Another typical task you need to perform in your custom post-processor class constructor is to convert coordinates of detected entities from AI model input image coordinates to original image coordinates. This can be done by invoking the function stored in the self._conversion property. You pass a tuple of (x,y) coordinates with respect to the AI model input image, and it returns a tuple of (x,y) coordinates with respect to the original image.
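
The two constructor tasks, interpreting raw tensors and mapping coordinates, can be sketched in plain Python. Everything here is illustrative: dequantize and make_stretch_conversion are hypothetical helpers, the affine formula scale * (value - zero) is the usual quantization convention and is assumed here rather than taken from PySDK, and the conversion shown covers only a plain stretch resize. In a real post-processor the tensor data is a numpy array (the same arithmetic applies elementwise) and the coordinate mapping is supplied by PySDK through self._conversion, whose exact call signature you should check against your PySDK version.

```python
def dequantize(tensor):
    """Convert a raw tensor dictionary into a flat list of floats."""
    values = list(tensor["data"])  # plain list here; a numpy array in PySDK
    if tensor["type"] == "DG_FLT":
        return [float(value) for value in values]
    quant = tensor["quantization"]
    if quant["axis"] != -1:
        raise NotImplementedError("per-axis quantization is not covered by this sketch")
    # assumption: global quantization stores one scale and one zero offset
    scale, zero = quant["scale"][0], quant["zero"][0]
    return [scale * (value - zero) for value in values]


def make_stretch_conversion(orig_w, orig_h, model_w, model_h):
    """Build a function mapping model-input coordinates to original-image coordinates."""
    scale_x, scale_y = orig_w / model_w, orig_h / model_h

    def conversion(x, y):
        return (x * scale_x, y * scale_y)

    return conversion


raw_tensor = {
    "id": 0,
    "name": "scores",  # hypothetical tensor name
    "shape": [1, 3],
    "quantization": {"axis": -1, "scale": [0.5], "zero": [128]},
    "type": "DG_UINT8",
    "data": [128, 130, 126],
}
print(dequantize(raw_tensor))

conversion = make_stretch_conversion(1280, 720, 640, 360)
print(conversion(320, 180))
```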

If you decide to define a completely new format of human-friendly results, you need to implement the image_overlay property.

The typical implementation of image_overlay property is the following:

import itertools
import degirum as dg

# note: create_draw_primitives() is the PySDK drawing helper described below;
# its import is not shown in this example

class MyResultProcessor(dg.postprocessor.InferenceResults):

   @property
   def image_overlay(self):
      # create drawing object to avoid using graphical backends directly
      draw = create_draw_primitives(self._input_image, self._alpha, self._font_scale)

      # create a set of colors to be used for drawing different classes of objects
      current_color_set = itertools.cycle(
         self._overlay_color
         if isinstance(self._overlay_color, list)
         else [self._overlay_color]
      )

      # iterate over all inference results created in your constructor
      for res in self._inference_results:
         # and draw AI annotations over the original image, which is stored inside `draw` object
         # use draw.draw_text() to print a text string
         # use draw.draw_circle() to draw a circle
         # use draw.draw_line() to draw a line
         # use draw.draw_box() to draw a rectangle
         # use next(current_color_set) to obtain the color for the next object class
         # check `self._show_labels` to draw or not to draw text labels
         # check `self._show_probabilities` to draw or not to draw probabilities
         pass # replace with your drawing code

      # return image overlay
      return draw.image_overlay()

Basically, you iterate over all detected entities and draw them on the original image. To keep the behavior of your code consistent with PySDK standards, follow these practices:

  1. Use create_draw_primitives() method to create a drawing object, which will handle all drawing tasks using proper graphical backend, as selected for the model.
  2. Check self._show_labels property to draw or not to draw text labels.
  3. Check self._show_probabilities property to draw or not to draw probabilities.
  4. Use draw object methods like draw.draw_text(), draw.draw_circle(), draw.draw_line(), draw.draw_box() to draw various geometric shapes.
  5. Use self._overlay_color property to obtain colors for your classes of objects. The example above demonstrates how to do it.
  6. Return draw.image_overlay() at the end.
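
The color handling in item 5 can be tried out in isolation. The sketch below reproduces the itertools.cycle idiom from the example above and uses it to give each object class a stable color. make_color_cycle and colors_by_label are hypothetical helpers, and representing colors as RGB tuples is an assumption of the sketch.

```python
import itertools


def make_color_cycle(overlay_color):
    """Accept either a single color or a list of colors and cycle over them."""
    colors = overlay_color if isinstance(overlay_color, list) else [overlay_color]
    return itertools.cycle(colors)


def colors_by_label(labels, overlay_color):
    """Assign the next color from the cycle to each label seen for the first time."""
    color_cycle = make_color_cycle(overlay_color)
    assigned = {}
    for label in labels:
        if label not in assigned:
            assigned[label] = next(color_cycle)
    return assigned


palette = [(255, 0, 0), (0, 255, 0)]  # two colors, reused when classes outnumber them
print(colors_by_label(["cat", "dog", "cat", "bird"], palette))
```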

When your new custom post-processor class is ready, you need to instruct AI model object to use your custom post-processing class instead of built-in post-processor. Do it in two steps:

  1. Assign "None" to degirum.model.Model._model_parameters.OutputPostprocessType property to disable any low-level server-side post-processing.
  2. Assign your custom class to degirum.model.Model.custom_postprocessor property to attach your custom post-processor.

model = zoo.load_model(model_name) # load model
model._model_parameters.OutputPostprocessType = "None"
model.custom_postprocessor = MyResultProcessor

You need to do these steps before the very first inference. From now on each model inference result returned by model prediction methods called from that model object will be of MyResultProcessor type.