Running AI Model Inference

Once you have loaded an AI model and obtained a model handling object, you can start running AI inferences on your model. The following methods of the degirum.model.Model class are available to perform AI inferences:

degirum.model.Model.predict and degirum.model.Model.__call__ to run prediction of a single data frame
degirum.model.Model.predict_batch to run prediction of a batch of frames
degirum.model.Model.predict_dir to run prediction of multiple files in a directory

The predict() and __call__ methods behave exactly the same way (in fact, __call__ simply calls predict()). They accept a single argument, the input data frame, perform AI inference on that data frame, and return the inference result: an object derived from the degirum.postprocessor.InferenceResults superclass.

The batch prediction methods, predict_batch() and predict_dir(), perform predictions of multiple frames in a pipelined manner, which is more efficient than calling the predict() method in a loop. These methods are described in detail in the Batch Inferences section.

Cloud Server Connection Issues

For cloud inferences, the connection to the cloud server is established at the beginning of each predict-style method call and closed at the end of that call. This significantly reduces performance, since connecting to and disconnecting from the cloud server are relatively slow operations. To overcome this problem, you may use the model object inside a with block. When a predict-style method is called inside the with block, the connection is not closed at the end of the call, so consecutive predict calls do not reconnect, which saves execution time.

The following code demonstrates the approach:

   # here model_name variable stores the model name
   # and data_list variable stores the list of input data frames to process

   # wrap the load_model() call in with block to avoid disconnections
   with zoo.load_model(model_name) as model:
      # perform prediction loop
      for data in data_list:
         result = model.predict(data)

Input Data Handling

PySDK model prediction methods support different types of input data. The exact input type depends on the model being used. The following input data types are supported:

image input data
audio input data
raw tensor input data

The input data object you supply to model predict methods also depends on the number of inputs the model has. If the model has a single data input, then the data objects you pass to model predict methods are single objects. In rare cases the model may have multiple data inputs; in this case the data objects you pass to model predict methods are lists of objects: one object per corresponding input.

The number and the type of inputs of the model are described by the InputType property of the ModelParams class returned by degirum.model.Model.model_info property (see Model Info section for details about model info properties). The InputType property returns the list of input data types, one type per model input. So the number of model inputs can be deduced by evaluating the length of the list returned by the InputType property.
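
The deduction described above can be sketched as follows. This is only an illustration: the `info` object is a hypothetical stand-in built with `types.SimpleNamespace`, and its values are made up. In real code you would use the object returned by the `model_info` property instead.

```python
from types import SimpleNamespace


def count_model_inputs(info):
    """Return the number of model inputs: one InputType entry per input."""
    return len(info.InputType)


# Hypothetical stand-in for the object returned by model.model_info
info = SimpleNamespace(InputType=["Image", "Image"])

n_inputs = count_model_inputs(info)
print(n_inputs, "input(s) of types", info.InputType)
```

For a model with more than one input, each frame passed to the predict methods would then be a list with `n_inputs` elements.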

The following sections describe details of input data handling for various model input types.

Image Input Data Handling

When dealing with model inputs of image type (InputType is equal to "Image"), the PySDK model prediction methods accept a wide variety of input data frame types:

the input frame can be the name of a file with frame data;
it can be the HTTP URL pointing to a file with frame data;
it can be a numpy array with frame data;
it can be a PIL Image object;
it can be a bytes object containing raw frame data.

An AI model requires particular input tensor dimensions and data type which, in most cases, do not match the dimensions and data type of the input frame. In this case, PySDK automatically converts the input frame to the format compatible with the AI model input tensor, performing all the necessary conversions such as resizing, padding, colorspace conversion, and data type conversion.

PySDK performs input frame transformations using one of the two graphical packages (called backends): PIL or OpenCV. The backend is selected by degirum.model.Model.image_backend property. By default it is set to auto, meaning that OpenCV backend will be used first, and if it is not installed, then PIL backend will be used. You may explicitly select which backend to use by assigning either "pil" or "opencv" to degirum.model.Model.image_backend property.
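
The fallback order described for the "auto" setting can be illustrated with a small stand-alone helper. This is a sketch of the documented behavior, not PySDK's actual implementation; the function name `resolve_image_backend` is hypothetical, and it only checks whether the `cv2` and `PIL` packages can be found.

```python
import importlib.util


def resolve_image_backend(requested="auto"):
    """Mimic the documented order: OpenCV first, then PIL."""
    if requested in ("pil", "opencv"):
        return requested  # an explicit choice is used as-is
    if requested != "auto":
        raise ValueError(f"unknown backend: {requested!r}")
    if importlib.util.find_spec("cv2") is not None:
        return "opencv"
    if importlib.util.find_spec("PIL") is not None:
        return "pil"
    return None  # neither package is installed


print(resolve_image_backend())
```

Running the helper on your machine shows which backend the "auto" setting would be expected to pick there.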

Note: When the OpenCV backend is selected, you cannot pass PIL Image objects to model predict methods.

If your input frame is in a file on a local filesystem, or is accessible through the HTTP protocol, pass the filename string or URL string directly to model predict methods: PySDK will (down-)load the file, decode it, and convert it to the model input tensor format. The set of supported graphical file formats is defined solely by the graphical backend library you selected, PIL or OpenCV: PySDK does not perform any decoding of its own.

Sometimes, image conversion to AI model input tensor format requires image resizing. This resizing can be done in two possible ways:

preserving the aspect ratio;
not preserving the aspect ratio.

In addition, the image can be cropped or not cropped.

You can control the way of image resizing by degirum.model.Model.input_pad_method property, which can have one of the following values: "stretch", "letterbox", "crop-first", and "crop-last".

When you select the "stretch" method, the input image is resized exactly to the AI model input tensor dimensions, possibly changing the aspect ratio.

When you select the "letterbox" method (the default), the image is resized to fit the AI model input tensor dimensions while preserving the aspect ratio. The voids which can appear on the image sides are filled with the color specified by the degirum.model.Model.input_letterbox_fill_color property (black by default).
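
The letterbox geometry can be worked out with a few lines of arithmetic. The helper below is a sketch under stated assumptions: the function name is hypothetical, rounding to the nearest pixel is an assumption, and the sample sizes are made up. It reports only the total void per axis, since the text does not say how the voids are split between the two sides.

```python
def letterbox_geometry(img_w, img_h, model_w, model_h):
    """Return (resized_w, resized_h, void_w, void_h) for letterbox resizing."""
    # The smaller scale factor makes the whole image fit inside the tensor
    scale = min(model_w / img_w, model_h / img_h)
    new_w = round(img_w * scale)
    new_h = round(img_h * scale)
    # Whatever is left over on each axis is filled with the letterbox color
    return new_w, new_h, model_w - new_w, model_h - new_h


# A 640x480 frame going into a 300x300 input tensor
print(letterbox_geometry(640, 480, 300, 300))  # (300, 225, 0, 75)
```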

When you select the "crop-first" method, the image is first cropped to match the AI model input tensor aspect ratio with respect to the degirum.model.Model.input_crop_percentage and then resized to match the AI model input tensor dimensions. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions 224x224 with crop percentage of 0.875 will first be center cropped to 420x420 (420 = min(640, 480) * 0.875) and then resized to 224x224.
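
The crop-first arithmetic from the example above can be reproduced as follows. This is a sketch, not PySDK code: the function name is hypothetical, and the generalization to non-square tensors (taking the largest window with the tensor aspect ratio, then applying the crop percentage) is an assumption that agrees with the square example in the text.

```python
def crop_first_size(img_w, img_h, model_w, model_h, crop_pct):
    """Return the (width, height) of the center crop taken before resizing."""
    # Largest window with the model aspect ratio that fits inside the image
    scale = min(img_w / model_w, img_h / model_h)
    crop_w = round(model_w * scale * crop_pct)
    crop_h = round(model_h * scale * crop_pct)
    return crop_w, crop_h


# 640x480 image, 224x224 tensor, crop percentage 0.875
print(crop_first_size(640, 480, 224, 224, 0.875))  # (420, 420)
```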

When you select "crop-last" method, if the AI model input tensor dimensions are equal (square), the image is resized with its smaller side equal to the model dimension with respect to degirum.model.Model.input_crop_percentage. If the AI model input tensor dimensions are not equal (rectangle), the image is resized with stretching to the input tensor dimensions with respect to degirum.model.Model.input_crop_percentage. The image is then cropped to fit the AI model input tensor dimensions and aspect ratio. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions of 224x224 with crop percentage of 0.875 will first be resized to 341x256 (256 = 224 / 0.875 and 341 = 256 * 640 / 480) and then center cropped to 224x224. Alternatively an input image with dimensions 640x480 and a model with input tensor dimensions 280x224 will be resized to 320x256 (320 = 280 / 0.875 and 256 = 224 / 0.875) and then center cropped to 280x224.
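
Both crop-last examples above can be checked with the following sketch. The function name is hypothetical and rounding to the nearest pixel is an assumption; the two branches follow the square and rectangular cases described in the text.

```python
def crop_last_resize_size(img_w, img_h, model_w, model_h, crop_pct):
    """Return the (width, height) the image is resized to before center cropping."""
    if model_w == model_h:
        # Square tensor: the smaller image side becomes model_dim / crop_pct,
        # the other side keeps the original aspect ratio
        short = round(model_w / crop_pct)
        if img_w >= img_h:
            return round(short * img_w / img_h), short
        return short, round(short * img_h / img_w)
    # Rectangular tensor: stretch to the tensor dimensions divided by crop_pct
    return round(model_w / crop_pct), round(model_h / crop_pct)


print(crop_last_resize_size(640, 480, 224, 224, 0.875))  # (341, 256)
print(crop_last_resize_size(640, 480, 280, 224, 0.875))  # (320, 256)
```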

You can specify the resize algorithm in the degirum.model.Model.input_resize_method property, which may have the following values: "nearest", "bilinear", "area", "bicubic", or "lanczos". These values specify various interpolation algorithms used for resizing.

If your input frames are stored in numpy arrays, you may need to tell PySDK the order of colors in those numpy arrays: RGB or BGR. This order is called the colorspace. By default, PySDK treats numpy arrays as having the BGR colorspace for the OpenCV graphical backend and the RGB colorspace for the PIL graphical backend. So if your numpy arrays follow that convention, no additional action is needed on your side. But if your numpy arrays have the color order opposite to the default, you need to change the degirum.model.Model.input_numpy_colorspace property.

Note: If a model has multiple image inputs, the PySDK applies the same input_*** image properties as discussed above for every image input of a model.

Audio Input Data Handling

When dealing with model inputs of audio type (InputType is equal to "Audio"), PySDK does not perform any conversions of the input data: it expects a 1-D numpy array with audio waveform samples of the proper size and with the proper sampling rate. The waveform size should be equal to the InputWaveformSize model info property. The waveform sampling rate should be equal to the InputSamplingRate model info property. Finally, the data element type should be equal to the data type specified by the InputRawDataType model info property. All of these model info properties are properties of the ModelParams class returned by the degirum.model.Model.model_info property (see the Model Info section for details).

Tensor Input Data Handling

When dealing with model inputs of raw tensor type (InputType is equal to "Tensor"), PySDK expects that you provide a 4-D numpy array of proper dimensions. The dimensions of that array should match model input dimensions as specified by the following model info properties:

InputN for dimension 0,
InputH for dimension 1,
InputW for dimension 2,
InputC for dimension 3.

The data element type should be equal to the data type specified by the InputRawDataType model info property (see Model Info section for details).
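
The dimension mapping above can be turned into a simple pre-flight check. The sketch below assumes that each of InputN, InputH, InputW, and InputC is a list with one value per model input, as the Model Info section states for pre-processing attributes. The `info` object is a hypothetical stand-in with made-up values, and both helper names are illustrative.

```python
from types import SimpleNamespace


def expected_tensor_shape(info, input_index=0):
    """Build the expected 4-D shape (N, H, W, C) for one model input."""
    return (
        info.InputN[input_index],
        info.InputH[input_index],
        info.InputW[input_index],
        info.InputC[input_index],
    )


def shape_matches(shape, info, input_index=0):
    """Check a candidate array shape against the model input dimensions."""
    return tuple(shape) == expected_tensor_shape(info, input_index)


# Hypothetical stand-in for the object returned by model.model_info
info = SimpleNamespace(InputN=[1], InputH=[224], InputW=[224], InputC=[3])

print(expected_tensor_shape(info))            # (1, 224, 224, 3)
print(shape_matches((1, 3, 224, 224), info))  # False
```

With a numpy array you would pass `array.shape` as the first argument of `shape_matches`.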

Inference Results

All model predict methods return result objects derived from degirum.postprocessor.InferenceResults class. These result classes are called post-processors. Particular post-processor class types depend on the AI model type: classification, object detection, pose detection, segmentation etc. But from the user point of view, they deliver identical functionality.

A result object contains the following data:

degirum.postprocessor.InferenceResults.image property keeps original image;
degirum.postprocessor.InferenceResults.image_overlay property keeps the original image with inference results drawn on top; the type of such drawing is model-dependent:
for classification models, the list of class labels with probabilities is printed below the original image;
for object detection models, bounding boxes of detected objects are drawn on the original image;
for hand and pose detection models, detected keypoints and keypoint connections are drawn on the original image;
for segmentation models, detected segments are drawn on the original image;
degirum.postprocessor.InferenceResults.results property keeps a list of numeric results (follow the property link for detailed explanation of all result formats);
degirum.postprocessor.InferenceResults.image_model property keeps the binary array with image data converted to AI model input specifications. This property is assigned only if you set degirum.model.Model.save_model_image model property before performing predictions.

The results property is what you typically use for programmatic access to inference results. The type of results is always a list of dictionaries, but the format of those dictionaries is model-dependent. Also, if the result contains coordinates of objects, all such coordinates are recalculated from the model coordinates back to coordinates on the original image, so you can use them directly.

The image_overlay property is very handy for debugging and troubleshooting. It allows you to quickly assess the correctness of the inference results in graphical form.

There are result properties which affect how the overlay image is drawn:

degirum.postprocessor.InferenceResults.overlay_alpha: transparency value (alpha-blend weight) for all overlay details;
degirum.postprocessor.InferenceResults.overlay_font_scale: font scaling factor for overlay text;
degirum.postprocessor.InferenceResults.overlay_line_width: line width in pixels for overlay lines;
degirum.postprocessor.InferenceResults.overlay_color: RGB color tuple or list of RGB color tuples for drawing all overlay details;
degirum.postprocessor.InferenceResults.overlay_show_labels: flag to enable drawing class labels of detected objects;
degirum.postprocessor.InferenceResults.overlay_show_probabilities: flag to enable drawing probabilities of detected objects;
degirum.postprocessor.InferenceResults.overlay_fill_color: RGB color tuple for filling voids which appear due to letterboxing.

When each individual result object is created, all these overlay properties (except overlay_fill_color) are assigned the values of the similarly named properties of the model object (see the Model Parameters section for the list of model properties). This allows you to assign overlay property values only once and have them applied to all consecutive results. But if you want to experiment with an individual result, you may reassign any of the overlay properties and then re-read the image_overlay property. Each time you read image_overlay, it returns a new image object freshly drawn according to the current values of the overlay properties.

The overlay_color property is used to define the color to draw overlay details. In the case of a single RGB tuple, the corresponding color is used to draw all the overlay data: points, boxes, labels, segments, etc. In the case of a list of RGB tuples the behavior depends on the model type.

For classification models different colors from the list are used to draw labels of different classes.
For detection models different colors are used to draw labels and boxes of different classes.
For pose detection models different colors are used to draw keypoints of different persons.
For segmentation models different colors are used to highlight segments of different classes.

If the list size is less than the number of classes of the model, then overlay_color values are used cyclically. For example, for a three-element list the sequence is overlay_color[0], then overlay_color[1], overlay_color[2], and again overlay_color[0].
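
The cyclic selection can be expressed with the modulo operator. This is an illustrative sketch rather than PySDK's drawing code: the function name and the three-color palette are made up, and indexing by a zero-based class index is an assumption.

```python
def color_for_class(overlay_color, class_index):
    """Pick the overlay color for a class, cycling through a list of colors."""
    if isinstance(overlay_color, tuple):
        return overlay_color  # a single RGB tuple is used for everything
    return overlay_color[class_index % len(overlay_color)]


palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

# Classes 0..3 with a three-element list: the fourth class wraps around
for class_index in range(4):
    print(class_index, color_for_class(palette, class_index))
```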

The default value of overlay_color is a single RGB tuple of yellow color for all model types except segmentation models. For segmentation models it is a list of RGB tuples with the list size equal to the number of model classes. You can use the Model.label_dictionary property to obtain a list of model classes. Each color is assigned automatically so that it is visually distinct from the other colors in the list.

Note: overlay_fill_color is assigned the value of degirum.model.Model.input_letterbox_fill_color.

In some cases the set of existing PySDK post-processor classes is not enough. This happens, for example, when you want to work with a new AI model that PySDK does not support yet. In this case you may implement your own custom post-processor and use it instead of the standard PySDK post-processors. The Custom Post Processors for Inference Results section describes this in detail.

Results Filtering

By default, all results are reported by the model predict methods. However, you may want to include only results which belong to certain categories, that is, results which have certain class labels or category IDs. To achieve that, you can specify a set of class labels (or, alternatively, category IDs) so that only inference results whose class labels (or category IDs) are found in that set are reported, and all other results are discarded. You assign such a set to the degirum.model.Model.output_class_set property.

For example, you may want to include only results with class labels "car" and "truck":

# allow only results with "car" and "truck" class labels
model.output_class_set = {"car", "truck"}

Or you may want to include only results with category IDs 1 and 3:

# allow only results with 1 and 3 category IDs
model.output_class_set = {1, 3}

This category filtering is applicable only to models which have "label" (or "category_id") keys in their result dictionaries. For all other models this category filter will be ignored.
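
To make the filtering rule concrete, here is a plain-Python re-implementation of the behavior described above. It is a sketch for illustration only and not PySDK's internal code: the function name and the sample result dictionaries are made up, and only the "label" and "category_id" keys come from the text.

```python
def filter_results(results, class_set):
    """Keep only results whose label or category ID is in class_set."""
    if not class_set:
        return list(results)  # an empty set means no filtering
    kept = []
    for item in results:
        if "label" not in item and "category_id" not in item:
            kept.append(item)  # the filter does not apply to such results
        elif item.get("label") in class_set or item.get("category_id") in class_set:
            kept.append(item)
    return kept


# Made-up detection results
results = [
    {"label": "car", "category_id": 2, "score": 0.9},
    {"label": "person", "category_id": 0, "score": 0.8},
    {"label": "truck", "category_id": 7, "score": 0.7},
]

print(filter_results(results, {"car", "truck"}))
print(filter_results(results, {0}))
```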

Batch Inferences

If you need to process multiple frames using the same model and the same settings, the most effective way to do it is to use the batch prediction methods of the degirum.model.Model class:

degirum.model.Model.predict_batch method to run predictions on a list of frames;
degirum.model.Model.predict_dir method to run predictions on files in a directory.

Both methods perform predictions of multiple frames in a pipelined manner, which is more efficient than calling the predict() method in a loop.

Both methods return a generator object, so you can iterate over inference results. This allows you to use the result of batch prediction methods directly in for-loops, for example:

for result in model.predict_batch(['image1.jpg','image2.jpg']):
   print(result)

Note: Since batch prediction methods return a generator object, simply assigning the result of a batch prediction method to a variable does not start any inference. Only iterating over that generator object does.

The predict_batch method accepts a single parameter: an iterable object, for example, a list or a generator. You populate your iterable with the same type of data you pass to the regular predict(), i.e. input image path strings, input image URL strings, numpy arrays, or PIL Image objects (in case of the PIL image backend).
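
The lazy behavior of generators can be demonstrated without any model at all. In the sketch below, `fake_predict_batch` is a hypothetical stand-in for a batch prediction method; it only records which frames it has processed so that the moment when work starts becomes visible.

```python
calls = []  # frames that have actually been "inferred"


def fake_predict_batch(frames):
    """Hypothetical stand-in for a batch prediction method: a plain generator."""
    for frame in frames:
        calls.append(frame)  # pretend an inference runs here
        yield f"result for {frame}"


# Assignment alone does not run anything
pending = fake_predict_batch(["image1.jpg", "image2.jpg"])
started_before_iteration = len(calls)

# Iterating drives the generator and triggers the work
collected = list(pending)

print(started_before_iteration, len(calls), collected)
```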

The predict_dir method accepts a filepath to a directory containing graphical files for inference. You may supply optional extensions parameter passing the list of file extensions to process.

The following minimal example demonstrates how to use batch predict to perform AI inference on a video file:

import degirum as dg
import cv2

# connect to cloud model zoo, load model, set model properties needed for OpenCV
zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token="<your cloud API access token>")
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")
model.image_backend = "opencv"
stream = cv2.VideoCapture("images/Traffic.mp4") # open video file

# define generator function to produce video frames
def frame_source(stream):
   while True:
      ret, frame = stream.read()
      if not ret:
         break # end of file
      yield frame

# run batch predict on stream of frames from video file
for result in model.predict_batch(frame_source(stream)):
   # show annotated frames
   cv2.imshow("Demo", result.image_overlay)
   cv2.waitKey(1) # let OpenCV process window events so the frame is actually displayed

stream.release()
cv2.destroyAllWindows()

Model Parameters

The model behavior can be controlled with various Model class properties, which define model parameters. They can be divided into the following categories:

parameters, which control how to handle input frames;
parameters, which control the inference;
parameters, which control how to display inference results;
parameters, which control model run-time behavior and provide access to model information.

The following table provides a complete summary of Model class properties arranged by categories.

Property Name	Description	Possible Values	Default Value
Input Handling Parameters
`image_backend`	package to be used for image processing	`"auto"`, `"pil"`, or `"opencv"`; `"auto"` tries OpenCV first	`"auto"`
`input_letterbox_fill_color`	image fill color in case of 'letterbox' padding	3-element tuple of RGB color	`(0,0,0)`
`input_numpy_colorspace`	colorspace for numpy arrays	`"auto"`, `"RGB"` or `"BGR"`	`"auto"`
`input_pad_method`	how input image will be padded when resized	`"stretch"`, `"letterbox"`, `"crop-first"`, or `"crop-last"`	`"letterbox"`
`input_crop_percentage`	a percentage of input image dimension to retain when `"input_pad_method"` is set to `"crop-first"` or `"crop-last"`	Float value in [0..1] range	`1.0`
`input_resize_method`	interpolation algorithm for image resizing	`"nearest"`, `"bilinear"`, `"area"`, `"bicubic"`, `"lanczos"`	`"bilinear"`
`save_model_image`	flag to enable/disable saving of model input image in inference results	Boolean value	`False`
Inference Parameters
`output_class_set`	set of class labels or category IDs to be included in inference results	Set of strings or set of integers	`{}`
`output_confidence_threshold`	confidence threshold to reject results with low scores	Float value in [0..1] range	`0.1`
`output_max_detections`	maximum number of objects to report for detection models	Integer value	`20`
`output_max_detections_per_class`	maximum number of objects to report for each class for detection models	Integer value	`100`
`output_max_classes_per_detection`	maximum number of classes to report for detection models	Integer value	`30`
`output_nms_threshold`	rejection threshold for non-max suppression	Float value in [0..1] range	`0.6`
`output_pose_threshold`	rejection threshold for pose detection models	Float value in [0..1] range	`0.8`
`output_postprocess_type`	inference result post-processing type. You may set it to `'None'` to bypass post-processing.	String	Model-dependent
`output_top_k`	Number of classes with biggest scores to report for classification models. If `0`, report all classes above confidence threshold	Integer value	`0`
`output_use_regular_nms`	use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models	Boolean value	`False`
Display Parameters
`overlay_alpha`	transparency value (alpha-blend weight) for all overlay details	Float value in [0..1] range or `"auto"`; 1 means no transparency; `"auto"` to select optimal transparency for the current model	`"auto"`
`overlay_color`	color for drawing all overlay details	3-element tuple of RGB color or list of 3-element tuples of RGB color	`(255,255,0)`
`overlay_font_scale`	font scaling factor for overlay text	Positive float value	`1.0`
`overlay_line_width`	line width in pixels for overlay lines	Positive integer value	`3`
`overlay_show_labels`	flag to enable drawing class labels of detected objects	Boolean value	`True`
`overlay_show_probabilities`	flag to enable drawing probabilities of detected objects	Boolean value	`False`
Control and Information Parameters
`supported_device_types`	list of supported device types in format `<runtime>/<device>` for this model (read-only)	List of strings	N/A
`device_type`	device type to be used for AI inference of this model in a format `<runtime>/<device>`	String	N/A
`devices_available`	list of inference device indexes which can be used for model inference (read-only)	List of integer values	N/A
`devices_selected`	list of inference device indexes selected for model inference	List of integer values	Equal to `devices_available`
`label_dictionary`	model class label dictionary (read-only)	Dictionary	N/A
`measure_time`	flag to enable measuring and collecting inference time statistics	Boolean value	`False`
`model_info`	model information object to provide read-only access to model parameters (read-only)	`ModelParams` object	N/A
`non_blocking_batch_predict`	flag to control the blocking behavior of `predict_batch()` method	Boolean value	`False`
`eager_batch_size`	The size of the batch to be used by device scheduler when inferencing this model. The batch is the number of consecutive frames before this model is switched to another model during batch predict.	Integer value in [1..80] range	8
`frame_queue_depth`	The depth of the model prediction queue. When the queue size reaches this value, the next prediction call blocks until there is space in the queue.	Integer value in [1..160] range	80 for cloud inference, 8 for other cases

Note: For segmentation models the default value of overlay_color is a list of unique colors (RGB tuples). The size of the list is equal to the number of model classes. Use the label_dictionary property to get a list of model classes.

Model Info

AI models have a lot of static attributes defining various model features and characteristics. Unlike model properties, these attributes in most cases cannot be changed: they come with the model.

To access all model attributes, you may query the read-only model property degirum.model.Model.model_info.

Note: A new deep copy of the model info object is created each time you read this property, so any changes made to this copy will not affect model behavior.

Model attributes are divided into the following categories:

Device-related attributes
Pre-processing-related attributes
Inference-related attributes
Post-processing-related attributes

The following table provides a complete summary of model attributes arranged by categories. The Attribute Name column contains the name of the ModelParams class member returned by the model_info property.

Note: Each attribute in the Pre-Processing-Related Attributes group is a list of values, one per model input.

Attribute Name	Description	Possible Values
Device-Related Attributes
`DeviceType`	Device type to be used for AI inference of this model	`"ORCA"`: DeGirum Orca, `"EDGETPU"`: Google EdgeTPU, `"GPU"`: host GPU, `"CPU"`: host CPU, `"NPU"`: Intel NPU, `"DLA"`: Nvidia DLA
`RuntimeAgent`	Type of runtime to be used for AI inference of this model	`"N2X"`: DeGirum NNExpress runtime, `"TFLITE"`: Google TFLite runtime, `"OPENVINO"`: Intel OpenVINO runtime, `"ONNX"`: Microsoft ONNX runtime, `"TENSORRT"`: Nvidia TensorRT runtime
`SupportedDeviceTypes`	Comma-separated list of runtime agent/device type combinations, supported by the model	Example: `"OPENVINO/CPU,ONNX/CPU"`
`EagerBatchSize`	The size of the batch to be used by device scheduler when inferencing this model. The batch is the number of consecutive frames before this model is switched to another model during batch predict.	Integer number
Pre-Processing-Related Attributes
`InputType`	Model input type	List of the following strings: `"Image"`: image input type, `"Audio"`: audio input type, `"Tensor"`: raw tensor input type
`InputN`	Input frame dimension size	`1` (other sizes to be supported)
`InputH`	Input height dimension size	Integer number
`InputW`	Input width dimension size	Integer number
`InputC`	Input color dimension size	Integer number
`InputQuantEn`	Enable input frame quantization flag (set for quantized models)	Boolean value
`InputRawDataType`	Data element type for audio or tensor inputs	List of the following strings: `"DG_UINT8"`: 8-bit unsigned integer, `"DG_INT16"`: 16-bit signed integer, `"DG_FLT"`: 32-bit floating point
`InputTensorLayout`	Input tensor shape and layout	List of the following strings: `"auto"`: deduce tensor layout automatically, `"NHWC"`: 4-D tensor frame-height-width-color, `"NCHW"`: 4-D tensor frame-color-height-width
`InputColorSpace`	Input image colorspace (sequence of colors in C dimension)	List of the following strings: `"RGB"`, `"BGR"`
`InputScaleEn`	Enable global scaling of input data flag	List of boolean values
`InputScaleCoeff`	Scaling factor for input data global scaling; applied when `InputScaleEn` is enabled	List of float values
`InputNormMean`	Mean value for per-channel input data normalization; applied when both `InputNormMean` and `InputNormStd` are not empty	List of 3-element arrays of float values
`InputNormStd`	StDev value for per-channel input data normalization; applied when both `InputNormMean` and `InputNormStd` are not empty	List of 3-element arrays of float values
`InputQuantOffset`	Quantization offset for input image quantization	List of float values
`InputQuantScale`	Quantization scale for input image quantization	List of float values
`InputWaveformSize`	Input waveform size in samples for audio input types	List of positive integer values
`InputSamplingRate`	Input waveform sampling rate in Hz for audio input types	List of positive float values
`InputResizeMethod`	Interpolation algorithm used for image resizing during model training	List of the following strings: `"nearest"`, `"bilinear"`, `"area"`, `"bicubic"`, `"lanczos"`
`InputPadMethod`	How input image was padded when resized during model training	List of the following strings: `"stretch"`, `"letterbox"`
`InputCropPercentage`	How much input image was cropped during model training	Float value in [0..1] range
`ImageBackend`	Graphical package used for image processing during model training	List of the following strings: `"pil"`, `"opencv"`
Inference-Related Attributes
`ModelPath`	Path to the model JSON file	String with filepath
`ModelInputN`	Model frame dimension size	`1` (other sizes to be supported)
`ModelInputH`	Model height dimension size	Integer number
`ModelInputW`	Model width dimension size	Integer number
`ModelInputC`	Model color dimension size	Integer number
`ModelQuantEn`	Enable input frame quantization flag (set for quantized models)	Boolean value
Post-Processing-Related Attributes
`OutputNumClasses`	Number of classes model detects	Integer value
`OutputSoftmaxEn`	Enable softmax step in post-processing flag	Boolean value
`OutputClassIDAdjustment`	Class ID adjustment: number subtracted from the class ID reported by the model	Integer value
`OutputPostprocessType`	Post-processing type	See table below
`OutputConfThreshold`	Confidence threshold to reject results with low scores	Float value in [0..1] range
`OutputNMSThreshold`	Rejection threshold for non-max suppression	Float value in [0..1] range
`OutputTopK`	Number of classes with biggest scores to report for classification models	Integer number
`MaxDetections`	Maximum number of objects to report for detection models	Integer number
`MaxDetectionsPerClass`	Maximum number of objects to report for each class for detection models	Integer number
`MaxClassesPerDetection`	Maximum number of classes to report for detection models	Integer number
`UseRegularNMS`	Use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models	Boolean value

The following table provides a list of supported post-processing types, their descriptions, and JSON result format reference. Please refer to degirum.postprocessor.InferenceResults.results for detailed description of JSON result format of each post-processor type as mentioned in the last column.

Label	Description	Applicable Result JSON Format
`"Classification"`	Classification post-processor	For classification models
`"MultiLabelClassification"`	Multi-classifier classification post-processor	For multi-label classification models
`"Detection"`	MobilenetV2-style object detection post-processor	For object detection models
`"DetectionYolo"`	YOLOV5-style object detection post-processor	For object detection models
`"DetectionYoloV8"`	YOLOV8-style object detection post-processor	For object detection models
`"DetectionYoloPlates"`	YOLOV5-style license plate detection post-processor	For classification models
`"PoseDetection"`	Pose detection post-processor	For object detection models (with landmarks)
`"FaceDetect"`	Face detection post-processor	For object detection models (with landmarks)
`"HandDetection"`	Hand palm detection post-processor	For hand palm detection models
`"Segmentation"`	Semantic segmentation post-processor	For segmentation models

Inference Advanced Topics

Selecting Device Type for Inference

Every AI model in a model zoo is designed to work with a particular AI inference runtime (such as DeGirum N2X, Intel OpenVINO, Google TFLite etc.) and on a particular AI inference device, either on AI accelerator hardware or on host computer CPU.

The runtime and device type to be used for a model inference are defined by the combination of the RuntimeAgent and DeviceType model attributes of the ModelParams class, as returned by the degirum.model.Model.model_info property. You may query those attributes, or you may query the degirum.model.Model.device_type property to get the runtime/device type pair in the form "<runtime>/<device>" (the second way is more convenient).

Some models may support multiple runtime/device combinations - such models are called multi-device models. For such models the list of supported device types can be obtained by querying SupportedDeviceTypes read-only model attribute of ModelParams class. This attribute is a string containing comma-separated list of runtime/device type combinations supported by the model. For example, the string "OPENVINO/CPU,ONNX/CPU" means that the model can be run on both Intel OpenVINO and Microsoft ONNX runtimes using CPU as a hardware device. The SupportedDeviceTypes attribute is defined only for multi-device models.
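
A comma-separated string in this format is easy to split into runtime/device pairs. The helper below is a small sketch with a hypothetical function name; it uses the example string from the paragraph above.

```python
def parse_supported_device_types(value):
    """Split a string like "OPENVINO/CPU,ONNX/CPU" into (runtime, device) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty fragments such as a trailing comma
        runtime, device = item.split("/", 1)
        pairs.append((runtime, device))
    return pairs


print(parse_supported_device_types("OPENVINO/CPU,ONNX/CPU"))
# [('OPENVINO', 'CPU'), ('ONNX', 'CPU')]
```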

When you load a model from a model zoo, you also implicitly specify the AI inference engine to be used for this model: the cloud platform, an AI server, or the local PySDK installation. That inference engine has its own set of supported runtimes and devices, which may differ from what the model supports. To obtain the list of runtime/device combinations supported by both the model and the inference engine it is loaded for, query the degirum.model.Model.supported_device_types read-only property. This property is a list that contains the intersection of two sets: the set of runtime/device combinations supported by the model itself, and the set of runtime/device combinations supported by the inference engine the model is loaded for. Each element of this list is a runtime/device pair in the form "RUNTIME/DEVICE", for example "OPENVINO/CPU".

For multi-device models you may change the desired runtime/device combination to be used for the inference at any time. You do it by assigning the desired combination to the degirum.model.Model.device_type property. You may assign only combinations which occur in the degirum.model.Model.supported_device_types list.
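The selection logic described above can be sketched as follows. This is a minimal illustration, not part of PySDK: the helper names parse_supported_device_types and select_device_type are hypothetical, and the code relies only on the supported_device_types and device_type properties and on the comma-separated format of the SupportedDeviceTypes attribute described in this section.

```python
def parse_supported_device_types(attr):
    """Split a SupportedDeviceTypes string into (runtime, device) tuples."""
    pairs = []
    for item in attr.split(","):
        item = item.strip()
        if item:
            runtime, device = item.split("/", 1)
            pairs.append((runtime, device))
    return pairs


def select_device_type(model, preferred):
    """Assign the first preferred runtime/device pair that the model object reports as supported.

    `model` is any object exposing `supported_device_types` (list of strings)
    and an assignable `device_type` property, as described above.
    """
    supported = model.supported_device_types
    for candidate in preferred:
        if candidate in supported:
            model.device_type = candidate
            return candidate
    raise ValueError(f"none of {preferred} is supported; available: {supported}")
```

For a loaded multi-device model, a call such as select_device_type(model, ["ONNX/CPU", "OPENVINO/CPU"]) would switch the model to the ONNX runtime when it is available and fall back to OpenVINO otherwise.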

Selecting Devices for Inference

Every AI model in a model zoo is designed to work on a particular hardware, either on AI accelerator hardware such as DeGirum Orca, or on host computer CPU. Imagine the situation when the host computer is equipped with multiple hardware devices of a given type, and you run multiple inferences of a model designed for this device type. In this case by default all available hardware devices of this type will be used for this model inferences. This guarantees top inference performance in the case of single model running on all available devices.

In certain cases you may want to limit the model inference to a particular subset of available devices. For example, you have two devices and you want to run concurrent inference of two models. By default both devices would be used for both model inferences, causing the models to be reloaded to the devices each time you run the inference of another model. Even though model loading for DeGirum Orca devices is extremely fast, it still may cause performance degradation. In this case you may want to run the first model inference only on the first device, and the second model inference only on the second device. To do so, assign to the degirum.model.Model.devices_selected property of each model object the list of device indexes you want that model to run on. In our example you need to assign the list [0] to the devices_selected property of the first model object, and the list [1] to that of the second model object.

To get the information about available devices, you query degirum.model.Model.devices_available property. It returns the list of device indexes of all available devices of the type this model is designed for. Those indexes are zero-based, so if your host computer has a single device of a given type, the returned list would contain single zero element: [0]. In case of two devices it will be [0, 1] and so on.

Note: since the inference device assignment for cloud inferences is performed dynamically, and actual AI farm node configuration is unknown until the inference starts, the devices_available property for such use case always returns the full list of available devices.

In general, the list you assign to the devices_selected property should contain only indexes that occur in the list returned by the devices_available property.
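The two-model scenario above can be sketched as a small helper. It is an illustration only: the function name assign_one_device_per_model is hypothetical, and the sketch assumes that all models passed to it are designed for the same device type, so that their device indexes refer to the same physical devices.

```python
def assign_one_device_per_model(models):
    """Pin each model to its own device index taken from devices_available.

    Assumes every model targets the same device type. Returns the list of
    indexes assigned, in the same order as `models`.
    """
    used = set()
    assigned = []
    for model in models:
        # pick the first available index not yet taken by another model
        free = [index for index in model.devices_available if index not in used]
        if not free:
            raise RuntimeError("not enough devices to give each model its own")
        model.devices_selected = [free[0]]
        used.add(free[0])
        assigned.append(free[0])
    return assigned
```

With two devices and two loaded models, assign_one_device_per_model([model1, model2]) would set devices_selected to [0] for the first model and [1] for the second.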

Handling Multiple Streams of Frames

The Model class interface has a method, degirum.model.Model.predict_batch, which can run multiple predictions on a sequence of frames. In order to deliver the sequence of frames to the predict_batch you implement an iterable object, which returns your frames one-by-one. One example of iterable object is a regular Python list, another example is a function, which yields frame data using yield statement. Then you pass such iterable object as an argument to the predict_batch method. In turn, the predict_batch method returns a generator object, which yields prediction results using yield statement.

All the inference magic with pipelining sequential inferences, asynchronously retrieving inference results, supporting various inference devices, and AI server vs. local operation modes happens inside the implementation of predict_batch method. All you need to do is to wrap your sequence of frame data in an iterable object, pass this object to predict_batch, and iterate over the generator object returned by predict_batch using either for-loop or by repeatedly calling next() built-in function on this generator object.

The following example runs the inference on an infinite sequence of frames captured from the camera:

import cv2 # OpenCV
stream = cv2.VideoCapture(0) # open video stream from local camera #0

def source(): # define generator function, which yields frames from camera
   while True:
      ret, frame = stream.read()
      if not ret: # stop when the camera fails to deliver a frame
         break
      yield frame

for result in model.predict_batch(source()): # iterate over inference results
   cv2.imshow("AI camera", result.image_overlay) # process result
   cv2.waitKey(1) # give OpenCV a chance to refresh the window

But what if you need to run multiple concurrent inferences of multiple asynchronous data streams with different frame rates? The simple approach when you combine two generators in one loop either using zip() built-in function or by manually calling next() built-in function for every generator in a loop body will not work effectively.

Non-working example 1. Using zip() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
for result1, result2 in zip(batch1, batch2):
   # process result1 and result2
   pass

Non-working example 2. Using next() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
while True:
   result1 = next(batch1)
   result2 = next(batch2)
   # process result1 and result2

The reason is that the Python runtime has Global Interpreter Lock (GIL), which allows running only one thread at a time blocking the execution of other threads. So if the currently running thread is itself blocked by waiting for the next frame or waiting for the next inference result, all other threads are blocked as well.

For example, if the frame rate of source1() is lower than the frame rate of source2(), and the model inference frame rates are higher than the corresponding source frame rates, then the code above will spend most of the time waiting for the next frame from source1(). Meanwhile no frames are retrieved from source2(), so model2 will not get enough frames and will idle, losing performance.

Another example is when the inference latency of model1 is higher than the inference queue depth expressed in time (this is the product of the inference queue depth expressed in frames and the single frame inference time). In this case when the model1 inference queue is full, but inference result is not ready yet, the code above will block on waiting for that inference result inside next(batch1) preventing any operations with model2.

To get around such blocks the special non-blocking mode of batch predict operation is implemented. You turn on this mode by assigning True to degirum.model.Model.non_blocking_batch_predict property.

When non-blocking mode is enabled, the generator object returned by predict_batch() method accepts None from the input iterable object. This allows you to design non-blocking frame data source iterators: when no data is available, such iterator just yields None without waiting for the next frame. If None is returned from the input iterator, the model predict step is simply skipped for this iteration.

Also, in non-blocking mode, when no inference results are available in the result queue at some iteration, the generator yields None. This allows the code that operates another model to continue executing.

In order to operate in non-blocking mode you need to modify your code the following way:

Modify frame data source iterator to return None if no frame is available yet, instead of waiting for the next frame.
Modify inference loop body to deal with None results by simply skipping them.
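The two modifications listed above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are delivered through a standard queue.Queue filled elsewhere (for example, by a capture thread), and the names STOP, non_blocking_source, and run_interleaved are hypothetical helpers rather than PySDK API. Only the non_blocking_batch_predict property and the predict_batch() method come from PySDK.

```python
import queue

STOP = object()  # hypothetical end-of-stream marker placed in the queue by the producer


def non_blocking_source(frames):
    """Yield the next frame if one is ready, otherwise yield None without waiting."""
    while True:
        try:
            frame = frames.get_nowait()
        except queue.Empty:
            yield None  # no frame yet: let the caller do other work
            continue
        if frame is STOP:
            return
        yield frame


def run_interleaved(model1, model2, source1, source2, on_result1, on_result2):
    """Poll two predict_batch() generators in turn, skipping None results."""
    for model in (model1, model2):
        model.non_blocking_batch_predict = True  # enable non-blocking mode

    active = [
        (model1.predict_batch(source1), on_result1),
        (model2.predict_batch(source2), on_result2),
    ]
    while active:
        for entry in list(active):
            batch, handler = entry
            try:
                result = next(batch)
            except StopIteration:
                active.remove(entry)  # this stream is finished
                continue
            if result is not None:  # None means "nothing ready at this iteration"
                handler(result)
```

Note that this loop polls continuously; in a real application you may want to add a short sleep when both generators yield None to avoid consuming a full CPU core.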

Measure Inference Timing

The degirum.model.Model class has a facility to measure and collect model inference time information. To enable inference time collection assign True to degirum.model.Model.measure_time property.

When inference timing collection is enabled, the durations of individual steps for each frame prediction are accumulated in internal statistic accumulators.

To reset time statistic accumulators you use degirum.model.Model.reset_time_stats method.

To retrieve time statistic accumulators you use degirum.model.Model.time_stats method. This method returns a dictionary of time statistic objects. Each time statistic object accumulates time statistics for a particular inference step over all frame predictions performed since the timing collection was enabled or reset. The statistics include minimum, average, maximum, and count. Inference steps correspond to dictionary keys. The following dictionary keys are supported:

Key	Description
`FrameTotalDuration_ms`	Frame total inference duration from the moment when you invoke predict method to the moment when inference results are returned
`PythonPreprocessDuration_ms`	Duration of client-side pre-processing step including data loading time and data conversion time
`CorePreprocessDuration_ms`	Duration of server-side pre-processing step
`CoreInferenceDuration_ms`	Duration of server-side AI inference step
`CoreLoadResultDuration_ms`	Duration of server-side data movement step
`CorePostprocessDuration_ms`	Duration of server-side post-processing step
`CoreInputFrameSize_bytes`	The size of received input frame

For DeGirum Orca AI accelerator hardware additional dictionary keys are supported:

Key	Description
`DeviceInferenceDuration_ms`	Duration of AI inference computations on AI accelerator IC excluding data transfers
`DeviceTemperature_C`	Internal temperature of AI accelerator IC in °C
`DeviceFrequency_MHz`	Working frequency of AI accelerator IC in MHz

The time statistics object supports pretty-printing so you can directly print it using regular print() statement. For example, the output of the following statement:

print(model.time_stats()["PythonPreprocessDuration_ms"])

... will look like this:

PythonPreprocessDuration_ms   ,    6.00,    8.27,   10.82,     25

It consists of the inference step name (PythonPreprocessDuration_ms in this case) followed by four statistic values presented in a format minimum, average, maximum, count.

You may print the whole table of statistics using the following code:

print(model.time_stats())

The output will look like this:

Statistic                     ,     Min,     Avg,     Max,    Cnt
PythonPreprocessDuration_ms   ,    7.52,   10.09,   17.07,     25
CoreInferenceDuration_ms      ,   12.81,   13.86,   19.06,     25
CoreLoadResultDuration_ms     ,    0.20,    0.27,    0.58,     25
CorePostprocessDuration_ms    ,    0.97,    1.09,    1.74,     25
CorePreprocessDuration_ms     ,   10.91,   12.08,   18.07,     25
DeviceInferenceDuration_ms    ,    6.32,    6.36,    6.39,     25
FrameTotalDuration_ms         ,   40.70,   54.68,  234.25,     25

Note: In batch prediction mode many inference phases are pipelined so the pre- and post-processing steps of one frame may be executed in parallel with the AI inference step of another frame. Therefore actual frame rate may be higher than the frame rate calculated by FrameTotalDuration_ms statistic.

Note: PythonPreprocessDuration_ms statistic includes data loading time and data conversion time. This can give very different results for different ways of loading input frame data. For example, if you provide image URLs for inference, then the PythonPreprocessDuration_ms will include image downloading time, which can be much higher compared with the case when you provide the image as numpy array, which does not require any downloading.

The following example shows how to use time statistics collection interface. It assumes that the model variable is the model created by load_model().

model.measure_time = True # enable accumulation of time statistics

# perform batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

stats = model.time_stats() # query time statistics dictionary

# pretty-print frame total inference duration statistics
print(stats["FrameTotalDuration_ms"])

# print average duration of AI inference step
print(stats["CoreInferenceDuration_ms"].avg)

model.reset_time_stats() # reset time statistics accumulators

# perform one more batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

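# print statistics of Python pre-processing step collected since the reset
# (query time_stats() again: the `stats` dictionary above was obtained before the reset)
print(model.time_stats()["PythonPreprocessDuration_ms"])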
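Because of the pipelining mentioned in the note above, the effective frame rate of a batch prediction is best measured from wall-clock time rather than derived from FrameTotalDuration_ms. The following is a minimal sketch of such a measurement; the function name measure_throughput is hypothetical and it works with any iterable of results, including the generator returned by predict_batch().

```python
import time


def measure_throughput(results, handle=None):
    """Consume an iterable of results and return (count, elapsed_seconds, frames_per_second)."""
    start = time.perf_counter()
    count = 0
    for result in results:
        if handle is not None:
            handle(result)  # optional per-result processing
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, fps
```

For example, count, elapsed, fps = measure_throughput(model.predict_batch(source())) gives the observed frame rate, which you can compare with 1000 / stats["FrameTotalDuration_ms"].avg to see the effect of pipelining.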

Custom Post Processors for Inference Results

When you want to work with some new AI model and PySDK does not yet provide post-processor class to interpret model results, then you may want to implement that post-processing code yourself.

Such code typically takes the AI model output tensor data and interprets that raw tensor data to produce some meaningful results like bounding boxes, probabilities, etc. Then it renders these results on top of the original image to produce a so-called image overlay.

PySDK provides a way to seamlessly integrate such custom post-processing code so it will behave exactly like built-in post-processors. To do so, you need to complete the following two steps:

Implement your own custom post-processor class.
Instruct AI model object to use your custom post-processing class instead of built-in post-processor.

To better understand how post-processing is organized in PySDK, we need to disclose some implementation details. First of all, the built-in post-processing is actually split into two distinct pieces of code, which are executed in different places:

Low-level post-processor, which performs raw tensor data conversion into a JSON array. The format of this array is model-specific, and all supported formats are described here. This post-processor is invoked on the AI server side.
PySDK-level post-processor, which performs AI result rendering on top of the original image, generating the image overlay. This post-processor is invoked on the client side.

Your custom post-processor class should actually implement both parts: raw tensor conversion and image overlay generation. It must inherit degirum.postprocessor.InferenceResults base class or any class derived from degirum.postprocessor.InferenceResults. If you want to reuse some image overlay generation functionality of built-in post-processor classes, you may inherit one of them.

The following is the list of PySDK-level post-processor classes, which you may inherit in order to reuse AI result rendering code, or, in other words, the implementation of the image_overlay property.

Post-Processor Class	Description
degirum.postprocessor.ClassificationResults	Post-processor, which renders classification results
degirum.postprocessor.DetectionResults	Post-processor, which renders object detection results, including pose detection and face detection
degirum.postprocessor.Hand_DetectionResults	Post-processor, which renders hand detection results
degirum.postprocessor.SegmentationResults	Post-processor, which renders segmentation results

Unfortunately, you need to develop the code that transforms raw tensor data (the low-level post-processor) yourself, since the format of the raw tensors for new models is usually unique.

Your custom post-processor class may override the following methods of degirum.postprocessor.InferenceResults base class:

Method to Override	Description
__init__()	Constructor.
__str__()	Conversion of inference results to string.
image_overlay	Rendering image overlay.

Typically, you do not need to override any other methods/properties of the base class: their default implementations are generic enough.

You need to implement the code that transforms raw tensor data into human-friendly results in the constructor. A typical implementation of such a constructor is the following:

import degirum as dg

class MyResultProcessor(dg.postprocessor.InferenceResults):

   def __init__(self, *args, **kwargs):
      super().__init__(*args, **kwargs) # call base class constructor first

      # at this point self._inference_results contains the list of raw output tensors

      new_results = [] # you define empty list for new human-friendly results

      # you iterate over tensors
      for tensor in self._inference_results:
         # and convert them into a list of dictionaries, where each dictionary represents one detected entity
         detected_entity = {}
         new_results.append(detected_entity) # append detected entity to the list of human-friendly results

      # finally you replace self._inference_results with new list of human-friendly results
      self._inference_results = new_results

Basically, you need to process all raw tensors contained in the self._inference_results list and convert them into a list of human-friendly results. Then you substitute self._inference_results with this new list of human-friendly results. Each element of such a list should be a dictionary. You may choose the format of this dictionary as you see fit. But if you choose one of the existing formats, then you may reuse one of the existing image overlay generation implementations. In this case you inherit your custom post-processor class from the PySDK post-processor class whose result format you decided to reuse, and do not override the image_overlay property.

Each raw tensor in the self._inference_results list is represented by a dictionary. The format of the raw tensor dictionary is the following:

Key Name	Description	Data Type
`"id"`	Tensor numeric ID as specified in the model	integer
`"name"`	Tensor name as specified in the model	string
`"shape"`	Tensor shape: sizes of each dimension	integer list
`"quantization"`	Tensor quantization parameters	dictionary, see below
`"type"`	Tensor element type	string, see below
`"data"`	Tensor data buffer contents	multi-dimensional numpy array

The following tensor data types are supported:

Type String	Type Description
`"DG_FLT"`	32-bit floating point
`"DG_UINT8"`	8-bit unsigned integer

The following is the structure of quantization dictionary:

Json Field Name	Description	Data Type
`"axis"`	Quantization axis or -1 for global quantization	integer
`"scale"`	Quantization scale array	floating point list
`"zero"`	Quantization zero offset	integer list
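As an illustration of how the tensor and quantization dictionaries above might be used inside your constructor, the following sketch converts one raw tensor into floating point values. It assumes the conventional affine quantization scheme, real = scale * (quantized - zero), which this document does not state explicitly; verify it against your model before relying on it. The function name dequantize is hypothetical.

```python
import numpy as np


def dequantize(tensor):
    """Convert a raw tensor dictionary to a float32 numpy array.

    Assumption: quantized values map to real values as scale * (q - zero).
    """
    data = np.asarray(tensor["data"])
    if tensor["type"] == "DG_FLT":
        return data.astype(np.float32)  # already floating point

    quant = tensor["quantization"]
    scale = np.asarray(quant["scale"], dtype=np.float32)
    zero = np.asarray(quant["zero"], dtype=np.float32)
    axis = quant["axis"]

    if axis >= 0 and scale.size > 1:
        # per-axis quantization: align parameters with the quantization axis
        shape = [1] * data.ndim
        shape[axis] = -1
        scale = scale.reshape(shape)
        zero = zero.reshape(shape)

    return (data.astype(np.float32) - zero) * scale
```

Inside the constructor you could then call dequantize(tensor) for each element of self._inference_results before interpreting the values.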

Another typical task you need to perform in your custom post-processor class constructor is to convert coordinates of detected entities from AI model input image coordinates to original image coordinates. This can be done by invoking the function stored in the self._conversion property. You pass a tuple of (x,y) coordinates with respect to the AI model input image, and it returns a tuple of (x,y) coordinates with respect to the original image.

If you decided to define completely new format of human-friendly results, then you will need to implement image_overlay property.

The typical implementation of image_overlay property is the following:

import itertools
import degirum as dg

class MyResultProcessor(dg.postprocessor.InferenceResults):

   @property
   def image_overlay(self):
      # create drawing object to avoid using graphical backends directly
      # (create_draw_primitives() is the PySDK drawing helper; import it from your PySDK installation)
      draw = create_draw_primitives(self._input_image, self._alpha, self._font_scale)

      # create a cycling sequence of colors to be used for drawing different classes of objects
      current_color_set = itertools.cycle(
         self._overlay_color
         if isinstance(self._overlay_color, list)
         else [self._overlay_color]
      )

      # iterate over all inference results created in your constructor
      for res in self._inference_results:
         # and draw AI annotations over the original image, which is stored inside `draw` object
         # use draw.draw_text() to print a text string
         # use draw.draw_circle() to draw a circle
         # use draw.draw_line() to draw a line
         # use draw.draw_box() to draw a rectangle
         # use next(current_color_set) to obtain the color for the next object class
         # check `self._show_labels` to draw or not to draw text labels
         # check `self._show_probabilities` to draw or not to draw probabilities
         pass # replace with your drawing code

      # return image overlay
      return draw.image_overlay()

Basically, you iterate over all detected entities and draw them on the original image. To harmonize the behavior of your code with PySDK standards, follow these practices:

Use create_draw_primitives() method to create a drawing object, which will handle all drawing tasks using proper graphical backend, as selected for the model.
Check self._show_labels property to draw or not to draw text labels.
Check self._show_probabilities property to draw or not to draw probabilities.
Use draw object methods like draw.draw_text(), draw.draw_circle(), draw.draw_line(), draw.draw_box() to draw various geometric shapes.
Use self._overlay_color property to obtain colors for your classes of objects. The example above demonstrates how to do it.
Return draw.image_overlay() at the end.

When your new custom post-processor class is ready, you need to instruct AI model object to use your custom post-processing class instead of built-in post-processor. Do it in two steps:

Assign "None" to degirum.model.Model._model_parameters.OutputPostprocessType property to disable any low-level server-side post-processing.
Assign your custom class to degirum.model.Model.custom_postprocessor property to attach your custom post-processor.

model = zoo.load_model(model_name) # load model
model._model_parameters.OutputPostprocessType = "None"
model.custom_postprocessor = MyResultProcessor

You need to do these steps before the very first inference. From now on each model inference result returned by model prediction methods called from that model object will be of MyResultProcessor type.