Running AI Model Inference
Once you have loaded an AI model and obtained a model handling object, you can start running AI inferences on your model. The following methods of the degirum.model.Model class are available to perform AI inferences:
- degirum.model.Model.predict and degirum.model.Model.__call__ to run prediction of a single data frame;
- degirum.model.Model.predict_batch to run prediction of a batch of frames;
- degirum.model.Model.predict_dir to run prediction of multiple files in a directory.
The predict() and __call__ methods behave exactly the same way (in fact, __call__ simply calls predict()). They accept a single argument, the input data frame, perform AI inference on that frame, and return an inference result: an object derived from the degirum.postprocessor.InferenceResults superclass.
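For example, a single-frame inference could look like the following sketch (the model name and image file are hypothetical; zoo is assumed to be a connected model zoo object):
model = zoo.load_model("some_model_name")
result = model.predict("cat.jpg")  # same as: result = model("cat.jpg")
print(result.results)              # list of result dictionaries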
The batch prediction methods, predict_batch() and predict_dir(), perform predictions of multiple frames in a pipelined manner, which is more efficient than calling the predict() method in a loop. These methods are described in detail in the Batch Inferences section.
Cloud Server Connection Issues
For cloud inferences, the connection to the cloud server is established at the beginning of each predict-style method call, and the disconnection is performed at the end of that call. This greatly reduces performance, since cloud server connection and disconnection are relatively long activities. To overcome this problem, you may use the model object inside a with block. When a predict-style method is called inside the with block, the disconnection is not performed at the end of that call, so consecutive predict calls do not reconnect either, saving execution time.
The following code demonstrates the approach:
# here model_name variable stores the model name
# and data_list variable stores the list of input data frames to process
# wrap the load_model() call in with block to avoid disconnections
with zoo.load_model(model_name) as model:
    # perform prediction loop
    for data in data_list:
        result = model.predict(data)
Input Data Handling
PySDK model prediction methods support different types of input data. The exact input type depends on the model to be used. The following input data types are supported:
- image input data
- audio input data
- raw tensor input data
The input data object you supply to model predict methods also depends on the number of inputs the model has. If the model has a single data input, then the data objects you pass to model predict methods are single objects. In rare cases the model may have multiple data inputs; in this case the data objects you pass to model predict methods are lists of objects: one object per corresponding input.
The number and the type of inputs of the model are described by the InputType property of the ModelParams class returned by the degirum.model.Model.model_info property (see the Model Info section for details about model info properties). The InputType property returns the list of input data types, one type per model input, so the number of model inputs can be deduced by evaluating the length of the list returned by the InputType property.
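For example, the number and types of model inputs could be queried like this sketch (assuming a loaded model object named model):
info = model.model_info        # ModelParams object
input_types = info.InputType   # list of input types, e.g. ["Image"]
num_inputs = len(input_types)  # number of model inputs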
The following sections describe details of input data handling for various model input types.
Image Input Data Handling
When dealing with model inputs of image type (InputType is equal to "Image"), the PySDK model prediction methods accept a wide variety of input data frame types:
- the input frame can be the name of a file with frame data;
- it can be an HTTP URL pointing to a file with frame data;
- it can be a numpy array with frame data;
- it can be a PIL Image object;
- it can be a bytes object containing raw frame data.
An AI model requires particular input tensor dimensions and data type which, in most cases, do not match the dimensions of the input frame. In this case, PySDK performs automatic conversion of the input frame to a format compatible with the AI model input tensor, performing all the necessary conversions such as resizing, padding, colorspace conversion, and data type conversion.
PySDK performs input frame transformations using one of two graphical packages (called backends): PIL or OpenCV. The backend is selected by the degirum.model.Model.image_backend property. By default it is set to "auto", meaning that the OpenCV backend will be used first, and if it is not installed, the PIL backend will be used. You may explicitly select which backend to use by assigning either "pil" or "opencv" to the degirum.model.Model.image_backend property.
Note: When the OpenCV backend is selected, you cannot pass PIL Image objects to model predict methods.
If your input frame is stored in a file on a local filesystem, or is accessible via the HTTP protocol, pass the filename string or URL string directly to the model predict methods: PySDK will (down)load the file, decode it, and convert it to the model input tensor format. The set of supported graphical file formats is defined solely by the graphical backend library you selected, PIL or OpenCV; PySDK does not perform any decoding of its own.
Sometimes, image conversion to AI model input tensor format requires image resizing. This resizing can be done in two possible ways:
- preserving the aspect ratio;
- not preserving the aspect ratio.
In addition, the image can be cropped or not cropped.
You can control the way the image is resized with the degirum.model.Model.input_pad_method property, which can have one of the following values: "stretch", "letterbox", "crop-first", and "crop-last".
When you select the "stretch" method, the input image is resized exactly to the AI model input tensor dimensions, possibly changing the aspect ratio.
When you select the "letterbox" method (the default), the image is resized to fit the AI model input tensor dimensions while preserving the aspect ratio. The voids which can appear on the image sides are filled with the color specified by the degirum.model.Model.input_letterbox_fill_color property (black by default).
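A small configuration sketch (the gray fill color is an arbitrary illustration):
model.input_pad_method = "letterbox"                # the default padding method
model.input_letterbox_fill_color = (114, 114, 114)  # fill letterbox voids with gray instead of black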
When you select the "crop-first" method, the image is first cropped to match the AI model input tensor aspect ratio with respect to the degirum.model.Model.input_crop_percentage and then resized to match the AI model input tensor dimensions. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions 224x224 with a crop percentage of 0.875 will first be center cropped to 420x420 (420 = min(640, 480) * 0.875) and then resized to 224x224.
When you select the "crop-last" method, if the AI model input tensor dimensions are equal (square), the image is resized so that its smaller side equals the model dimension with respect to the degirum.model.Model.input_crop_percentage. If the AI model input tensor dimensions are not equal (rectangle), the image is resized with stretching to the input tensor dimensions with respect to the degirum.model.Model.input_crop_percentage. The image is then cropped to fit the AI model input tensor dimensions and aspect ratio. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions 224x224 with a crop percentage of 0.875 will first be resized to 341x256 (256 = 224 / 0.875 and 341 = 256 * 640 / 480) and then center cropped to 224x224. Alternatively, an input image with dimensions 640x480 and a model with input tensor dimensions 280x224 will be resized to 320x256 (320 = 280 / 0.875 and 256 = 224 / 0.875) and then center cropped to 280x224.
You can specify the resize algorithm in the degirum.model.Model.input_resize_method property, which may have the following values: "nearest", "bilinear", "area", "bicubic", or "lanczos". These values specify the interpolation algorithm used for resizing.
If your input frames are stored in numpy arrays, you may need to tell PySDK the order of colors in those arrays: RGB or BGR. This order is called the colorspace. By default, PySDK treats numpy arrays as having the BGR colorspace for the OpenCV backend and the RGB colorspace for the PIL backend. If your numpy arrays match these defaults, no additional action is needed. But if your numpy arrays have the opposite color order, you need to change the degirum.model.Model.input_numpy_colorspace property.
Note: If a model has multiple image inputs, PySDK applies the same input_*** image properties discussed above to every image input of the model.
Audio Input Data Handling
When dealing with model inputs of audio type (InputType is equal to "Audio"), PySDK does not perform any conversions of the input data: it expects a 1-D numpy array with audio waveform samples of the proper size and with the proper sampling rate. The waveform size should be equal to the InputWaveformSize model info property. The waveform sampling rate should be equal to the InputSamplingRate model info property. Finally, the data element type should be equal to the data type specified by the InputRawDataType model info property.
All aforementioned model info properties are properties of the ModelParams class returned by the degirum.model.Model.model_info property (see the Model Info section for details).
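A sketch of preparing a synthetic audio frame from the model info (the mapping of the "DG_INT16" raw data type to a numpy dtype is an assumption for illustration):
import numpy as np
info = model.model_info
waveform = np.zeros(int(info.InputWaveformSize[0]), dtype=np.int16)  # assuming "DG_INT16" samples
result = model.predict(waveform)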
Tensor Input Data Handling
When dealing with model inputs of raw tensor type (InputType is equal to "Tensor"), PySDK expects you to provide a multi-dimensional numpy array of the proper dimensions. The dimensions of that array should match the model input dimensions as specified either by the InputShape model info property (new-style models) or by the InputN, InputH, InputW, and InputC model info properties (old-style models), slowest dimension first. The InputShape model info property supersedes the other model info properties, so if it is defined, all the others are ignored. If it is not defined, then the tensor shape is specified by the set of defined and non-zero InputN, InputH, InputW, and InputC model info properties in that order: InputN specifies the slowest dimension and InputC the fastest dimension.
The numpy array data element type should be equal to the data type specified by the InputRawDataType model info property (see the Model Info section for details).
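A sketch of building a zero-filled input tensor of the proper shape (a float32 model, i.e. the "DG_FLT" raw data type, is assumed for illustration):
import numpy as np
shape = model.input_shape[0]                # shape of the first (and only) model input
tensor = np.zeros(shape, dtype=np.float32)  # assuming "DG_FLT" raw data type
result = model.predict(tensor)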
Dynamic Inputs Handling
Certain AI models may have so-called dynamic inputs, which are supported by some inference runtimes, for example, the OpenVINO runtime.
In order to adjust the size of the input data accepted by the PySDK preprocessor, you need to assign the actual input data size/shape to be used for the consecutive inferences before performing the inference.
If your model has an image input type (InputType == "Image"), you assign the InputN, InputH, InputW, and InputC model info properties to match the size of the images to be used for the inference. The PySDK preprocessor will resize the input images to the assigned size. If the input images already have that size, the resizing step is skipped. In any case, the inference runtime will receive images of that size.
If your model has a tensor input type (InputType == "Tensor"), you assign the InputShape model info property to match the shape of the tensors to be used for the inference. Since PySDK does not do any resizing for tensor inputs, all tensors you pass for inference must have the specified shape, so the inference runtime will receive tensors of that shape.
To simplify input shape assignments, you may use the degirum.model.Model.input_shape model property. This property allows unified access to the model input size/shape model info properties: InputN, InputH, InputW, InputC, and InputShape, regardless of the input type (image or tensor).
The getter returns and the setter accepts a list of input shapes, one shape per model input. Each element of that list (which defines the shape for a particular input) is another list containing the input dimensions, slowest dimension first.
For each input, the getter returns the InputShape value if the InputShape model parameter is specified for the input; otherwise it returns [InputN, InputH, InputW, InputC].
The setter works symmetrically: it assigns the provided list to the InputShape model info property if it is specified for the model input; otherwise it assigns the provided list to the InputN, InputH, InputW, and InputC model info properties in that order (i.e., index zero goes to InputN and so forth).
Example: you have a dynamic-input model with a single image-type input, and you want to perform inference on video stream frames with the resolution specified by the width and height variables, so that the dynamic-input model itself will handle the images and no resizing will be done by PySDK. Then you need to perform the following assignment before performing any inference:
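A minimal sketch of such an assignment (it assumes a batch size of 1 and 3 color channels):
model.input_shape = [[1, height, width, 3]]  # [InputN, InputH, InputW, InputC]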
Inference Results
All model predict methods return result objects derived from the degirum.postprocessor.InferenceResults class. These result classes are called post-processors. The particular post-processor class type depends on the AI model type: classification, object detection, pose detection, segmentation, etc., but from the user's point of view, they deliver identical functionality.
The result object contains the following data:
- degirum.postprocessor.InferenceResults.image property keeps the original image;
- degirum.postprocessor.InferenceResults.image_overlay property keeps the original image with inference results drawn on top; the type of such drawing is model-dependent:
  - for classification models, the list of class labels with probabilities is printed below the original image;
  - for object detection models, bounding boxes of detected objects are drawn on the original image;
  - for hand and pose detection models, detected keypoints and keypoint connections are drawn on the original image;
  - for segmentation models, detected segments are drawn on the original image;
- degirum.postprocessor.InferenceResults.results property keeps a list of numeric results (follow the property link for a detailed explanation of all result formats);
- degirum.postprocessor.InferenceResults.image_model property keeps the binary array with image data converted to the AI model input specifications. This property is assigned only if you set the degirum.model.Model.save_model_image model property before performing predictions.
The results property is what you typically use for programmatic access to inference results. The type of results is always a list of dictionaries, but the format of those dictionaries is model-dependent. Also, if the result contains coordinates of objects, all such coordinates are recalculated from the model coordinates back to the coordinates of the original image, so you can use them directly.
The image_overlay property is very handy for debugging and troubleshooting. It allows you to quickly assess the correctness of the inference results in graphical form.
There are result properties which affect how the overlay image is drawn:
- degirum.postprocessor.InferenceResults.overlay_alpha: transparency value (alpha-blend weight) for all overlay details;
- degirum.postprocessor.InferenceResults.overlay_font_scale: font scaling factor for overlay text;
- degirum.postprocessor.InferenceResults.overlay_line_width: line width in pixels for overlay lines;
- degirum.postprocessor.InferenceResults.overlay_color: RGB color tuple or list of RGB color tuples for drawing all overlay details;
- degirum.postprocessor.InferenceResults.overlay_show_labels: flag to enable drawing class labels of detected objects;
- degirum.postprocessor.InferenceResults.overlay_show_probabilities: flag to enable drawing probabilities of detected objects;
- degirum.postprocessor.InferenceResults.overlay_fill_color: RGB color tuple for filling voids which appear due to letterboxing.
When each individual result object is created, all these overlay properties (except overlay_fill_color) are assigned the values of the similarly named properties taken from the model object (see the Model Parameters section for the list of model properties). This allows you to assign overlay property values only once and apply them to all consecutive results. But if you want to adjust an individual result, you may reassign any of the overlay properties and then re-read the image_overlay property. Each time you read image_overlay, it returns a new image object freshly drawn according to the current values of the overlay properties.
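For example, you could tweak an individual result and regenerate its overlay (a sketch):
result.overlay_show_probabilities = True  # change properties of this particular result only
result.overlay_line_width = 2
annotated = result.image_overlay          # freshly drawn with the new settings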
The overlay_color property defines the color used to draw overlay details. In the case of a single RGB tuple, the corresponding color is used to draw all the overlay data: points, boxes, labels, segments, etc. In the case of a list of RGB tuples, the behavior depends on the model type.
- For classification models different colors from the list are used to draw labels of different classes.
- For detection models different colors are used to draw labels and boxes of different classes.
- For pose detection models different colors are used to draw keypoints of different persons.
- For segmentation models different colors are used to highlight segments of different classes.
If the list size is less than the number of classes of the model, then the overlay_color values are used cyclically; for example, for a three-element list it will be overlay_color[0], then overlay_color[1], overlay_color[2], and again overlay_color[0].
The default value of overlay_color is a single RGB tuple of yellow color for all model types except detection and segmentation models. For detection and segmentation models it is a list of RGB tuples with the list size equal to the number of model classes. You can use the Model.label_dictionary property to obtain the list of model classes. Each color is automatically assigned to look distinct from the other colors in the list.
Note: overlay_fill_color is assigned from degirum.model.Model.input_letterbox_fill_color.
In some cases the set of existing PySDK post-processor classes is not enough. This happens, for example, when you want to work with a new AI model which PySDK does not support yet. In this case you may implement your own custom post-processor and use it instead of the standard PySDK post-processors. The Custom Post Processors for Inference Results section describes this in detail.
Results Filtering
By default, all results are reported by the model predict methods. However, you may want to include only results which belong to certain categories: either having certain class labels or certain category IDs. To achieve that, you can specify a set of class labels (or, alternatively, category IDs) so that only inference results whose class labels (or category IDs) are found in that set are reported, and all other results are discarded. You assign such a set to the degirum.model.Model.output_class_set property.
For example, you may want to include only results with class labels "car" and "truck":
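model.output_class_set = {"car", "truck"}  # report only "car" and "truck" results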
Or you may want to include only results with category IDs 1 and 3:
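model.output_class_set = {1, 3}  # report only results with category IDs 1 and 3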
This category filtering is applicable only to models which have "label" (or "category_id") keys in their result dictionaries. For all other models this category filter is ignored.
Note On degirum.postprocessor.DetectionResults Segmentation Mask Format
The degirum.postprocessor.DetectionResults class expects the value at the optional "mask" key in an object detection model's result dictionary to hold a dictionary representing the run-length encoded (RLE) object segmentation mask array. This dictionary contains the following keys:
- "height": height of the segmentation mask array;
- "width": width of the segmentation mask array;
- "data": string representation of a buffer of unsigned 32-bit integers carrying the RLE segmentation mask array.
The "data" field is obtained using the following algorithm:
- The initial mask array, flattened in row-major order, is broken up into sub-arrays, each of which has elements of only one value.
- The value in each sub-array is stored in one array and the length of each sub-array is stored in another array.
- The resulting two arrays are concatenated - values, then lengths.
- The final array is cast as an unsigned 32-bit integer array, interpreted as a buffer, encoded using base64, and decoded as an ASCII string.
The algorithm to convert the dictionary with the RLE segmentation mask array to the original array is the following:
- The string is decoded using base64 and interpreted as an array of unsigned 32-bit integers.
- The resulting array is split into two equal-sized arrays: the first array holds the value present in each sub-array of the original flattened array, and the second array holds the length of each sub-array.
- The original flattened array is reconstructed using the two arrays.
- The resulting array is reshaped according to the values at the "height" and "width" keys in the dictionary.
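A decoding sketch following the steps above:
import base64
import numpy as np

def rle_mask_to_array(mask: dict) -> np.ndarray:
    # decode the base64 string and interpret it as unsigned 32-bit integers
    rle = np.frombuffer(base64.b64decode(mask["data"]), dtype=np.uint32)
    # first half holds the values, second half holds the run lengths
    values, lengths = np.split(rle, 2)
    # rebuild the flattened mask in row-major order and reshape it
    flat = np.repeat(values, lengths)
    return flat.reshape(mask["height"], mask["width"])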
Batch Inferences
If you need to process multiple frames using the same model and the same settings, the most efficient way to do it is to use the batch prediction methods of the degirum.model.Model class:
- degirum.model.Model.predict_batch method to run predictions on a list of frames;
- degirum.model.Model.predict_dir method to run predictions on files in a directory.
Both methods perform predictions of multiple frames in a pipelined manner, which is more efficient than calling the predict() method in a loop.
Both methods return a generator object, so you can iterate over inference results. This allows you to directly use the result of batch prediction methods in for-loops, for example:
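# a minimal sketch with hypothetical file names
for result in model.predict_batch(["image1.jpg", "image2.jpg"]):
    print(result)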
Note: Since batch prediction methods return a generator object, simply assigning the batch prediction method result to some variable does not start any inference. Only iterating over that generator object does.
The predict_batch method accepts a single parameter: an iterator object, for example, a list. You populate your iterator object with the same type of data you pass to the regular predict(), i.e. input image path strings, input image URL strings, numpy arrays, or PIL Image objects (in case of the PIL image backend).
The predict_dir method accepts a filepath of a directory containing graphical files for inference. You may supply the optional extensions parameter passing the list of file extensions to process.
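For example (the directory path is hypothetical, and the exact format of the extension strings is an assumption):
for result in model.predict_dir("./images", extensions=["jpg", "png"]):
    print(result)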
The following minimal example demonstrates how to use batch predict to perform AI inference on a video file:
import degirum as dg
import cv2
# connect to cloud model zoo, load model, set model properties needed for OpenCV
zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token="<your cloud API access token>")
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")
model.image_backend = "opencv"
stream = cv2.VideoCapture("images/Traffic.mp4") # open video file
# define generator function to produce video frames
def frame_source(stream):
    while True:
        ret, frame = stream.read()
        if not ret:
            break # end of file
        yield frame
# run batch predict on stream of frames from video file
for result in model.predict_batch(frame_source(stream)):
    # show annotated frames
    cv2.imshow("Demo", result.image_overlay)
    cv2.waitKey(1) # give OpenCV a chance to render the window
stream.release()
Model Parameters
The model behavior can be controlled with various Model class properties, which define model parameters. They can be divided into the following categories:
- parameters, which control how to handle input frames;
- parameters, which control the inference;
- parameters, which control how to display inference results;
- parameters, which control model run-time behavior and provide access to model information
The following table provides a complete summary of Model class properties arranged by categories.
Property Name | Description | Possible Values | Default Value |
---|---|---|---|
Input Handling Parameters | | | |
image_backend | package to be used for image processing | "auto", "pil", or "opencv"; "auto" tries OpenCV first | "auto" |
input_letterbox_fill_color | image fill color in case of "letterbox" padding | 3-element tuple of RGB color | (0,0,0) |
input_numpy_colorspace | colorspace for numpy arrays | "auto", "RGB", or "BGR" | "auto" |
input_pad_method | how input image will be padded when resized | "stretch", "letterbox", "crop-first", or "crop-last" | "letterbox" |
input_crop_percentage | percentage of input image dimension to retain when input_pad_method is set to "crop-first" or "crop-last" | Float value in [0..1] range | 1.0 |
input_resize_method | interpolation algorithm for image resizing | "nearest", "bilinear", "area", "bicubic", or "lanczos" | "bilinear" |
input_shape | input shape | list of input shapes, one shape per model input; each element is another list containing input dimensions, slowest dimension first | [[]] |
save_model_image | flag to enable/disable saving of model input image in inference results | Boolean value | False |
Inference Parameters | | | |
output_class_set | set of class labels or category IDs to be included in inference results | Set of strings or set of integers | {} |
output_confidence_threshold | confidence threshold to reject results with low scores | Float value in [0..1] range | 0.1 |
output_max_detections | maximum number of objects to report for detection models | Integer value | 20 |
output_max_detections_per_class | maximum number of objects to report for each class for detection models | Integer value | 100 |
output_max_classes_per_detection | maximum number of classes to report for detection models | Integer value | 30 |
output_nms_threshold | rejection threshold for non-max suppression | Float value in [0..1] range | 0.6 |
output_pose_threshold | rejection threshold for pose detection models | Float value in [0..1] range | 0.8 |
output_postprocess_type | inference result post-processing type; you may set it to "None" to bypass post-processing | String | Model-dependent |
output_top_k | number of classes with the biggest scores to report for classification models; if 0, report all classes above the confidence threshold | Integer value | 0 |
output_use_regular_nms | use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value | False |
Display Parameters | | | |
overlay_alpha | transparency value (alpha-blend weight) for all overlay details | Float value in [0..1] range or "auto"; 1 means no transparency; "auto" selects optimal transparency for the current model | "auto" |
overlay_blur | bounding box blurring option | None to disable; "all" to blur all classes; class label or list of class labels to blur only specified classes | None |
overlay_color | color for drawing all overlay details | 3-element tuple of RGB color or list of 3-element tuples of RGB color | (255,255,0) |
overlay_font_scale | font scaling factor for overlay text | Positive float value | 1.0 |
overlay_line_width | line width in pixels for overlay lines | Positive integer value | 3 |
overlay_show_labels | flag to enable drawing class labels of detected objects | Boolean value | True |
overlay_show_probabilities | flag to enable drawing probabilities of detected objects | Boolean value | False |
Control and Information Parameters | | | |
supported_device_types | list of supported device types in format <runtime>/<device> for this model (read-only) | List of strings | N/A |
device_type | device type to be used for AI inference of this model in a format <runtime>/<device> | String | N/A |
devices_available | list of inference device indexes which can be used for model inference (read-only) | List of integer values | N/A |
devices_selected | list of inference device indexes selected for model inference | List of integer values | Equal to devices_available |
label_dictionary | model class label dictionary (read-only) | Dictionary | N/A |
measure_time | flag to enable measuring and collecting inference time statistics | Boolean value | False |
model_info | model information object providing read-only access to model parameters (read-only) | ModelParams object | N/A |
non_blocking_batch_predict | flag to control the blocking behavior of predict_batch() method | Boolean value | False |
eager_batch_size | the size of the batch to be used by the device scheduler when inferencing this model; the batch is the number of consecutive frames before this model is switched to another model during batch predict | Integer value in [1..80] range | 8 |
frame_queue_depth | the depth of the model prediction queue; when the queue size reaches this value, the next prediction call will block until there is space in the queue | Integer value in [1..160] range | 80 for cloud inference, 8 for other cases |
Note: For segmentation models the default value of overlay_color is a list of unique colors (RGB tuples). The size of the list is equal to the number of model classes. Use the label_dictionary property to get the list of model classes.
Model Info
AI models have a lot of static attributes defining various model features and characteristics. Unlike model properties, these attributes in most cases cannot be changed: they come with the model.
To access all model attributes, you may query read-only model property degirum.model.Model.model_info.
Note: A new deep copy of the model info class is created each time you read this property, so any changes made to that copy will not affect model behavior.
Model attributes are divided into the following categories:
- Device-related attributes
- Pre-processing-related attributes
- Inference-related attributes
- Post-processing-related attributes
The following table provides a complete summary of model attributes arranged by categories.
The Attribute Name column contains the name of the ModelParams class member returned by the model_info property.
Note: Each attribute in the Pre-Processing-Related Attributes group is a list of values, one per model input.
Attribute Name | Description | Possible Values |
---|---|---|
Device-Related Attributes | | |
DeviceType | Device type to be used for AI inference of this model | "ORCA": DeGirum Orca, "EDGETPU": Google EdgeTPU, "GPU": host GPU, "CPU": host CPU, "NPU": Intel NPU, "DLA": Nvidia DLA, "RK3588": Rockchip NPU 3588, "RK3568": Rockchip NPU 3568, "RK3566": Rockchip NPU 3566, "NXP_VX": NXP VX, "NXP_ETHOSU": NXP Ethos-U, "ARMNN": ArmNN, "VITIS_NPU": Ryzen NPU |
RuntimeAgent | Type of runtime to be used for AI inference of this model | "N2X": DeGirum NNExpress runtime, "TFLITE": Google TFLite runtime, "OPENVINO": Intel OpenVINO runtime, "ONNX": Microsoft ONNX runtime, "TENSORRT": Nvidia TensorRT runtime, "RKNN": Rockchip RKNN runtime |
SupportedDeviceTypes | Comma-separated list of runtime agent/device type combinations supported by the model | Example: "OPENVINO/CPU,ONNX/CPU" |
EagerBatchSize | The size of the batch to be used by the device scheduler when inferencing this model; the batch is the number of consecutive frames before this model is switched to another model during batch predict | Integer number |
Pre-Processing-Related Attributes | | |
InputType | Model input type | List of the following strings: "Image": image input type, "Audio": audio input type, "Tensor": raw tensor input type |
InputShape | Input tensor shape; applicable to "Tensor" input type | List of integers |
InputN | Input batch dimension size | 1; other sizes to be supported |
InputH | Input height dimension size | Integer number |
InputW | Input width dimension size | Integer number |
InputC | Input color dimension size | Integer number |
InputQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value |
InputRawDataType | Data element type for audio or tensor inputs | List of the following strings: "DG_UINT8": 8-bit unsigned integer, "DG_INT16": 16-bit signed integer, "DG_FLT": 32-bit floating point |
InputTensorLayout | Input tensor shape and layout | List of the following strings: "auto": deduce tensor layout automatically, "NHWC": 4-D tensor frame-height-width-color, "NCHW": 4-D tensor frame-color-height-width |
InputColorSpace | Input image colorspace (sequence of colors in C dimension) | List of the following strings: "RGB", "BGR" |
InputScaleEn | Enable global scaling of input data flag | List of boolean values |
InputScaleCoeff | Scaling factor for input data global scaling; applied when InputScaleEn is enabled | List of float values |
InputNormMean | Mean value for per-channel input data normalization; applied when both InputNormMean and InputNormStd are not empty | List of 3-element arrays of float values |
InputNormStd | StDev value for per-channel input data normalization; applied when both InputNormMean and InputNormStd are not empty | List of 3-element arrays of float values |
InputQuantOffset | Quantization offset for input image quantization | List of float values |
InputQuantScale | Quantization scale for input image quantization | List of float values |
InputWaveformSize | Input waveform size in samples for audio input types | List of positive integer values |
InputSamplingRate | Input waveform sampling rate in Hz for audio input types | List of positive float values |
InputResizeMethod | Interpolation algorithm used for image resizing during model training | List of the following strings: "nearest", "bilinear", "area", "bicubic", "lanczos" |
InputPadMethod | How input image was padded when resized during model training | List of the following strings: "stretch", "letterbox" |
InputCropPercentage | How much input image was cropped during model training | Float value in [0..1] range |
ImageBackend | Graphical package used for image processing during model training | List of the following strings: "pil", "opencv" |
Inference-Related Attributes | | |
ModelPath | Path to the model JSON file | String with filepath |
ModelInputN | Model frame dimension size | 1; other sizes to be supported |
ModelInputH | Model height dimension size | Integer number |
ModelInputW | Model width dimension size | Integer number |
ModelInputC | Model color dimension size | Integer number |
ModelQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value |
Post-Processing-Related Attributes | | |
OutputNumClasses | Number of classes model detects | Integer value |
OutputSoftmaxEn | Enable softmax step in post-processing flag | Boolean value |
OutputClassIDAdjustment | Class ID adjustment: number subtracted from the class ID reported by the model | Integer value |
OutputPostprocessType | Post-processing type | See table below |
OutputConfThreshold | Confidence threshold to reject results with low scores | Float value in [0..1] range |
OutputNMSThreshold | Rejection threshold for non-max suppression | Float value in [0..1] range |
OutputTopK | Number of classes with biggest scores to report for classification models | Integer number |
MaxDetections | Maximum number of objects to report for detection models | Integer number |
MaxDetectionsPerClass | Maximum number of objects to report for each class for detection models | Integer number |
MaxClassesPerDetection | Maximum number of classes to report for detection models | Integer number |
UseRegularNMS | Use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value |
The following table provides a list of supported post-processing types, their descriptions, and JSON result format references. Please refer to degirum.postprocessor.InferenceResults.results for a detailed description of the JSON result format of each post-processor type, as mentioned in the last column.
Label | Description | Applicable Result JSON Format |
---|---|---|
"Classification" | Classification post-processor | For classification models |
"MultiLabelClassification" | Multi-classifier classification post-processor | For multi-label classification models |
"Detection" | MobilenetV2-style object detection post-processor | For object detection models |
"DetectionYolo" | YOLOV5-style object detection post-processor | For object detection models |
"DetectionYoloV8" | YOLOV8-style object detection post-processor | For object detection models |
"DetectionYoloV10" | YOLOV10-style object detection post-processor | For object detection models |
"DetectionYoloPlates" | YOLOV5-style license plate detection post-processor | For classification models |
"PoseDetection" | Pose detection post-processor | For object detection models (with landmarks) |
"FaceDetect" | Face detection post-processor | For object detection models (with landmarks) |
"HandDetection" | Hand palm detection post-processor | For hand palm detection models |
"Segmentation" | Semantic segmentation post-processor | For segmentation models |
Inference Advanced Topics
Selecting Device Types for Inference
Every AI model in a model zoo is designed to work with a particular AI inference runtime (such as DeGirum N2X, Intel OpenVINO, Google TFLite etc.) and on a particular AI inference device, either on AI accelerator hardware or on host computer CPU.
The runtime and device type to be used for a model inference are defined by the combination of the RuntimeAgent and DeviceType model attributes of the ModelParams class as returned by the degirum.model.Model.model_info property. You may query those attributes, or you may query the degirum.model.Model.device_type property to get the runtime/device type pair in the form "<runtime>/<device>".
Some models may support multiple runtime/device combinations; such models are called multi-device models. For such models the list of supported device types can be obtained by querying the SupportedDeviceTypes read-only model attribute of the ModelParams class. This attribute is a string containing a comma-separated list of the runtime/device type combinations supported by the model. For example, the string "OPENVINO/CPU,ONNX/CPU" means that the model can be run on both Intel OpenVINO and Microsoft ONNX runtimes using the CPU as a hardware device. The SupportedDeviceTypes attribute is defined only for multi-device models.
When you load a model from a model zoo, you also implicitly specify the AI inference engine to be used for this model inference. It can be the cloud platform, an AI server, or a local PySDK installation. That inference engine has its own capabilities in terms of supported runtimes and devices, which may be different from the model capabilities. In order to obtain the list of runtime/device combinations supported by both the model and the inference engine the model is loaded for, you can query the degirum.model.Model.supported_device_types read-only property. This property is a list which contains the intersection of two sets: the set of runtime/device combinations supported by the model itself, and the set of runtime/device combinations supported by the inference engine the model is loaded for. Each element of this list is a runtime/device pair in the form "<runtime>/<device>".
If you wish to obtain the list of runtime/device combinations supported by the inference engine alone, you may call the degirum.get_supported_devices function, which accepts the inference engine designator as its first argument. It also returns a list of supported device type strings in the form "<runtime>/<device>".
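For example (a sketch; it assumes dg.LOCAL designates the local PySDK installation as the inference engine):
import degirum as dg
print(dg.get_supported_devices(dg.LOCAL))  # e.g. ["OPENVINO/CPU", "TFLITE/CPU", ...]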
For multi-device models you may change the desired runtime/device combination to be used for the inference at any time. You do it by assigning the desired combination to the degirum.model.Model.device_type property. You may assign only combinations which occur in the degirum.model.Model.supported_device_types list.
It is possible to specify a list of device types for the inference. In this case the first supported device type from that list will be set. This simplifies inference device assignment for multi-device models on a variety of systems with different sets of inference devices.
For example, suppose you have a model which supports all devices of the OpenVINO runtime (NPU, GPU, and CPU) and you want to run this model on the NPU when it is available, otherwise on the GPU when it is available, and fall back to the CPU if neither NPU nor GPU is available. In this case you may do the following assignment:
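# a sketch of such a fallback assignment: the first supported combination is used
model.device_type = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]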
Reading the device_type property back after the list assignment will give you the actual device type assigned for the inference.
Selecting Particular Device Instances for Inference
Every AI model in a model zoo is designed to work on particular hardware, either on AI accelerator hardware such as DeGirum Orca or on the host computer CPU. Imagine a situation when the host computer is equipped with multiple hardware devices of a given type, and you run multiple inferences of a model designed for this device type. In this case, by default, all available hardware devices of this type will be used for inferences of this model. This guarantees top inference performance in the case of a single model running on all available devices.
In certain cases you may want to limit the model inference to a particular subset of available devices. For example, you have two devices and you want to run concurrent inference of two models. In the default case both devices would be used for both model inferences, causing the models to be reloaded to the devices each time you run the inference of another model. Even though model loading for DeGirum Orca devices is extremely fast, it still may cause performance degradation. In this case you may want to run the first model inference only on the first device, and the second model inference only on the second device.
To do so you need to assign the degirum.model.Model.devices_selected property of each model object to contain the list of device indexes you want your model to run on. In our example you need to assign the list [0] to the devices_selected property of the first model object, and the list [1] to the second model object.
To get information about available devices, query the degirum.model.Model.devices_available property. It returns the list of device indexes of all available devices of the type this model is designed for. Those indexes are zero-based, so if your host computer has a single device of a given type, the returned list will contain a single zero element: [0]. In case of two devices it will be [0, 1], and so on.
Note: Since the inference device assignment for cloud inferences is performed dynamically, and the actual AI farm node configuration is unknown until the inference starts, the devices_available property for this use case always returns the full list of available devices.
In general, the list you assign to the devices_selected property should contain only indexes that occur in the list returned by the devices_available property.
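For the two-model, two-device example above, the assignment could look like this sketch:
model1.devices_selected = [0]  # run the first model only on device #0
model2.devices_selected = [1]  # run the second model only on device #1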
Handling Multiple Streams of Frames
The Model class interface has a method, degirum.model.Model.predict_batch, which can run multiple predictions on a sequence of frames. In order to deliver the sequence of frames to predict_batch, you implement an iterable object which returns your frames one by one. One example of an iterable object is a regular Python list; another example is a function which yields frame data using the yield statement. You then pass such an iterable object as an argument to the predict_batch method. In turn, the predict_batch method returns a generator object which yields prediction results using the yield statement.
All the inference magic with pipelining sequential inferences, asynchronously retrieving inference results, supporting various inference devices, and AI server vs. local operation modes happens inside the implementation of the predict_batch method. All you need to do is wrap your sequence of frame data in an iterable object, pass this object to predict_batch, and iterate over the generator object returned by predict_batch using either a for-loop or by repeatedly calling the next() built-in function on this generator object.
The following example runs the inference on an infinite sequence of frames captured from the camera:
import cv2 # OpenCV
stream = cv2.VideoCapture(0) # open video stream from local camera #0
def source(): # define iterator function, which returns frames from camera
    while True:
        ret, frame = stream.read()
        yield frame
for result in model.predict_batch(source()): # iterate over inference results
    cv2.imshow("AI camera", result.image_overlay) # process result
    cv2.waitKey(1) # give OpenCV a chance to render the window
But what if you need to run multiple concurrent inferences of multiple asynchronous data streams with different frame rates? The simple approach, where you combine two generators in one loop either using the zip() built-in function or by manually calling the next() built-in function for every generator in the loop body, will not work effectively.
Non-working example 1. Using the zip() built-in function:
batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
for result1, result2 in zip(batch1, batch2):
    pass # process result1 and result2
Non-working example 2. Using the next() built-in function:
batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
while True:
    result1 = next(batch1)
    result2 = next(batch2)
    # process result1 and result2
The reason is that the Python runtime has Global Interpreter Lock (GIL), which allows running only one thread at a time blocking the execution of other threads. So if the currently running thread is itself blocked by waiting for the next frame or waiting for the next inference result, all other threads are blocked as well.
For example, if the frame rate of source1() is slower than the frame rate of source2(), and assuming that the model inference frame rates are higher than the corresponding source frame rates, then the code above will spend most of the time waiting for the next frame from source1(), not letting frames from source2() be retrieved, so model2 will not get enough frames and will idle, losing performance.
Another example is when the inference latency of model1 is higher than the inference queue depth expressed in time (this is the product of the inference queue depth expressed in frames and the single-frame inference time). In this case, when the model1 inference queue is full but the inference result is not ready yet, the code above will block waiting for that inference result inside next(batch1), preventing any operations with model2.
To get around such blocks, a special non-blocking mode of batch predict operation is implemented. You turn on this mode by assigning True to the degirum.model.Model.non_blocking_batch_predict property.
When non-blocking mode is enabled, the generator object returned by the predict_batch() method accepts None from the input iterable object. This allows you to design non-blocking frame data source iterators: when no data is available, such an iterator just yields None without waiting for the next frame. If None is returned from the input iterator, the model predict step is simply skipped for that iteration.
Also, in non-blocking mode, when no inference results are available in the result queue at some iteration, the generator yields a None result. This allows the code which operates with another model to continue execution.
In order to operate in non-blocking mode you need to modify your code in the following way (see the sketch after this list):
- Modify the frame data source iterator to return None if no frame is available yet, instead of waiting for the next frame.
- Modify the inference loop body to deal with None results by simply skipping them.
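A working counterpart of the non-working examples above could then look like this sketch (model1/model2 and the non-blocking source1()/source2() iterators are assumed to exist; each source yields None when no new frame is ready yet):
model1.non_blocking_batch_predict = True
model2.non_blocking_batch_predict = True
batch1 = model1.predict_batch(source1())  # generator object for the first model
batch2 = model2.predict_batch(source2())  # generator object for the second model
while True:
    result1 = next(batch1)
    if result1 is not None:
        pass  # process result1
    result2 = next(batch2)
    if result2 is not None:
        pass  # process result2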
Measure Inference Timing
The degirum.model.Model class has a facility to measure and collect model inference time information.
To enable inference time collection, assign True to the degirum.model.Model.measure_time property.
When inference timing collection is enabled, the durations of individual steps for each frame prediction are accumulated in internal statistic accumulators. Also, the timing attribute is added to each result instance returned by prediction methods; this attribute contains the timing info of that particular result.
To reset time statistic accumulators you use degirum.model.Model.reset_time_stats method.
To retrieve time statistic accumulators you use the degirum.model.Model.time_stats method. This method returns a dictionary with time statistic objects. Each time statistic object accumulates time statistics for a particular inference step over all frame predictions that happened since the timing collection was enabled or reset. The statistics include minimum, maximum, average, and count. Inference steps correspond to dictionary keys. The following dictionary keys are supported:
Key | Description |
---|---|
FrameTotalDuration_ms | Frame total inference duration from the moment when you invoke the predict method to the moment when inference results are returned |
PythonPreprocessDuration_ms | Duration of client-side pre-processing step including data loading time and data conversion time |
CorePreprocessDuration_ms | Duration of server-side pre-processing step |
CoreInferenceDuration_ms | Duration of server-side AI inference step |
CoreLoadResultDuration_ms | Duration of server-side data movement step |
CorePostprocessDuration_ms | Duration of server-side post-processing step |
CoreInputFrameSize_bytes | The size of the received input frame |
For DeGirum Orca AI accelerator hardware additional dictionary keys are supported:
Key | Description |
---|---|
DeviceInferenceDuration_ms | Duration of AI inference computations on the AI accelerator IC excluding data transfers |
DeviceTemperature_C | Internal temperature of the AI accelerator IC in degrees Celsius |
DeviceFrequency_MHz | Working frequency of the AI accelerator IC in MHz |
The individual timing results stored in the timing attribute of inference result objects contain the same set of dictionary keys.
The time statistics object supports pretty-printing, so you can print it directly using a regular print() statement.
For example, the output of the following statement (assuming stats holds the dictionary returned by model.time_stats()):
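print(stats["PythonPreprocessDuration_ms"])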
... will look similar to this (values are illustrative):
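PythonPreprocessDuration_ms , 7.52, 10.09, 17.07, 25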
It consists of the inference step name (PythonPreprocessDuration_ms
in this case) followed by four statistic
values presented in a format minimum, average, maximum, count
.
You may print the whole table of statistics using the following code:
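# one possible sketch; it assumes the statistics dictionary pretty-prints as a whole table
stats = model.time_stats()
print(stats)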
The output will look like this:
Statistic , Min, Avg, Max, Cnt
PythonPreprocessDuration_ms , 7.52, 10.09, 17.07, 25
CoreInferenceDuration_ms , 12.81, 13.86, 19.06, 25
CoreLoadResultDuration_ms , 0.20, 0.27, 0.58, 25
CorePostprocessDuration_ms , 0.97, 1.09, 1.74, 25
CorePreprocessDuration_ms , 10.91, 12.08, 18.07, 25
DeviceInferenceDuration_ms , 6.32, 6.36, 6.39, 25
FrameTotalDuration_ms , 40.70, 54.68, 234.25, 25
Note: In batch prediction mode many inference phases are pipelined, so the pre- and post-processing steps of one frame may be executed in parallel with the AI inference step of another frame. Therefore, the actual frame rate may be higher than the frame rate calculated from the FrameTotalDuration_ms statistic.
Note: The PythonPreprocessDuration_ms statistic includes data loading time and data conversion time. This can give very different results for different ways of loading input frame data. For example, if you provide image URLs for inference, then PythonPreprocessDuration_ms will include the image downloading time, which can be much higher compared with the case when you provide the image as a numpy array, which does not require any downloading.
The following example shows how to use the time statistics collection interface. It assumes that the model variable is the model created by load_model().
model.measure_time = True # enable accumulation of time statistics
# perform batch prediction
for result in model.predict_batch(source()):
    # you may access timing of each particular result via `timing` attribute:
    print(result, result.timing["CoreInferenceDuration_ms"])
stats = model.time_stats() # query time statistics dictionary
# pretty-print frame total inference duration statistics
print(stats["FrameTotalDuration_ms"])
# print average duration of AI inference step
print(stats["CoreInferenceDuration_ms"].avg)
model.reset_time_stats() # reset time statistics accumulators
# perform one more batch prediction
for result in model.predict_batch(source()):
    # process result
    pass
# print statistics of Python pre-processing step
print(stats["PythonPreprocessDuration_ms"].max)
Custom Post Processors for Inference Results
When you want to work with some new AI model and PySDK does not yet provide post-processor class to interpret model results, then you may want to implement that post-processing code yourself.
Such code typically takes the AI model output tensor data and interprets that raw tensor data to produce some meaningful results like bounding boxes, probabilities, etc. Then it renders these results on top of the original image to produce the so-called image overlay.
PySDK provides a way to seamlessly integrate such custom post-processing code so it will behave exactly like built-in post-processors. To do so, you need to complete the following two steps:
- Implement your own custom post-processor class.
- Instruct AI model object to use your custom post-processing class instead of built-in post-processor.
To better understand how post-processing is organized in PySDK, we need to disclose some implementation details. First of all, the built-in post-processing is actually split into two parts, two distinct pieces of code which are executed in different places:
- Low-level post-processor, which performs raw tensor data conversion into a JSON array. The format of this array is model-specific, and all supported formats are described here. This post-processor is invoked on the AI server side.
- PySDK-level post-processor, which performs AI result rendering on top of the original image, generating the image overlay. This post-processor is invoked on the client side.
Your custom post-processor class should actually implement both parts: raw tensor conversion and image overlay generation. It must inherit degirum.postprocessor.InferenceResults base class or any class derived from degirum.postprocessor.InferenceResults. If you want to reuse some image overlay generation functionality of built-in post-processor classes, you may inherit one of them.
The following is the list of PySDK-level post-processor classes which you may inherit in order to reuse the AI result rendering code, or, in other words, the implementation of the image_overlay method.
Post-Processor Class | Description |
---|---|
degirum.postprocessor.ClassificationResults | Post-processor which renders classification results |
degirum.postprocessor.DetectionResults | Post-processor which renders object detection results, including pose detection and face detection |
degirum.postprocessor.Hand_DetectionResults | Post-processor which renders hand detection results |
degirum.postprocessor.SegmentationResults | Post-processor which renders segmentation results |
Unfortunately, you need to develop the code which transforms raw tensor data (the low-level post-processor) yourself, since the format of the raw tensors for new models is usually unique.
Your custom post-processor class may override the following methods of degirum.postprocessor.InferenceResults base class:
Method to Override | Description |
---|---|
__init__() | Constructor |
__str__() | Conversion of inference results to string |
image_overlay | Rendering image overlay |
Typically, you do not need to override any other methods/properties of the base class: their default implementations are generic enough.
The code transforming raw tensor data into human-friendly results needs to be implemented in the constructor. The typical implementation of such a constructor is the following:
import degirum as dg

class MyResultProcessor(dg.postprocessor.InferenceResults):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # call base class constructor first
        # at this point self._inference_results contains the list of raw output tensors
        new_results = []  # define an empty list for new human-friendly results
        # iterate over tensors
        for tensor in self._inference_results:
            # and convert them into a list of dictionaries, where each dictionary represents one detected entity
            detected_entity = {}
            new_results.append(detected_entity)  # append detected entity to the list of human-friendly results
        # finally replace self._inference_results with the new list of human-friendly results
        self._inference_results = new_results
Basically, you need to process all raw tensors contained in the self._inference_results list and convert them into a list of human-friendly results, then substitute self._inference_results with this new list of human-friendly results.
Each element of such a list should be a dictionary. You may choose the format of this dictionary as you see fit. But if you choose one of the existing formats, then you may reuse one of the existing image overlay generation implementations. In this case you inherit your custom post-processor class from the PySDK post-processor class whose result format you decided to reuse, and you do not override the image_overlay property.
Each raw tensor in the self._inference_results list is represented by a dictionary. The format of the raw tensor dictionary is the following:
Key Name | Description | Data Type |
---|---|---|
"id" | Tensor numeric ID as specified in the model | integer |
"name" | Tensor name as specified in the model | string |
"shape" | Tensor shape: sizes of each dimension | integer list |
"quantization" | Tensor quantization parameters | dictionary, see below |
"type" | Tensor element type | string, see below |
"data" | Tensor data buffer contents | multi-dimensional numpy array |
The following tensor data types are supported:
Type String | Type Description |
---|---|
"DG_FLT" | 32-bit floating point |
"DG_UINT8" | 8-bit unsigned integer |
The following is the structure of the quantization dictionary:
JSON Field Name | Description | Data Type |
---|---|---|
"axis" | Quantization axis or -1 for global quantization | integer |
"scale" | Quantization scale array | floating point list |
"zero" | Quantization zero offset | integer list |
Another typical task you need to perform in your custom post-processor class constructor is converting the coordinates of detected entities from AI model input image coordinates to original image coordinates. This can be done by invoking the function stored in the self._conversion property. You pass it (x, y) coordinates with respect to the AI model input image, and it returns a tuple of (x, y) coordinates with respect to the original image.
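For example (a sketch; it assumes the conversion callable takes the two coordinates as separate arguments):
x_orig, y_orig = self._conversion(x_model, y_model)  # map model-input coordinates to original-image coordinates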
If you decided to define a completely new format of human-friendly results, then you will need to implement the image_overlay property yourself. The typical implementation of the image_overlay property is the following:
import itertools
import degirum as dg

class MyResultProcessor(dg.postprocessor.InferenceResults):
    @property
    def image_overlay(self):
        # create drawing object to avoid using graphical backends directly
        draw = create_draw_primitives(self._input_image, self._alpha, self._font_scale)
        # create a set of colors to be used for drawing different classes of objects
        current_color_set = itertools.cycle(
            self._overlay_color
            if isinstance(self._overlay_color, list)
            else [self._overlay_color]
        )
        # iterate over all inference results created in your constructor
        for res in self._inference_results:
            # and draw AI annotations over the original image, which is stored inside the `draw` object
            # use draw.draw_text() to print a text string
            # use draw.draw_circle() to draw a circle
            # use draw.draw_line() to draw a line
            # use draw.draw_box() to draw a rectangle
            # use next(current_color_set) to obtain the color for the next object class
            # check `self._show_labels` to draw or not to draw text labels
            # check `self._show_probabilities` to draw or not to draw probabilities
            pass
        # return image overlay
        return draw.image_overlay()
Basically, you iterate over all detected entities and draw them on the original image. To harmonize the behavior of your code with PySDK standards you may follow these practices:
- Use the create_draw_primitives() method to create a drawing object, which will handle all drawing tasks using the proper graphical backend, as selected for the model.
- Check the self._show_labels property to decide whether to draw text labels.
- Check the self._show_probabilities property to decide whether to draw probabilities.
- Use draw object methods like draw.draw_text(), draw.draw_circle(), draw.draw_line(), and draw.draw_box() to draw various geometric shapes.
- Use the self._overlay_color property to obtain colors for your classes of objects. The example above demonstrates how to do it.
- Return draw.image_overlay() at the end.
When your new custom post-processor class is ready, you need to instruct AI model object to use your custom post-processing class instead of built-in post-processor. Do it in two steps:
- Assign "None" to the degirum.model.Model._model_parameters.OutputPostprocessType property to disable any low-level server-side post-processing.
- Assign your custom class to the degirum.model.Model.custom_postprocessor property to attach your custom post-processor.
model = zoo.load_model(model_name) # load model
model._model_parameters.OutputPostprocessType = "None" # disable low-level server-side post-processing
model.custom_postprocessor = MyResultProcessor # attach custom post-processor class
You need to do these steps before the very first inference. From then on, each model inference result returned by model prediction methods called from that model object will be of the MyResultProcessor type.