PySDK Release Notes
Version 0.14.1 (11/04/2024)
New Features and Modifications
- ORCA1 firmware version 1.1.21 is included in this release. It contains numerous bug fixes that improve the reliability of ORCA1 USB operations.
- An object blurring option is implemented in the PySDK object detection results renderer. To control blurring, use the `degirum.model.Model.overlay_blur` property of the `Model` class:
    - Assign `None` to disable blurring (this is the default value): `model.overlay_blur = None`.
    - To blur all detected bounding boxes, assign the string `"all"`: `model.overlay_blur = "all"`.
    - To blur bounding boxes belonging to a particular class, assign that class label string: `model.overlay_blur = "car"`.
    - To blur bounding boxes belonging to a list of classes, assign a list of class label strings: `model.overlay_blur = ["car", "person"]`.
- The YOLOv8 postprocessor now supports both normalized and regular (non-normalized) bounding boxes. It automatically infers whether the boxes are normalized; if they are normalized to unity, the boxes are scaled to the image size. Note that box outputs are typically normalized for TFLite models, while ONNX models usually do not normalize them.
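
The box handling described in the last item can be pictured with a minimal sketch. This is not the PySDK implementation: the detection heuristic (every coordinate lies within `[0, 1]`) and the function names `boxes_look_normalized` and `to_pixel_boxes` are assumptions made for illustration only.

```python
from typing import List, Sequence

Box = Sequence[float]  # [x1, y1, x2, y2]


def boxes_look_normalized(boxes: Sequence[Box]) -> bool:
    """Assumed heuristic: boxes are unit-normalized if every coordinate is in [0, 1]."""
    return bool(boxes) and all(0.0 <= c <= 1.0 for box in boxes for c in box)


def to_pixel_boxes(boxes: Sequence[Box], image_w: int, image_h: int) -> List[List[float]]:
    """Scale unit-normalized boxes to the image size; return other boxes unchanged."""
    if not boxes_look_normalized(boxes):
        return [list(box) for box in boxes]
    return [
        [x1 * image_w, y1 * image_h, x2 * image_w, y2 * image_h]
        for x1, y1, x2, y2 in boxes
    ]
```

With this sketch, a normalized box `[0.25, 0.5, 0.75, 1.0]` on a 640x480 image becomes `[160.0, 240.0, 480.0, 480.0]`, while a box already given in pixels is returned as-is.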
Bug Fixes
- An error message similar to `"Shape of tensor passed as the input #0 does not match to model parameters. Expected tensor shape is (1, 0, 0, 77)"` appears when performing AI server inference of a model with the Tensor input type and fewer than 4 dimensions, when those dimensions are specified using the `InputN`/`InputW`/`InputH`/`InputC` model parameters, for example, `InputN: 1, InputC: 77`. The error does not appear when the dimensions are specified using the `InputShape` model parameter.
Version 0.14.0 (10/13/2024)
New Features and Modifications
- ORCA1 firmware version 1.1.19 is included in this release. It contains numerous bug fixes that improve the reliability of ORCA1 operation.
- A robust and secure Python postprocessor execution framework is implemented for the AI server. All Python postprocessor code is now executed in a separate process pool in sandboxed environments, as opposed to the in-process execution used in previous PySDK versions.
- Device validation is implemented for the case when you load a model from a cloud model zoo and the inference device requested by that model is not available. In that case the following exception is raised: `"Model '{model}' does not have any supported runtime/device combinations that will work on this system."`
- The `timing` attribute is added to the inference result base class `degirum.postprocessor.InferenceResults`. This attribute is populated with the inference timing information when the `degirum.model.Model.measure_time` property is set to `True`. The inference timing information is represented as a dictionary with the same keys as returned by the `degirum.model.Model.time_stats()` method.
Bug Fixes
- `degirum.model.Model.output_class_set` class label filtering is not applied when any `degirum_tools` analyzers are attached to the model object by `degirum_tools.attach_analyzers()`.
- Significant (100x) performance drop of `TFLITE/CPU` model inference when more than one virtual CPU device is selected for the inference (which is the default condition).
Version 0.13.4 (9/21/2024)
New Features and Modifications
- Initial support for the AMD Vitis NPU is added for Windows OS. The runtime/device designator for this device is `"ONNX/VITIS_NPU"`.
- A variable number of landmarks is supported in the pose detection postprocessor. This is needed to support new face keypoint recognition models.
- The AI server ASIO protocol is improved to disconnect the client when an inference is aborted, without waiting for the inference timeout.
Version 0.13.3 (9/12/2024)
New Features and Modifications
- ORCA1 firmware version 1.1.18 is included in this release. This firmware improves the detection of DDR4 external memory link failures.
- The handling of critical ORCA hardware errors is improved: when such an error is diagnosed during inference, the ORCA firmware is reloaded, ORCA is reinitialized, and the inference is retried once. If the retry succeeds, the error is not reported.
- The performance of HWC -> CHW conversion in the AI server pre-processor is improved. This affects the inference speed of ONNX models with NCHW input tensor layouts.
- The post-processor for YOLOv10 object detection models is implemented. The post-processor tag is `"DetectionYoloV10"`.
- The `cache-dump` subcommand is added to the `server` command of the PySDK CLI. This subcommand queries the current state of the AI server runtime agent cache. Usage example: `degirum server cache-dump --host <hostname>`
- AI server tracing to stdout is implemented. To enable tracing, put the `__TraceToStdout=yes` trace configuration option into the `dg_trace.ini` trace configuration file. Traces will be printed to stdout in JSON format, compatible with log collection services such as DataDog, Loki/Grafana, and Elastic/Kibana. To enable tracing for all AI server events, additionally put the `AIServer=Detailed` trace configuration option into the `dg_trace.ini` trace configuration file.

    > *Note*: the `dg_trace.ini` trace configuration file is located in the `~/.local/share/DeGirum/trace` directory on Linux systems, and in the `%APPDATA%\DeGirum\traces` folder on Windows systems. If it is not there, create it.
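
As a convenience, the trace configuration file described above could be created from Python. The sketch below is only an illustration: the helper name `write_trace_config` is hypothetical, it assumes the file holds plain `key=value` lines with no section header, and it overwrites any existing `dg_trace.ini` in the given directory.

```python
from pathlib import Path


def write_trace_config(trace_dir: Path, detailed: bool = False) -> Path:
    """Create dg_trace.ini in trace_dir with stdout tracing enabled.

    Assumption: the file is a plain list of key=value options.
    Note: an existing file in trace_dir is overwritten.
    """
    lines = ["__TraceToStdout=yes"]
    if detailed:
        lines.append("AIServer=Detailed")  # trace all AI server events
    trace_dir.mkdir(parents=True, exist_ok=True)
    config_path = trace_dir / "dg_trace.ini"
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path
```

On Linux, for example, it could be called as `write_trace_config(Path.home() / ".local/share/DeGirum/trace", detailed=True)`, using the directory named in the note above.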
Bug Fixes
- When the cloud server responds with cloud inference error details, the detailed message is not included in the text of the raised exception.
Version 0.13.2 (7/26/2024)
Bug Fixes
- The N2X runtime agent fails to load on Linux systems when the `/dev/bus/usb` device is not available. This makes it impossible to use the `N2X/ORCA1` and `N2X/CPU` inference devices on such systems. This problem affects PySDK installations running on virtual machines and inside Docker images started in non-privileged mode.
Version 0.13.1 (7/17/2024)
New Features and Modifications
- Added support for OpenVINO version 2024.2.0.
- YOLO segmentation model postprocessing support is implemented in the `degirum.postprocessor.DetectionResults` class.
- The `degirum version` command is added to the PySDK CLI. Use this command to obtain the PySDK version.
- The `degirum.zoo_manager.ZooManager.system_info()` method is added. This method queries the system info dictionary of the attached inference engine. The format of this dictionary is the same as the output of the `degirum sys-info` command.
- Accessing the DeGirum public cloud model zoo no longer requires a cloud API token, so the following code will just work:

    ```python
    import degirum as dg
    zoo = dg.connect(dg.CLOUD)
    zoo.list_models()
    ```
- ORCA1 firmware version 1.1.15 is included in this release. This firmware implements measures to reinitialize the DDR4 external memory link in case of failures. This reduces the probability of runtime errors such as `"Timeout waiting for RPC EXEC completion"`.
- ORCA1 firmware is now loaded on AI server startup only in case of a version mismatch or a previously detected critical hardware error. In previous AI server versions it was reloaded unconditionally on every start.
- The `degirum.model.Model.device_type` property can now be assigned for single-device models (models for which the `SupportedDeviceTypes` model parameter is not defined). In previous PySDK versions such an assignment always generated the error `"Model does not support dynamic device type selection: model property SupportedDeviceTypes is not defined"`.
Bug Fixes
- Google EdgeTPU AI accelerator support was broken in PySDK ver. 0.13.0. Now it is restored.
Version 0.13.0 (6/21/2024)
New Features and Modifications
- Initial support for the RKNN runtime plugin is added. This plugin allows performing inferences of `.rknn` AI models on RockChip AI accelerators, including:
    - RK3588
    - RK3568
- The TFLite plugin now supports the following inference delegates:
    - NXP VX
    - NXP Ethos-U
    - ArmNN
- The `device_type` keyword argument is added to the `degirum.zoo_manager.ZooManager.list_models` method. It specifies the filter for target runtime/device combinations: a string or a list of strings of full device type names in `"RUNTIME/DEVICE"` format. For example, the following code returns the list of models for the N2X/ORCA1 runtime/device pair:

    ```python
    model_list = zoo.list_models(device_type="N2X/ORCA1")
    ```
- New functions have been added to the PySDK top-level API:
    - `degirum.list_models()`
    - `degirum.load_model()`
    - `degirum.get_supported_devices()`

    These functions are intended to further simplify the PySDK API.

    The function `degirum.list_models()` allows you to request the list of models without explicitly obtaining a `ZooManager` object via a `degirum.connect()` call. It combines the arguments of `degirum.connect()` and `degirum.zoo_manager.ZooManager.list_models()`, which appear one after another, for example: `model_list = degirum.list_models(degirum.CLOUD, "https://hub.degirum.com", "<token>", device_type="N2X/ORCA1")`

    The function `degirum.load_model()` allows you to load a model without explicitly obtaining a `ZooManager` object via a `degirum.connect()` call. It combines the arguments of `degirum.connect()` and `degirum.zoo_manager.ZooManager.load_model()`, with the model name going first. For example: `model = degirum.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1", degirum.CLOUD, "https://hub.degirum.com", "<token>", output_confidence_threshold=0.5)`

    The function `degirum.get_supported_devices()` allows you to obtain the list of runtime/device combinations supported by the inference engine of your choice. It accepts the inference engine designator as its first argument and returns the list of supported device type strings in the form `"RUNTIME/DEVICE"`. For example, the following call requests the list of runtime/device combinations supported by the AI server on localhost:
- The post-processor for YOLOv8 pose detection models is implemented. The post-processor tag is `"PoseDetectionYoloV8"`.
- The pre-processor letter-boxing implementation is changed to match the Ultralytics implementation for a better mAP match.
- ORCA firmware loading time is reduced by 3 seconds.
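
For readers unfamiliar with the letter-boxing item above, the following sketch shows the resize-and-pad geometry as commonly implemented in Ultralytics-style preprocessing. It is written from a general understanding of that approach and is not taken from PySDK; the function name `letterbox_geometry` and its return layout are hypothetical.

```python
from typing import Tuple


def letterbox_geometry(
    src_w: int, src_h: int, dst_w: int, dst_h: int
) -> Tuple[int, int, int, int, int, int]:
    """Return (new_w, new_h, left, top, right, bottom) for an aspect-preserving
    resize of a src image into a dst canvas, padded on all sides as needed."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = int(round(src_w * scale)), int(round(src_h * scale))
    pad_w, pad_h = (dst_w - new_w) / 2, (dst_h - new_h) / 2
    # The +/-0.1 offsets split an odd padding total between the two sides.
    left, right = int(round(pad_w - 0.1)), int(round(pad_w + 0.1))
    top, bottom = int(round(pad_h - 0.1)), int(round(pad_h + 0.1))
    return new_w, new_h, left, top, right, bottom
```

For a 1280x720 frame and a 640x640 model input, this gives a 640x360 resized image with 140 rows of padding above and below. Small differences in how the padding is rounded shift box coordinates slightly, which is one plausible reason matching the reference implementation can matter for mAP.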
Bug Fixes
- The error `"Timeout 10000 ms waiting for response from AI server"` may occur intermittently at the start of inference of a cloud model on the AI server when the AI server has an unreliable Internet connection. The cause is incorrect timeouts on the client side.
- The model filtering functionality of the `degirum.zoo_manager.ZooManager.list_models` method works incorrectly with multi-device models that have device wildcards in `SupportedDeviceTypes`. For example, if the model has `SupportedDeviceTypes: "OPENVINO/*"`, then the call `zoo.list_models(device="ORCA1")` returns this model even though the "ORCA1" device is not supported by the "OPENVINO" runtime.
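
The intended behavior of the wildcard filter fix above can be illustrated with a small sketch. It is not the PySDK code: the helper names and the `available` list of concrete pairs are hypothetical, and shell-style matching via `fnmatch` is an assumption about how `"*"` is interpreted.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Set


def expand_supported(patterns: str, available: Iterable[str]) -> Set[str]:
    """Expand comma-separated "RUNTIME/DEVICE" patterns (wildcards allowed)
    against the concrete runtime/device pairs that exist on a system."""
    wanted = [p.strip() for p in patterns.split(",") if p.strip()]
    return {pair for pair in available if any(fnmatchcase(pair, pat) for pat in wanted)}


def model_supports_device(patterns: str, device: str, available: Iterable[str]) -> bool:
    """True only if some concrete pair matched by the patterns uses the device."""
    return any(
        pair.split("/", 1)[1] == device for pair in expand_supported(patterns, available)
    )
```

The key point is that a wildcard is expanded only to pairs that actually exist, so `"OPENVINO/*"` never yields an ORCA1 match because no `OPENVINO/ORCA1` pair is available.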
Version 0.12.3 (6/3/2024)
Bug Fixes
- AI server protocol backward compatibility was broken in PySDK version 0.12.2, which prevented older clients from communicating with a newer AI server and produced cryptic error messages like `"RuntimeError: [json.exception.type_error.302] type must be number, but is binary"`.
- Model input parameters for inputs other than input zero do not propagate to the AI server. For dynamic-input models this may cause errors like `"Incorrect input tensor size: the model configuration file defines input tensor to be <X> elements, while the size of supplied tensor is <Y> elements"`.
- An attempt to set the `degirum.model.Model.input_shape` property for the `"Image"` input type fails: it assigns the `InputShape` model parameter instead of the `InputN/W/H/C` model parameters.
- Cloud inference for multi-input models was not supported: it led to inference timeout error messages like `"Timeout <X> ms waiting for response from AI server"`.
- N2X JIT compilation fails with a segmentation fault when a corrupted ONNX file is supplied for compilation. Now it produces the error message `"Unknown model format"`.
- License files for third-party libraries distributed in PySDK can potentially create very long file paths, which may lead to PySDK installation failure on Windows OS with errors like `"Could not install packages due to OSError: [WinError 206] The filename or extension is too long"`.
Version 0.12.2 (5/17/2024)
New Features and Modifications
- A new model parameter, `InputShape`, is supported for AI models with the tensor input type (`InputType == "Tensor"`). This parameter specifies the input tensor shape. It may have an arbitrary number of elements, which allows specifying tensor shapes with any number of dimensions. It supersedes the `InputN`, `InputH`, `InputW`, and `InputC` parameters, which serve the same purpose: if the `InputShape` parameter is specified for a model input, its value is used, and the `InputN`, `InputH`, `InputW`, and `InputC` parameters are ignored.

    The `InputShape` parameter value is a list of input shapes, one shape per model input. Each element of that list (which defines the shape of a particular input) is another list containing the input dimensions, **slowest** dimension first. For example, an NHWC tensor shape is represented as the `[N, H, W, C]` list, where index zero contains the `N` value. `InputShape` is a runtime parameter, meaning that its value can be changed on the fly.
- The model parameters `InputN`, `InputH`, `InputW`, and `InputC` are converted to runtime parameters, so they can be changed on the fly. This allows more effective use of AI models with so-called dynamic inputs, which are supported by the OpenVINO runtime.

    To adjust the size of the input data accepted by the PySDK preprocessor, assign the actual input data size/shape to be used for subsequent inferences **before** performing the inference.

    If your model has the image input type (`InputType == "Image"`), assign the `InputN`, `InputH`, `InputW`, and `InputC` model parameters to match the size of the images to be used for the inference. The PySDK preprocessor will resize the input images to the assigned size. If the input images already have that size, the resizing step is skipped. In any case, the inference runtime receives an image of that size.

    If your model has the tensor input type (`InputType == "Tensor"`), assign the `InputShape` model parameter to match the shape of the tensors to be used for the inference. Since PySDK does not do any resizing for tensor inputs, all tensors you pass for inference must have the specified shape, so the inference runtime receives tensors of that shape.

    > Not all inference runtimes support dynamic inputs. At the time of this release, only the OpenVINO runtime supports them.

    > Currently, PySDK does not support batch sizes other than 1 for image input types, so the `InputN` model parameter should not be changed.
- A new property, `degirum.model.Model.input_shape`, is added to the `Model` class. This property allows unified access to the model input size/shape parameters `InputN`, `InputH`, `InputW`, `InputC`, and `InputShape`, regardless of the input type (image or tensor).

    The getter returns, and the setter accepts, a list of input shapes, one shape per model input. Each element of that list (which defines the shape of a particular input) is another list containing the input dimensions, slowest dimension first.

    For each input, the getter returns the `InputShape` value if the `InputShape` model parameter is specified for that input; otherwise it returns `[InputN, InputH, InputW, InputC]`. The setter works symmetrically: it assigns the provided list to the `InputShape` parameter if that parameter is specified for the model input; otherwise it assigns the provided list to the `InputN`, `InputH`, `InputW`, and `InputC` parameters in that order (i.e. index zero to `InputN` and so forth).

    This property can be used in conjunction with the dynamic inputs feature to simplify the setting of input shapes.
- If the `DG_CPU_LIMIT_CORES` environment variable is defined, the AI server uses its value to limit the number of virtual CPU inference devices, such as N2X/CPU or OPENVINO/CPU. When it is not defined, one half of the total physical CPU cores is used, as in previous versions. This feature is useful when the AI server runs in a Docker container and you want to limit the number of virtual CPU inference devices to reduce the CPU load.
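
The selection rule for the number of virtual CPU devices can be sketched as follows. This is an illustration rather than the AI server code: the function name is hypothetical, the value of `DG_CPU_LIMIT_CORES` is assumed to be an integer that is used directly as the device count, and the floor of one device in the default branch is also an assumption.

```python
import os
from typing import Mapping, Optional


def virtual_cpu_device_count(
    physical_cores: int, env: Optional[Mapping[str, str]] = None
) -> int:
    """Number of virtual CPU inference devices under the rule described above."""
    env = os.environ if env is None else env
    limit = env.get("DG_CPU_LIMIT_CORES")
    if limit is not None:
        return int(limit)  # explicit limit taken from the environment (assumed integer)
    return max(1, physical_cores // 2)  # default: one half of the physical cores
```

Passing the environment as a parameter keeps the sketch easy to test; in a Docker deployment the variable would typically be set with `-e DG_CPU_LIMIT_CORES=<n>` on the container command line.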
Bug Fixes
- OpenVINO CPU model inferences fail intermittently when running many models on the same node, with the following error message: `"CompiledModel was not initialized."`
- The model filtering functionality of the `degirum.zoo_manager.ZooManager.list_models` method was broken:
    - it does not filter out models that are not supported by the inference engine attached to the zoo manager object,
    - it does not filter out models that have an empty `SupportedDeviceTypes` model parameter.
- Model fallback parameters support is broken for the AI server inference mode.
Version 0.12.1 (4/25/2024)
New Features and Modifications
- A new property, `degirum.model.Model.supported_device_types`, is added to the `Model` class. This read-only property returns the list of runtime/device types supported simultaneously by the model and by the connected inference engine. Each runtime/device type in the list is represented by a string in the format `"RUNTIME/DEVICE"`.

    For example, the list `["OPENVINO/CPU", "ONNX/CPU"]` means that the model can be run on both the Intel OpenVINO and Microsoft ONNX runtimes using the CPU as the hardware device.
- The `degirum.model.Model.device_type` property now accepts a list of desired `"RUNTIME/DEVICE"` pairs. The first supported pair from that list will be set. This simplifies inference device assignment for multi-device models on a variety of systems with different sets of inference devices.

    For example, suppose you have a model that supports all devices of the OpenVINO runtime (NPU, GPU, and CPU), and you want to run it on the NPU when it is available, otherwise on the GPU when it is available, and fall back to the CPU if neither the NPU nor the GPU is available. In this case you may do the following assignment:

    ```
    model.device_type = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]
    ```

    Reading the `device_type` property back after the list assignment gives you the actual device type assigned for the inference.
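
The "first supported pair wins" rule can be pictured with a short sketch. The function name `pick_device_type` is hypothetical, and returning `None` when nothing matches is an assumption; these notes do not say what PySDK does in that case.

```python
from typing import Optional, Sequence


def pick_device_type(preferred: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the first preferred "RUNTIME/DEVICE" pair that is supported, or None."""
    supported_set = set(supported)
    for device_type in preferred:
        if device_type in supported_set:
            return device_type
    return None
```

On a system that offers only `OPENVINO/CPU` and `OPENVINO/GPU`, the preference list from the example above resolves to `"OPENVINO/GPU"`, because the NPU entry is skipped.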
Bug Fixes
- Variable tensor shape support is fixed in PySDK for `"Tensor"` input types of multi-input models, when the input tensor with a shape other than 4-D has an index other than zero.
- Very intermittently, models are not fully downloaded from a cloud model zoo for AI server-based and local inference types, and no error is diagnosed. As a result, corrupted models are used for the inference, which leads to unclear or unrelated error messages. Corrective measures include analyzing the "Content-Length" HTTP header when downloading a model archive from a cloud model zoo, with retries if the actual downloaded file size is less than expected. Also, the zip archive CRC is checked for each file when unpacking model assets.
- In case of inference errors, the AI server ASIO protocol closes the client socket too soon, which causes loss of the error message packet on the client side. This, in turn, leads to an incorrect error report: instead of the actual error, generic socket errors like "Broken pipe" or "Operation aborted" are reported.
- When the AI server scans a local model zoo and finds a multi-device model whose default runtime/device combination (as specified in the `RuntimeAgent` and `DeviceType` model parameters) is not supported by the system, it discards the model even though the model supports other runtime/device combinations available on this system. This happens because the `SupportedDeviceTypes` model parameter is not analyzed when scanning local zoos.
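
The download integrity measures listed among the fixes above (comparing the downloaded size with "Content-Length", retrying on a short read, and checking the zip CRC of each file) can be sketched with the standard library. This is a simplified illustration, not the PySDK code: `fetch` is a hypothetical callable standing in for the HTTP request, and the retry count is arbitrary.

```python
import io
import zipfile
from typing import Callable, Tuple

# fetch() is a hypothetical callable returning (declared Content-Length, body bytes).
Fetch = Callable[[], Tuple[int, bytes]]


def download_archive(fetch: Fetch, retries: int = 3) -> bytes:
    """Retry until the body is at least as long as the declared Content-Length."""
    for _ in range(retries):
        expected_size, body = fetch()
        if len(body) >= expected_size:
            return body
    raise IOError(f"archive still truncated after {retries} attempts")


def verify_archive(body: bytes) -> None:
    """Check the CRC of every file stored in the zip archive."""
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        bad_member = archive.testzip()  # name of the first corrupt member, or None
        if bad_member is not None:
            raise IOError(f"CRC check failed for {bad_member}")
```

The two checks are complementary: the size comparison catches truncated transfers early, while the CRC pass catches content that has the right length but is corrupted.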
Version 0.12.0 (4/8/2024)
New Features and Modifications
- Multi-device/multi-runtime models are supported in PySDK and in the cloud zoo.

    Such models have an additional model parameter, `SupportedDeviceTypes`, which defines a comma-separated list of runtime/device combinations supported by the model. Each element of this list is a `"RUNTIME/DEVICE"` pair. The `RUNTIME` part specifies the runtime, while the `DEVICE` part specifies the device type. The following runtime/device combinations are supported as of PySDK version 0.12.0:

    | Runtime | Devices |
    |---|---|
    | N2X | CPU, ORCA1 |
    | OPENVINO | CPU, GPU, NPU, MYRIAD |
    | ONNX | CPU |
    | TFLITE | CPU, EDGETPU |
    | TENSORRT | GPU, DLA, DLA_FALLBACK |

    New runtimes and devices may be supported in future versions of PySDK.

    You may specify `"*"` as a wildcard in any part of the RUNTIME/DEVICE pair: it will match any supported runtime or device type. For example, `"N2X/*"` defines a model that supports all devices of the N2X runtime (that would be N2X/CPU and N2X/ORCA1), and `"*/GPU"` defines a model that supports all GPU devices of all runtimes (that would be OPENVINO/GPU and TENSORRT/GPU).

    For multi-device models you may select on the fly which runtime/device combination to use for the model inference, assuming the desired runtime/device combination is supported by the model. Assign the runtime/device combination to the `degirum.model.Model.device_type` property as a string in the format `"RUNTIME/DEVICE"`, exactly as it is defined in the `SupportedDeviceTypes` list.

    You can reassign the `device_type` property multiple times for the same model object. For example:
- Just-in-time (JIT) compilation is introduced for DeGirum N2X models for ORCA devices. You may now create ORCA models by specifying either an ONNX or a TFLITE binary model file in the `ModelPath` model parameter: you do not need to pre-compile your model into the `.n2x` file format. This significantly simplifies model development for DeGirum ORCA devices. When the N2X runtime discovers the `.onnx` or `.tflite` binary model file extension, it automatically invokes the N2X compiler and compiles the model into the `.n2x` format, saving the compiled model in the local cache for future use. Cached models are identified in the cache by the `Checksum` model parameter: two models with the same name but with different checksums are cached into two different files.

    The new model parameter `CompilerOptions` is introduced to pass options to the JIT compiler. The parameter type is a JSON dictionary, where the key is the runtime/device pair and the value is the compiler options applicable to this runtime/device pair. For example, `{ "N2X/ORCA1": "--no-software-layers" }` will pass the `--no-software-layers` compiler option string when compiling models for the ORCA1 device and the N2X runtime.
- `degirum.connect()` now supports a new mode of local inference, in which models are served from a local model zoo directory instead of from a single model file. To use this mode, call `degirum.connect()` passing `dg.LOCAL` as the first argument and the path to the local model zoo directory as the second argument:

    ```
    zoo = dg.connect(dg.LOCAL, "/path/to/local/zoo/dir")
    ```

    You may download models to the local model zoo directory in the same way as for the AI server, using the `degirum download-zoo` command.
- A new `"auto"` value is introduced for the `InputTensorLayout` model parameter: when it is set to `"auto"`, the input tensor layout is selected as `"NCHW"` for the `"OPENVINO"`, `"ONNX"`, and `"TENSORRT"` runtimes, and `"NHWC"` otherwise. This feature facilitates the creation of multi-runtime models, where the input tensor layout should be set to `"NCHW"` for some runtimes and `"NHWC"` for others.
- A new `"auto"` value is introduced for the `degirum.model.Model.overlay_alpha` property: when it is set to `"auto"`, PySDK uses `overlay_alpha` = 0.5 for segmentation models and `overlay_alpha` = 1.0 otherwise. This is now the default value of the `overlay_alpha` property.
- The AI server will try to serve a cloud model from the local cache even if the model checksum request fails due to a poor or absent Internet connection. This allows you to continue using the AI server in such conditions, assuming all required cloud models are already downloaded to the local cache.
- The model download timeout from the cloud zoo is increased from 10 to 40 seconds.
- When running inside a Docker container, the number of CPU devices reported by runtimes that support CPU inference (such as OpenVINO) now takes into account Docker-imposed CPU quotas. For example, if the AI server Docker container is started with `--cpus=4`, then the number of virtual CPU devices reported by the runtime will be half of that amount, i.e. 2 CPUs.
- ORCA firmware is now forcefully reset on each AI server start to ensure clean recovery from previous failures.
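
The `"auto"` rule for `InputTensorLayout` described earlier in this list is simple enough to express as a lookup. The sketch below only restates that rule; the function and constant names are hypothetical and this is not the PySDK implementation.

```python
# Runtimes for which "auto" resolves to NCHW, as listed in the notes above.
NCHW_RUNTIMES = {"OPENVINO", "ONNX", "TENSORRT"}


def resolve_input_tensor_layout(layout: str, runtime: str) -> str:
    """Resolve "auto" to a concrete layout for the runtime; pass other values through."""
    if layout != "auto":
        return layout
    return "NCHW" if runtime in NCHW_RUNTIMES else "NHWC"
```

This is what lets a single multi-runtime model configuration work with, say, both ONNX (resolved to `"NCHW"`) and N2X (resolved to `"NHWC"`) without per-runtime edits.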
Bug Fixes
- Model filtering in the `degirum.zoo_manager.ZooManager.list_models` method does not accept the `"NPU"` device type.
- Support of dynamically-sized output tensors does not work for the OpenVINO runtime.
- The OpenVINO runtime reports a single CPU device in system info, while the actual number of virtual devices is more than one.
- The TensorRT runtime fails with an error when a quantized model does not specify the `CalibrationFilePath` model parameter.
- Variable tensor shape support is fixed in PySDK for `"Tensor"` input types. In previous versions, for input tensors having other than four dimensions, the following error was raised: `"Shape of tensor passed as the input #<n> does not match to model parameters. Expected tensor shape is (<x>, <y>, <z>, <t>)."`