PySDK Release Notes

Version 0.14.0 (10/13/2024)

New Features and Modifications

ORCA1 firmware version 1.1.19 is included in this release. It contains numerous bug fixes targeted to improve the reliability of ORCA1 operation.
A robust and secure Python postprocessor execution framework is implemented for the AI server. All Python postprocessor code is now executed in a separate process pool in sandboxed environments, as opposed to the in-process execution of previous PySDK versions.
Device validation is implemented for loading models from a cloud model zoo: if the inference device requested by the model is not available, the following exception is raised: "Model '{model}' does not have any supported runtime/device combinations that will work on this system."
The timing attribute is added to the inference result base class degirum.postprocessor.InferenceResults. This attribute is populated with inference timing information when the degirum.model.Model.measure_time property is set to True. The timing information is represented as a dictionary with the same keys as those returned by the degirum.model.Model.time_stats() method.
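
The device validation rule above amounts to an intersection test between the runtime/device combinations a model supports and those available on the system. Below is a minimal sketch of that logic, not PySDK's code: the function name is hypothetical, `RuntimeError` stands in for whatever exception type PySDK actually raises, and SupportedDeviceTypes wildcards are ignored (exact string matching only).

```python
from typing import List

def validate_model_devices(model: str, model_devices: List[str], system_devices: List[str]) -> List[str]:
    """Return the model's runtime/device pairs usable on this system, or raise if none are."""
    usable = [d for d in model_devices if d in system_devices]
    if not usable:
        # Message text taken from the release note; the exception type is an assumption.
        raise RuntimeError(
            f"Model '{model}' does not have any supported runtime/device "
            "combinations that will work on this system."
        )
    return usable
```

For example, a model supporting "N2X/ORCA1" and "TFLITE/CPU" on a machine without an ORCA1 accelerator would still validate, with only "TFLITE/CPU" usable.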

Bug Fixes

degirum.model.Model.output_class_set class label filtering is not applied when any degirum_tools analyzers are attached to the model object by degirum_tools.attach_analyzers().
Significant (100x) performance drop of TFLITE/CPU model inference when more than one virtual CPU device is selected for the inference (which is the default).

Version 0.13.4 (9/21/2024)

New Features and Modifications

AMD Vitis NPU is initially supported for Windows OS. The runtime/device designator for this device is "ONNX/VITIS_NPU".
A variable number of landmarks is supported in the pose detection postprocessor. This is needed to support new face keypoint recognition models.
The AI server ASIO protocol is improved to disconnect the client in case of an aborted inference without waiting for the inference timeout.

Version 0.13.3 (9/12/2024)

New Features and Modifications

ORCA1 firmware version 1.1.18 is included in this release. This firmware improves the detection of DDR4 external memory link failures.
The handling of critical ORCA hardware errors is improved: when such an error is detected during inference, the ORCA firmware is reloaded, ORCA is reinitialized, and the inference is retried once. If the retry succeeds, the error is not reported.
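
The recovery sequence above (reload firmware, reinitialize, retry once, report only if the retry also fails) can be sketched as a small wrapper. This is a hypothetical illustration: `CriticalHardwareError`, `run_with_recovery`, and the three callables are made-up names, not PySDK or firmware APIs.

```python
class CriticalHardwareError(Exception):
    """Hypothetical stand-in for a critical ORCA hardware error."""

def run_with_recovery(infer, reload_firmware, reinitialize):
    """Run infer(); on a critical error, reload firmware, reinitialize, and retry once."""
    try:
        return infer()
    except CriticalHardwareError:
        reload_firmware()
        reinitialize()
        # Single retry: a success hides the first error, a second failure propagates.
        return infer()
```
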
The performance of HWC -> CHW conversion in the AI server pre-processor is improved. This affects the inference speed of ONNX models with NCHW input tensor layouts.
The post-processor for YOLOv10 object detection models is implemented. The post-processor tag is "DetectionYoloV10".
The cache-dump subcommand is added to the server command of the PySDK CLI. This subcommand queries the current state of the AI server runtime agent cache. Usage example: `degirum server cache-dump --host <hostname>`
AI server tracing to stdout is implemented. To enable tracing, put the __TraceToStdout=yes trace configuration option into the dg_trace.ini trace configuration file. Traces are printed to stdout in JSON format, compatible with log collection services such as DataDog, Loki/Grafana, and Elastic/Kibana. To enable tracing of all AI server events, additionally put the AIServer=Detailed trace configuration option into the dg_trace.ini file.
> *Note*: The `dg_trace.ini` trace configuration file is located in the `~/.local/share/DeGirum/trace` directory
on Linux systems and in the `%APPDATA%\DeGirum\traces` folder on Windows systems. If the file does not exist, create it.
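
The tracing setup above comes down to adding one or two options to dg_trace.ini. A minimal sketch of doing that from Python follows, assuming the file holds plain `key=value` lines as the description suggests; the function names are hypothetical, and existing lines in the file are preserved rather than overwritten.

```python
import os
import sys
from pathlib import Path

def trace_config_dir() -> Path:
    """Folder that holds dg_trace.ini, per the locations given in the note above."""
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "DeGirum" / "traces"
    return Path.home() / ".local" / "share" / "DeGirum" / "trace"

def enable_stdout_tracing(config_dir: Path, detailed: bool = False) -> Path:
    """Add the stdout-tracing options to dg_trace.ini, creating the file if needed."""
    config_dir.mkdir(parents=True, exist_ok=True)
    ini_path = config_dir / "dg_trace.ini"
    lines = ini_path.read_text().splitlines() if ini_path.exists() else []
    wanted = ["__TraceToStdout=yes"]
    if detailed:
        wanted.append("AIServer=Detailed")  # trace all AI server events
    for option in wanted:
        if option not in lines:  # keep existing options, avoid duplicates
            lines.append(option)
    ini_path.write_text("\n".join(lines) + "\n")
    return ini_path
```

Calling `enable_stdout_tracing(trace_config_dir(), detailed=True)` would update the file in the default location.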

Bug Fixes

When the cloud server responds with cloud inference error details, the detailed message is not included in the text of the raised exception.

Version 0.13.2 (7/26/2024)

Bug Fixes

N2X runtime agent fails to load on Linux systems when the /dev/bus/usb device is not available. This makes the N2X/ORCA1 and N2X/CPU inference devices unusable on such systems. This problem affects PySDK installations running on virtual machines and inside Docker containers started in non-privileged mode.

Version 0.13.1 (7/17/2024)

New Features and Modifications

Support for OpenVINO version 2024.2.0 is added.
YOLO segmentation model postprocessing support is implemented in degirum.postprocessor.DetectionResults class.
The degirum version command is added to the PySDK CLI. Use this command to obtain the PySDK version.
The degirum.zoo_manager.ZooManager.system_info() method is added. This method queries the system info dictionary of the attached inference engine. The format of this dictionary is the same as the output of the degirum sys-info command.
A cloud API token is no longer needed to access the DeGirum public cloud model zoo, so the following code works as is:
```python
import degirum as dg
zoo = dg.connect(dg.CLOUD)
zoo.list_models()
```
ORCA1 firmware version 1.1.15 is included in this release. This firmware implements measures to reinitialize DDR4 external memory link in case of failures. This reduces the probability of runtime errors such as "Timeout waiting for RPC EXEC completion".
ORCA1 firmware is now loaded on AI server startup only in case of version mismatch or previously detected critical hardware error. In previous AI server versions it was reloaded unconditionally on every start.
The degirum.model.Model.device_type property can now be assigned for single-device models (models for which the SupportedDeviceTypes model parameter is not defined). In previous PySDK versions such an assignment always generated the error "Model does not support dynamic device type selection: model property SupportedDeviceTypes is not defined".

Bug Fixes

Google EdgeTPU AI accelerator support, broken in PySDK version 0.13.0, is restored.

Version 0.13.0 (6/21/2024)

New Features and Modifications

A plugin for the RKNN runtime is initially supported. This plugin allows running inferences of .rknn AI models on Rockchip AI accelerators, including:
- RK3588
- RK3568
TFLite plugin now supports the following inference delegates:
- NXP VX
- NXP Ethos-U
- ArmNN
The device_type keyword argument is added to degirum.zoo_manager.ZooManager.list_models method. It specifies the filter for target runtime/device combinations: the string or list of strings of full device type names in "RUNTIME/DEVICE" format. For example, the following code will return the list of models for N2X/ORCA1 runtime/device pair:
```python
model_list = zoo.list_models(device_type="N2X/ORCA1")
```
New functions have been added to PySDK top-level API:
- degirum.list_models()
- degirum.load_model()
- degirum.get_supported_devices()
These functions are intended to further simplify the PySDK API.

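The function degirum.list_models() allows you to request the list of models without explicitly obtaining a ZooManager object via a degirum.connect() call. It takes the arguments of degirum.connect() followed by the arguments of degirum.zoo_manager.ZooManager.list_models(), for example:
```python
import degirum

model_list = degirum.list_models(
    degirum.CLOUD,              # inference host address (degirum.connect() argument)
    "https://hub.degirum.com",  # model zoo URL (degirum.connect() argument)
    "<token>",                  # cloud API token (degirum.connect() argument)
    device_type="N2X/ORCA1",    # filter (ZooManager.list_models() argument)
)
```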
The function degirum.load_model() allows you to load a model without explicitly obtaining a ZooManager object via a degirum.connect() call. It combines the arguments of degirum.connect() and degirum.zoo_manager.ZooManager.load_model(), with the model name first. For example:
```python
import degirum

model = degirum.load_model(
    "mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1",  # model name goes first
    degirum.CLOUD,                    # inference host address
    "https://hub.degirum.com",        # model zoo URL
    "<token>",                        # cloud API token
    output_confidence_threshold=0.5,  # model property passed as a keyword argument
)
```
The function degirum.get_supported_devices() allows you to obtain the list of runtime/device combinations supported by the inference engine of your choice. It accepts the inference engine designator as the first argument and returns the list of supported device type strings in the "RUNTIME/DEVICE" format. For example, the following call requests the list of runtime/device combinations supported by the AI server on localhost:
```python
import degirum

supported_device_types = degirum.get_supported_devices("localhost")
```
The post-processor for YOLOv8 pose detection models is implemented. The post-processor tag is "PoseDetectionYoloV8".
The pre-processor letter-boxing implementation is changed to match the Ultralytics implementation, for closer mAP agreement.
ORCA firmware loading time is reduced by 3 seconds.
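
The letter-boxing change above concerns how a resized image is padded to the model input size. As an illustration, the sketch below reproduces the padding arithmetic of the Ultralytics letterbox transform as we understand it (uniform scale, centered padding, odd leftover pixel on the right/bottom side). It is an approximation for reasoning about coordinates; the function name is hypothetical, and PySDK's actual pre-processor may differ in details such as rounding or fill color.

```python
def letterbox_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Return (new_w, new_h, left, top, right, bottom) for an Ultralytics-style letterbox."""
    r = min(dst_w / src_w, dst_h / src_h)               # uniform scale that fits the target
    new_w, new_h = int(round(src_w * r)), int(round(src_h * r))
    dw, dh = (dst_w - new_w) / 2, (dst_h - new_h) / 2   # padding split between both sides
    # The -0.1/+0.1 nudges put the extra pixel of an odd padding on the right/bottom.
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    return new_w, new_h, left, top, right, bottom
```

For example, a 1280x720 frame letterboxed into a 640x640 input is scaled to 640x360 with 140 rows of padding above and below.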

Bug Fixes

"Timeout 10000 ms waiting for response from AI server" error may happen intermittently at the start of cloud model inference on AI server when the AI server has an unreliable Internet connection, due to incorrect timeouts on the client side.
Model filtering functionality of degirum.zoo_manager.ZooManager.list_models method works incorrectly with multi-device models having device wildcards in SupportedDeviceTypes. For example, if the model has SupportedDeviceTypes: "OPENVINO/*", then the call zoo.list_models(device="ORCA1") returns such a model even though the "ORCA1" device is not supported by the "OPENVINO" runtime.

Version 0.12.3 (6/3/2024)

Bug Fixes

AI server protocol backward compatibility was broken in PySDK version 0.12.2, which prevented older clients from communicating with newer AI servers and produced cryptic error messages like "RuntimeError: [json.exception.type_error.302] type must be number, but is binary".
Model input parameters for inputs other than zero do not propagate to AI server. For dynamic-input models this may cause errors like "Incorrect input tensor size: the model configuration file defines input tensor to be <X> elements, while the size of supplied tensor is <Y> elements".
An attempt to set degirum.model.Model.input_shape property for "Image" input type fails: it assigns InputShape model parameter instead of InputN/W/H/C model parameters.
Cloud inference for multi-input models was not supported: it leads to inference timeout error messages like "Timeout <X> ms waiting for response from AI server".
N2X JIT compilation fails with segmentation fault when corrupted ONNX file is supplied for compilation. Now it produces the error message "Unknown model format".
License files for third-party libraries distributed in PySDK can potentially create very long file paths which may lead to PySDK installation failure on Windows OS with errors like "Could not install packages due to OSError: [WinError 206] The filename or extension is too long".

Version 0.12.2 (5/17/2024)

New Features and Modifications

A new model parameter, InputShape, is supported for AI models with tensor input type (InputType == "Tensor"). This parameter specifies the input tensor shape. It may have an arbitrary number of elements, which allows specifying tensor shapes with any number of dimensions. It supersedes the InputN, InputH, InputW, and InputC parameters, which serve the same purpose: if the InputShape parameter is specified for a model input, its value is used, and the InputN, InputH, InputW, and InputC parameters are ignored.

The `InputShape` parameter value is a list of input shapes, one shape per model input.
Each element of that list (which defines the shape of a particular input) is another list containing
the input dimensions, **slowest** dimension first. For example, an NHWC tensor shape is represented as the `[N, H, W, C]` list, where
index zero contains the `N` value.

The `InputShape` parameter is a runtime parameter, meaning that its value can be changed on the fly.

Model parameters InputN, InputH, InputW, and InputC are converted to runtime parameters, so they can be changed on the fly. This allows more effective use of AI models with so-called dynamic inputs, which are supported by the OpenVINO runtime.

To adjust the size of the input data accepted by the PySDK preprocessor, assign the actual input data size/shape
to be used for subsequent inferences **before** performing the inference.

If your model has image input type (`InputType == "Image"`), then you assign `InputN`, `InputH`, `InputW`, and `InputC` model parameters
to match the size of images to be used for the inference. The PySDK preprocessor will resize the input images to assigned size.
If input images already have that size, resizing step will be skipped. In any case, the inference runtime will receive the image
of that size.

If your model has tensor input type (`InputType == "Tensor"`), then you assign the `InputShape` model parameter
to match the shape of tensors to be used for the inference. Since PySDK does not do any resizing for tensor inputs,
all tensors you pass for inferences must have the specified shape, so the inference runtime will receive the tensors
of that shape.

> Not all inference runtimes support dynamic inputs. At the time of this release, only OpenVINO runtime supports them.

> Currently, PySDK does not support batch size other than 1 for image input types, so the `InputN` model parameter should
not be changed.
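
The rules above (resize images to the assigned size, pass matching images through, require tensors to match exactly) can be summarized as a small decision function. This is a hypothetical sketch of the documented behavior, not PySDK code: `plan_input` and its return strings are made up for illustration, images are assumed to arrive as (H, W, C), and channel-count checks are omitted.

```python
from typing import Sequence

def plan_input(input_type: str, data_shape: Sequence[int], configured_shape: Sequence[int]) -> str:
    """Describe how an input would be handled given the configured size/shape."""
    if input_type == "Image":
        _, h, w, _ = configured_shape            # [InputN, InputH, InputW, InputC]
        if tuple(data_shape[:2]) == (h, w):      # image already has the assigned size
            return "pass-through"
        return f"resize to {w}x{h}"
    if input_type == "Tensor":
        # Tensors are never resized, so the shape must match InputShape exactly.
        if list(data_shape) != list(configured_shape):
            raise ValueError(f"tensor shape {list(data_shape)} != configured {list(configured_shape)}")
        return "pass-through"
    raise ValueError(f"unknown input type {input_type!r}")
```
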

New property degirum.model.Model.input_shape is added to the Model class. This property allows unified access to model input size/shape parameters: InputN, InputH, InputW, InputC, and InputShape regardless of the input type (image or tensor).

The getter returns and the setter accepts a list of input shapes, one shape per model input.
Each element of that list (which defines a shape for particular input) is another list containing
input dimensions, slowest dimension first.

For each input, the getter returns `InputShape` value if `InputShape` model parameter is specified for the input, otherwise
it returns `[InputN, InputH, InputW, InputC]`.

The setter works symmetrically: it assigns the provided list to `InputShape` parameter, if it was specified for the model input,
otherwise it assigns provided list to `InputN`, `InputH`, `InputW`, and `InputC` parameters in that order
(i.e. zero index to `InputN` and so forth).

This property can be used in conjunction with dynamic inputs feature to simplify setting of input shapes.
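
The getter/setter mapping described above can be modeled in plain Python. In this sketch, `InputParams`, `get_input_shape`, and `set_input_shape` are hypothetical stand-ins for one model input's parameters and for the two halves of degirum.model.Model.input_shape; they only illustrate the documented precedence and are not PySDK's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InputParams:
    """Hypothetical container for one model input's size/shape parameters."""
    n: int = 1                         # InputN
    h: int = 0                         # InputH
    w: int = 0                         # InputW
    c: int = 0                         # InputC
    shape: Optional[List[int]] = None  # InputShape; None means "not specified"

def get_input_shape(inputs: List[InputParams]) -> List[List[int]]:
    # InputShape wins when specified; otherwise report [InputN, InputH, InputW, InputC].
    return [list(p.shape) if p.shape is not None else [p.n, p.h, p.w, p.c] for p in inputs]

def set_input_shape(inputs: List[InputParams], shapes: List[List[int]]) -> None:
    if len(shapes) != len(inputs):
        raise ValueError("expected one shape per model input")
    for p, s in zip(inputs, shapes):
        if p.shape is not None:
            p.shape = list(s)          # InputShape was specified: assign it as-is
        else:
            p.n, p.h, p.w, p.c = s     # index 0 -> InputN, 1 -> InputH, and so forth
```
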

If the DG_CPU_LIMIT_CORES environment variable is defined, the AI server uses its value to limit the number of virtual CPU inference devices, such as N2X/CPU or OPENVINO/CPU. When it is not defined, one half of the total physical CPU cores is used, as in previous versions. This feature is useful when the AI server is running in a Docker container and you want to limit the number of virtual CPU inference devices to reduce the CPU load.
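
The DG_CPU_LIMIT_CORES rule above can be expressed as a short function. This is a hypothetical sketch, not the AI server's code; in particular, rounding odd core counts down is an assumption. With Docker, the variable would typically be passed as `docker run -e DG_CPU_LIMIT_CORES=2 ...`.

```python
import os
from typing import Mapping, Optional

def virtual_cpu_device_count(physical_cores: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Number of virtual CPU inference devices under the DG_CPU_LIMIT_CORES rule."""
    env = os.environ if env is None else env
    limit = env.get("DG_CPU_LIMIT_CORES")
    if limit is not None:
        return int(limit)       # explicit limit wins
    return physical_cores // 2  # default: half of the physical cores (rounding down is assumed)
```
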

Bug Fixes

OpenVINO CPU model inferences fail intermittently when running many models on the same node with the following error message: "CompiledModel was not initialized."
Model filtering functionality of degirum.zoo_manager.ZooManager.list_models method was broken:
- it does not filter out models that are not supported by the inference engine attached to the zoo manager object;
- it does not filter out models that have an empty SupportedDeviceTypes model parameter.
Model fallback parameters support is broken for AI server inference mode.

Version 0.12.1 (4/25/2024)

New Features and Modifications

New property degirum.model.Model.supported_device_types is added to the Model class. This read-only property returns the list of runtime/device types supported simultaneously by the model and by connected inference engine. Each runtime/device type in the list is represented by a string in a format "RUNTIME/DEVICE".
For example, the list `["OPENVINO/CPU", "ONNX/CPU"]` means that the model can be run on both the Intel OpenVINO
and Microsoft ONNX runtimes using the CPU as the hardware device.

The degirum.model.Model.device_type property now accepts a list of desired "RUNTIME/DEVICE" pairs. The first supported pair from that list will be set. This simplifies inference device assignment for multi-device models on a variety of systems with different sets of inference devices.

For example, suppose you have a model that supports all devices of the OpenVINO runtime (NPU, GPU, and CPU), and you want
to run this model on NPU when it is available, otherwise on GPU when it is available, and fall back to CPU if
neither NPU nor GPU is available. In this case, you may do the following assignment:

```
model.device_type = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]
```

Reading `device_type` property back after list assignment will give you the actual device type assigned
for the inference.
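
The list form of device_type behaves like a priority-ordered fallback. Below is a hypothetical sketch of that selection rule, not PySDK's code; `select_device_type` is a made-up name, and raising `ValueError` when nothing matches is an assumption, since the notes do not describe that case.

```python
from typing import List, Union

def select_device_type(requested: Union[str, List[str]], supported: List[str]) -> str:
    """Return the first requested "RUNTIME/DEVICE" pair that appears in `supported`."""
    candidates = [requested] if isinstance(requested, str) else requested
    for device_type in candidates:  # priority order: first match wins
        if device_type in supported:
            return device_type
    raise ValueError(f"none of {candidates} is supported (supported: {supported})")
```

For a model that can only run on "OPENVINO/CPU" and "OPENVINO/GPU" on a given machine, the NPU-first list above resolves to "OPENVINO/GPU".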

Bug Fixes

Variable tensor shape support is fixed in PySDK for "Tensor" input types for multi-input models, when the input tensor with shape other than 4-D has index other than zero.
Very intermittently, models are not fully downloaded from a cloud model zoo for AI server-based and local inference types, and there is no error diagnostic for that. As a result, corrupted models are used for inference, which leads to unclear or unrelated error messages. Correction measures include checking the "Content-Length" HTTP header when downloading a model archive from a cloud model zoo, with retries if the actual downloaded file size is less than expected. Also, the zip archive CRC is checked for each file when unpacking model assets.
In case of inference errors, AI server ASIO protocol closes the client socket too soon, which causes error message packet loss on the client side, which, in turn, leads to incorrect error report: instead of actual error, the generic socket errors like "Broken pipe" or "Operation aborted" are reported.
When AI server scans the local model zoo and finds a multi-device model whose default runtime/device combination (as specified in the RuntimeAgent and DeviceType model parameters) is not supported by the system, it discards the model, even though the model supports other runtime/device combinations available on this system. This happens because the SupportedDeviceTypes model parameter is not analyzed when scanning local zoos.
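
The download fix described above combines two checks: comparing the received size against the Content-Length header (retrying on a short read) and verifying each archive member's CRC. A minimal sketch using Python's standard zipfile module follows; it is not PySDK's code, `fetch` is a hypothetical callable returning the body and the Content-Length value, and the retry count of 3 is arbitrary.

```python
import io
import zipfile
from typing import Callable, Tuple

def verify_model_archive(body: bytes, content_length: int) -> None:
    """Raise if the archive is shorter than Content-Length or a member fails its CRC check."""
    if len(body) < content_length:
        raise OSError(f"truncated download: {len(body)} of {content_length} bytes")
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        bad_member = archive.testzip()  # reads every member and checks its CRC-32
        if bad_member is not None:
            raise OSError(f"CRC check failed for archive member {bad_member!r}")

def download_with_retries(fetch: Callable[[], Tuple[bytes, int]], attempts: int = 3) -> bytes:
    """Call fetch() until it returns a complete, CRC-valid archive or attempts run out."""
    last_error: Exception = OSError("no download attempts made")
    for _ in range(attempts):
        body, content_length = fetch()
        try:
            verify_model_archive(body, content_length)
            return body
        except (OSError, zipfile.BadZipFile) as err:
            last_error = err  # short read or corrupt archive: try again
    raise last_error
```
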

Version 0.12.0 (4/8/2024)

New Features and Modifications

Multi-device/multi-runtime models are supported in PySDK and in the cloud zoo.

Such models have additional model parameter SupportedDeviceTypes, which defines a comma-separated list of runtime/device combinations supported by the model. Each element of this list is "RUNTIME/DEVICE" pair.

The RUNTIME part specifies the runtime, while the DEVICE part specifies the device type. The following runtime/device combinations are supported as of PySDK version 0.12.0:

| Runtime | Devices |
|---|---|
| N2X | CPU, ORCA1 |
| OPENVINO | CPU, GPU, NPU, MYRIAD |
| ONNX | CPU |
| TFLITE | CPU, EDGETPU |
| TENSORRT | GPU, DLA, DLA_FALLBACK |

New runtimes and devices can be supported in the future versions of PySDK.

You may specify "*" as the wildcard in any part of the RUNTIME/DEVICE pair: it will match any supported runtime or device type. For example, "N2X/*" defines the model, which supports all devices of N2X runtime (that would be N2X/CPU and N2X/ORCA1), and "*/GPU" defines the model, which supports all GPU devices of all runtimes (that would be OPENVINO/GPU and TENSORRT/GPU).
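
The wildcard rule above can be illustrated with a small matcher that splits each "RUNTIME/DEVICE" pattern into its two parts. This is a hypothetical sketch, not PySDK's code; the function names are made up, and `SUPPORTED_0_12_0` simply lists the combinations from the runtime/device table in this section. It also reflects the behavior expected by the 0.13.0 list_models fix: "OPENVINO/*" must not match an ORCA1 device.

```python
from typing import List

# Runtime/device combinations from the table in this section (PySDK 0.12.0).
SUPPORTED_0_12_0 = [
    "N2X/CPU", "N2X/ORCA1",
    "OPENVINO/CPU", "OPENVINO/GPU", "OPENVINO/NPU", "OPENVINO/MYRIAD",
    "ONNX/CPU",
    "TFLITE/CPU", "TFLITE/EDGETPU",
    "TENSORRT/GPU", "TENSORRT/DLA", "TENSORRT/DLA_FALLBACK",
]

def parse_supported_device_types(value: str) -> List[str]:
    """Split a comma-separated SupportedDeviceTypes value into patterns."""
    return [item.strip() for item in value.split(",") if item.strip()]

def matches(pattern: str, device_type: str) -> bool:
    """True if "RUNTIME/DEVICE" matches the pattern; "*" matches any value of that part."""
    p_runtime, p_device = pattern.split("/")
    runtime, device = device_type.split("/")
    return p_runtime in ("*", runtime) and p_device in ("*", device)

def expand(patterns: List[str], available: List[str]) -> List[str]:
    """Concrete runtime/device pairs from `available` covered by any pattern."""
    return [d for d in available if any(matches(p, d) for p in patterns)]
```
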

For multi-device models, you may select on the fly which runtime/device combination to use for inference, provided the model supports that combination. Assign the runtime/device combination to the degirum.model.Model.device_type property as a string in the "RUNTIME/DEVICE" format, exactly as it is defined in the SupportedDeviceTypes list.

You can reassign the device_type property multiple times for the same model object. For example:
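```python
import degirum as dg

# Address, zoo URL, and token are placeholders; model_name is a multi-device model
# whose SupportedDeviceTypes include both N2X/ORCA1 and TFLITE/CPU.
zoo = dg.connect(dg.CLOUD, "https://hub.degirum.com", "<token>")
model = zoo.load_model(model_name)

model.device_type = "N2X/ORCA1"
result1 = model.predict(data)  # data: input image or tensor

model.device_type = "TFLITE/CPU"
result2 = model.predict(data)
```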
Just-in-time (JIT) compilation is introduced for DeGirum N2X models for ORCA devices. You may now create ORCA models by specifying either an ONNX or a TFLITE binary model file in the ModelPath model parameter: you do not need to precompile your model into the .n2x file format. This significantly simplifies model development for DeGirum ORCA devices. When the N2X runtime discovers the .onnx or .tflite model file extension, it automatically invokes the N2X compiler, compiles the model into .n2x format, and saves the compiled model in the local cache for future use. Cached models are identified by the Checksum model parameter: two models with the same name but different checksums are cached in two different files.
A new model parameter, `CompilerOptions`, is introduced to pass options to the JIT compiler. The parameter type is a JSON
dictionary, where the key is the runtime/device pair and the value is the compiler options string applicable to this
runtime/device pair. For example, `{ "N2X/ORCA1": "--no-software-layers" }` passes the `--no-software-layers`
compiler option string when compiling models for the ORCA1 device and N2X runtime.
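
The JIT flow described above involves three small decisions: whether ModelPath needs compiling, how a compiled model is keyed in the cache, and which CompilerOptions string applies. The sketch below is hypothetical throughout: the function names and the cache file naming scheme are made up for illustration and do not reflect the actual N2X runtime cache layout.

```python
import json
from pathlib import Path
from typing import Dict, Union

JIT_SOURCE_EXTENSIONS = {".onnx", ".tflite"}  # formats compiled on the fly

def needs_jit(model_path: str) -> bool:
    """True if ModelPath points at an ONNX/TFLite file rather than a precompiled .n2x file."""
    return Path(model_path).suffix in JIT_SOURCE_EXTENSIONS

def cache_file_name(model_name: str, checksum: str) -> str:
    """Hypothetical cache key: name plus Checksum, so differing checksums get different files."""
    return f"{model_name}_{checksum}.n2x"

def compiler_options_for(compiler_options: Union[str, Dict[str, str]], device_type: str) -> str:
    """Return the CompilerOptions string for a "RUNTIME/DEVICE" pair ("" if none is given)."""
    options = json.loads(compiler_options) if isinstance(compiler_options, str) else compiler_options
    return options.get(device_type, "")
```
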
degirum.connect() now supports a new mode of local inference in which models are served from a local model zoo directory instead of from a single model file. To use this mode, call degirum.connect() passing dg.LOCAL as the first argument and the path to the local model zoo directory as the second argument:
```python
import degirum as dg

zoo = dg.connect(dg.LOCAL, "/path/to/local/zoo/dir")
```

You may download models to the local model zoo directory the same way as for AI server, using the `degirum download-zoo` command.
A new "auto" value is introduced for the InputTensorLayout model parameter: when it is set to "auto", the input tensor layout is selected as "NCHW" for the "OPENVINO", "ONNX", and "TENSORRT" runtimes, and "NHWC" otherwise. This feature facilitates the creation of multi-runtime models, where the input tensor layout must be "NCHW" for some runtimes and "NHWC" for others.
New "auto" value is introduced for degirum.model.Model.overlay_alpha property: when it is set to "auto" PySDK will use overlay_alpha = 0.5 for segmentation models and overlay_alpha = 1.0 otherwise. This is now the default value for overlay_alpha property.
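
Both "auto" values above resolve to concrete settings by simple rules. A hypothetical sketch of those rules follows (the function names are made up); how PySDK decides that a model is a segmentation model is not described here, so it is passed in as a flag.

```python
from typing import Union

NCHW_RUNTIMES = {"OPENVINO", "ONNX", "TENSORRT"}

def resolve_input_tensor_layout(layout: str, runtime: str) -> str:
    """InputTensorLayout "auto": NCHW for OPENVINO/ONNX/TENSORRT, NHWC otherwise."""
    if layout != "auto":
        return layout
    return "NCHW" if runtime in NCHW_RUNTIMES else "NHWC"

def resolve_overlay_alpha(alpha: Union[str, float], is_segmentation: bool) -> float:
    """overlay_alpha "auto": 0.5 for segmentation models, 1.0 otherwise."""
    if alpha != "auto":
        return float(alpha)
    return 0.5 if is_segmentation else 1.0
```
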
The AI server tries to serve a cloud model from the local cache even if the model checksum request fails due to a poor or absent Internet connection. This allows the AI server to keep working without reliable Internet access, assuming all required cloud models are already downloaded to the local cache.
The model download timeout from the cloud zoo is increased from 10 to 40 seconds.
When running inside Docker container, the number of CPU devices reported by runtimes which support CPU inferences (such as OpenVINO) now takes into account Docker-imposed CPU quotas. For example, if the AI server Docker container is started with --cpus=4, then the number of virtual CPU devices reported by runtime will be half of that amount, i.e. 2 CPUs.
ORCA firmware is now forcefully reset on each AI Server start to ensure clean recovery from previous failures.
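
The Docker CPU quota behavior described above can be sketched as follows. One common way to observe a quota from inside a container on a cgroup v2 host is the `/sys/fs/cgroup/cpu.max` file, which holds "<quota> <period>" (for example "400000 100000" for `--cpus=4`) or "max <period>" when no quota is set; whether the runtimes read this exact file is an assumption, as is clamping the result to at least one device. The function names are hypothetical.

```python
from typing import Optional

def cpus_from_cgroup_v2(cpu_max: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max contents; None means no CPU quota is imposed."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None
    return int(quota) / int(period)

def virtual_cpu_devices(cpu_quota: Optional[float], host_cpus: int) -> int:
    """Half of the effective CPU count, honoring a Docker CPU quota when one is set."""
    effective = host_cpus if cpu_quota is None else min(host_cpus, cpu_quota)
    return max(1, int(effective // 2))  # clamp to one device: an assumption
```

With `--cpus=4` on a 16-CPU host, this yields 2 virtual CPU devices, matching the example in the note.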

Bug Fixes

Model filtering in degirum.zoo_manager.ZooManager.list_models method does not accept "NPU" device type.
Support of dynamically-sized output tensors does not work for OpenVINO runtime.
OpenVINO runtime reports single CPU device in system info, while actual number of virtual devices is more than one.
TensorRT runtime fails with error when quantized model does not specify CalibrationFilePath model parameter.
Variable tensor shape support is fixed in PySDK for "Tensor" input types. In previous versions, for input tensors having other than four dimensions, the following error is raised: "Shape of tensor passed as the input #<n> does not match to model parameters. Expected tensor shape is (<x>, <y>, <z>, <t>)."
