Release Notes

This page lists the release notes for PySDK. You can download the PySDK versions listed here from PyPI.org.

Version 1.3.1 (6/2/2026)

New Features and Modifications

RockChip RKNN runtime agent performance has been improved by introducing asynchronous pipelined inference and model context caching.

Across streaming AI inference workloads using models from the DeGirum public zoo, the average performance gain is 3.2x (+216%).

When the required model context is already cached, model switching latency is reduced from tens of milliseconds to effectively zero, enabling concurrent multi-model inference with almost no switching overhead.

Bug Fixes

On x86-64 systems without AVX2 support, launching AI Server with the MemryX runtime installed causes the process to terminate with an "Illegal instruction (core dumped)" error message.


Version 1.3.0 (4/24/2026)

New Features and Modifications

  1. Added support for MemryX runtime version 2.2.

  2. MemryX runtime now operates in MXA Manager mode. The MXA Manager service (or mxa_manager executable) must be up and running to use PySDK with the MemryX accelerator.

  3. Reintroduced support for Ubuntu 20.04 on ARM64 platforms. This also enables using PySDK with Yocto-built Linux distributions whose glibc is version 2.31 or later.

  4. Hardware-side batching has been added to the TensorRT plugin. This feature can improve performance on powerful hardware by grouping multiple frames into a single inference executed on the device. In some cases, sufficiently powerful hardware can process a batch of frames nearly as quickly as a single frame, resulting in higher overall throughput.

    To use hardware-side batching, the model’s ONNX input must support a dynamic batch dimension. This means the first dimension (batch size) should be set to -1. For example:

    • Non-batched input shape: [1, 3, 512, 512]

    • Batched input shape: [-1, 3, 512, 512]

    You must also specify the batch size for execution. This can be done in one of two ways:

    • In the model’s JSON configuration: add the DeviceBatch parameter under the ExtraDeviceParams section within DEVICE.

    • At runtime: set the batch size programmatically using model.extra_device_params.DeviceBatch = <desired_batch_size>
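As a rough illustration of the first option, the sketch below adds DeviceBatch to a model configuration held as a Python dictionary. Only the DEVICE section, the ExtraDeviceParams key, and the DeviceBatch parameter come from the notes above. The helper name set_device_batch, the batch size of 4, and the assumption that DEVICE is either a list of objects or a single object are illustrative, so check them against your model's actual JSON file.

```python
import json


def set_device_batch(config: dict, batch_size: int) -> dict:
    """Return a copy of a model config with DeviceBatch set under DEVICE/ExtraDeviceParams."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    device = updated.setdefault("DEVICE", [{}])
    # Assumption: DEVICE is either a list of section objects or a single object.
    sections = device if isinstance(device, list) else [device]
    for section in sections:
        section.setdefault("ExtraDeviceParams", {})["DeviceBatch"] = batch_size
    return updated


# Hypothetical, stripped-down configuration; a real model JSON has many more fields.
config = {"DEVICE": [{}]}
batched = set_device_batch(config, 4)
print(json.dumps(batched, indent=2))
```

The helper returns a modified copy and leaves the input untouched, so the same base configuration can be reused for several batch sizes.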

  5. OpenVINO runtime agent performance is improved due to implementation of asynchronous pipelined inference and improved model storage and request management.

  6. OpenVINO runtime agent: added support for additional OpenVINO tensor element types: uint16, int16, and double.

  7. OpenVINO runtime agent: the following extra device parameters are supported:

    • OPENVINO_ENABLE_CPU_PINNING: enables or disables CPU pinning. Disabling it can help overall throughput when several models or workloads run in parallel on the same host.

    • OPENVINO_INFERENCE_NUM_THREADS: sets the maximum number of CPU threads OpenVINO may use for inference work. This is useful when limiting per-model CPU consumption in multi-model pipelines.

    • OPENVINO_NUM_STREAMS: sets the number of parallel execution streams OpenVINO uses. More streams can improve throughput for some workloads, while fewer streams can reduce contention in shared systems.

    • OPENVINO_DENORMALS_OPTIMIZATION: enables denormal flushing on CPU. This can improve performance in some cases, but it may slightly affect numerical accuracy.

    To assign such a parameter, use the model.extra_device_params.KEY = value syntax, for example, model.extra_device_params.OPENVINO_INFERENCE_NUM_THREADS = 2.

    These parameters are particularly useful in multi-model pipeline scenarios, where limiting threads or streams per model can improve total system throughput rather than maximizing a single model in isolation. The defaults are optimal for performance in a single model scenario. For more information on each parameter, please refer to the following documents:

    • https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/high-level-performance-hints.html

    • https://docs.openvino.ai/2025/api/c_cpp_api/group__ov__runtime__cpp__prop__api.html

    • https://docs.openvino.ai/2025/api/c_cpp_api/group__ov__runtime__cpu__prop__cpp__api.html
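The sketch below shows one way several of these parameters might be applied in a single step. It relies only on the model.extra_device_params.KEY = value syntax described above. The helper name apply_extra_device_params is hypothetical, the SimpleNamespace object is a stand-in for a loaded PySDK model that keeps the example runnable without hardware, and the chosen values (including the use of a boolean for CPU pinning) are assumptions, not recommendations.

```python
from types import SimpleNamespace


def apply_extra_device_params(model, params: dict) -> None:
    """Assign each pair as model.extra_device_params.KEY = value."""
    for key, value in params.items():
        setattr(model.extra_device_params, key, value)


# Stand-in for a loaded PySDK model, used only to keep the sketch self-contained.
model = SimpleNamespace(extra_device_params=SimpleNamespace())

# Illustrative settings for a host that runs several models side by side.
apply_extra_device_params(
    model,
    {
        "OPENVINO_ENABLE_CPU_PINNING": False,
        "OPENVINO_INFERENCE_NUM_THREADS": 2,
        "OPENVINO_NUM_STREAMS": 1,
    },
)
print(vars(model.extra_device_params))
```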

  8. Added Windows support for the degirum install-runtime command, allowing ONNX Runtime and OpenVINO installation via the degirum CLI. For example, to install the OpenVINO runtime, execute the following command: degirum install-runtime openvino. If Windows reports an access denied error, re-run the command from an Administrator PowerShell or Administrator Command Prompt. The installers may need to write into shared locations such as C:\ProgramData or C:\Program Files.

  9. You can now integrate custom or third-party inference engines directly into the DeGirum PySDK using Python.

    New Public API

    The custom engine integration involves three components that work together:

    • ExtZooAccessorBase — manages the model zoo and creates model instances.

    • ExtModelBase — wraps a single model and manages its lifecycle via _predict_handler.

    • Runtime class (yours) — performs the actual inference in its predict() method.

    You register the whole stack under an @-prefixed name using register_zoo_accessor, then use that name as inference_host_address throughout PySDK.

    Function degirum.register_zoo_accessor(designator, accessor_class)

    Registers a custom inference engine under a short @-prefixed name. Once registered, use that name everywhere you would normally pass inference_host_address. Any call to degirum.connect(designator, ...) or degirum.load_model(..., designator, ...) will route inference through your custom accessor class instead of the built-in cloud or local engines. This lets you plug third-party inference runtimes into the standard PySDK workflow without modifying the SDK itself. To unregister your custom inference engine, call degirum.register_zoo_accessor(designator, None).

    Parameter        Description
    designator       A string starting with @ (e.g. "@myengine") that identifies the custom engine. Pass this string as inference_host_address to degirum.connect() or degirum.load_model().
    accessor_class   A class derived from degirum.ExtZooAccessorBase (see below). Pass None to unregister the designator.

    Class degirum.zoo_manager.ExtZooAccessorBase

    Subclass this class to implement your custom inference engine. You only need to implement a subclass constructor with the following signature: __init__(self, url: str, token: str = ""). It should call the base class constructor with proper arguments as described below.

    Constructor ExtZooAccessorBase(url, token="", *, devices, model_class, assets_mgr=LocalZooAssets)

    Parameter     Description
    url           Zoo URL or local path (as passed from degirum.connect() as zoo_url argument).
    token         Optional authentication token (as passed from degirum.connect() as token argument).
    devices       Mapping of "RUNTIME/DEVICE" runtime/device designator strings to the number of available devices of that type (e.g. {"ONNX/CPU": 1}).
    model_class   Your ExtModelBase subclass used to create model instances. See below.
    assets_mgr    Optional custom asset-manager class (defaults to LocalZooAssets, suitable to handle local model zoos: collections of model assets in a local directory).

    Class degirum._ext_zoo_accessor.ExtModelBase

    Subclass this class to implement the model inference logic.

    Abstract method to implement: _predict_handler(self) — a context manager that sets up and tears down the inference runtime.

    Implementations should:

    • Lazily create a runtime object and assign it to self._runtime on the first call, or when self._model_parameters.dirty is set.

    • Pass self._model_parameters and self._result_callback to the runtime constructor.

    • On teardown, set self._runtime.callback to None to break reference cycles.

    The runtime object must expose a predict(frame_data, frame_info) method (called by the base Model pipeline for each input frame) and a callback attribute (to which self._result_callback is assigned). See the Runtime class section below.

    Typical implementation of _predict_handler:

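     # requires "from contextlib import contextmanager" at module level
     @contextmanager
     def _predict_handler(self):
         # create the runtime lazily, or recreate it after model parameters change
         if self._runtime is None or self._model_parameters.dirty:
             self._runtime = MyRuntime(self._model_parameters, self._result_callback)
         try:
             yield
         finally:
             self._runtime.callback = None  # break reference cycle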

    Runtime class

    The runtime class, assigned to self._runtime in _predict_handler, is where inference is actually performed. It must implement a constructor and a predict method, and expose a callback attribute.

    Constructor __init__(self, model_params, callback)

    Parameter      Description
    model_params   Parameters object read from the model's JSON configuration file. Use it to query the settings needed for inference.
    callback       The _result_callback(result) method of the parent model class. Call it to asynchronously report inference results. You must also assign it to the self.callback attribute.

    You will also need to create preprocessor and postprocessor objects to use them in the predict() method.

    Typical implementation (based on ONNX runtime):

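     def __init__(self, model_params, callback):
         import onnxruntime as ort
         from degirum.CoreClient import Preprocess, Postprocess

         # keep the callback so that results can be reported from predict()
         self.callback = callback
         # core-level pre/postprocessors configured from the model parameters
         self._preprocess = Preprocess(model_params)
         self._postprocess = Postprocess(model_params)
         # ONNX Runtime session that performs the actual inference
         self._session = ort.InferenceSession(model_params.ModelPath)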

    Attribute callback

    Assign the callback constructor argument to this attribute in the constructor: self.callback = callback. Use it to report inference results. The parent model object also uses this attribute to break the circular reference between itself and the runtime object when inference completes.

    Method predict(self, frame_data, frame_info: str)

    This method is the main method of the class, where the inference is performed.

    Parameter    Description
    frame_data   A list of memoryview input tensors, one tensor per model input.
    frame_info   Frame info string as passed from the top level. Typically not used.

    It receives a list of raw frames (one per model input), already resized according to model parameters. Each frame is a memoryview object wrapping a numpy array of input tensor data.

    Processing steps:

    1. Preprocess — use the core-level preprocessor object (created in the constructor) to convert input tensor data to the raw binary format the model expects.

    2. Run inference — execute your custom, runtime-specific inference code.

    3. Postprocess — convert the raw output tensors to human-readable JSON. If your model's output tensors are compatible with one of the PySDK core-level postprocessors, you can call self._postprocess.forward() for this step.

    4. Deliver the result — call self.callback with the JSON result.

    Typical implementation (based on ONNX runtime):

     def predict(self, frame_data, frame_info: str):
         # apply core-level preprocessing to the input frame data
         preprocessed = self._preprocess.forward(frame_data)
    
         # run inference
         outputs = self._session.run(
             None,
             {
                 inp.name: arr
                 for inp, arr in zip(self._session.get_inputs(), preprocessed)
             },
         )
    
         # apply core-level postprocessing to the inference outputs
         result = self._postprocess.forward(outputs)
    
         # deliver result via callback
         self.callback(result)   

    If your inference runtime supports asynchronous execution, predict() can be implemented in a non-blocking way: start the inference and return immediately. Once inference completes, perform the postprocessing step and deliver the result via self.callback. How you detect completion depends on the runtime — it may use callbacks, polling, events, or any other mechanism the runtime provides.
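A non-blocking predict() along these lines could be sketched as follows. This is a minimal illustration, not the PySDK implementation: the class name AsyncRuntimeSketch, the infer_fn and postprocess_fn stand-ins, and the use of a single-worker thread pool as the completion mechanism are all assumptions chosen to keep the example runnable without any inference hardware.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncRuntimeSketch:
    """Non-blocking predict(): submit the work, return, report via callback later."""

    def __init__(self, callback, infer_fn, postprocess_fn):
        self.callback = callback
        self._infer = infer_fn              # stand-in for the runtime-specific inference call
        self._postprocess = postprocess_fn  # stand-in for the postprocessing step
        # A single worker keeps results in the same order as the submitted frames.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def predict(self, frame_data, frame_info: str):
        future = self._pool.submit(self._infer, frame_data)
        future.add_done_callback(self._on_done)
        # Returns immediately; the result is delivered from _on_done.

    def _on_done(self, future):
        result = self._postprocess(future.result())
        if self.callback is not None:  # the callback may have been cleared on teardown
            self.callback(result)

    def close(self):
        self._pool.shutdown(wait=True)  # wait for all in-flight frames


results = []
runtime = AsyncRuntimeSketch(
    callback=results.append,
    infer_fn=lambda frames: [len(f) for f in frames],   # fake "inference"
    postprocess_fn=lambda outputs: {"sizes": outputs},  # fake "postprocessing"
)
runtime.predict([memoryview(b"abc")], "")
runtime.predict([memoryview(b"hello")], "")
runtime.close()
print(results)
```

A runtime with native asynchronous support would replace the thread pool with its own completion callbacks, events, or polling, while the shape of predict() stays the same.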

Bug Fixes

  1. OpenVINO runtime agent: the error "Wrong value f64 for property key INFERENCE_PRECISION_HINT" is reported when performing inference on Windows systems. This is due to an ABI mismatch in the internal type enum between versions 2025.3 and 2023.3.

  2. OpenVINO runtime agent: on systems with more than one OPENVINO/GPU device, inference may run on a device other than the one selected in the model.devices_selected list.


Version 1.2.1 (3/19/2026)

New Features and Modifications

  1. DEEPX runtime version 3.2.0 is supported. Corresponding driver version should be 2.1.0-2.

Bug Fixes

  1. "Python postprocessor: configure_worker: wrong worker_id" error occurs intermittently when running models with Python postprocessor.

  2. degirum download-zoo CLI command does not honor token installed by degirum token install or degirum token create.


Version 1.2.0 (3/6/2026)

New Features and Modifications

  1. The update_if_newer property is added to the degirum.model._ClientModel class, which the degirum.load_model function returns when @local inference is requested. This property controls how the local model cache handles cloud models during @local inference. When it is set to True, the local model cache queries the cloud zoo for the model checksum every time a cloud model is requested and downloads the model to the local cache if the checksums do not match. When it is set to False (the default), the cloud model is downloaded once and its checksum is never queried. Note that previous releases always queried model checksums, so the default behavior has changed.
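The caching rule described above can be summarized as a small decision function. This is only a sketch of the described behavior, not the PySDK implementation: needs_download, its arguments, and fake_cloud_checksum are hypothetical names introduced for illustration.

```python
def needs_download(cached_checksum, update_if_newer, fetch_cloud_checksum):
    """Return True if the cloud model should be downloaded into the local cache.

    cached_checksum: checksum of the cached copy, or None if the model is not cached.
    update_if_newer: value of the model property described above.
    fetch_cloud_checksum: hypothetical callable that queries the cloud zoo.
    """
    if cached_checksum is None:
        return True  # first use: the model is downloaded once
    if not update_if_newer:
        return False  # default: the cloud checksum is never queried
    return fetch_cloud_checksum() != cached_checksum  # re-download on mismatch


queries = []


def fake_cloud_checksum():
    queries.append(1)  # count how often the cloud zoo would be contacted
    return "v2"


print(needs_download("v1", False, fake_cloud_checksum))  # cache is trusted as-is
print(needs_download("v1", True, fake_cloud_checksum))   # checksums are compared
print(len(queries))                                      # number of cloud queries so far
```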

Bug Fixes

  1. When the input_shape model property is set to the value it already has, model parameters get invalidated, causing a model reload. Model parameter invalidation and model reload must happen only when a model parameter is changed to a new value.

  2. DeepX runtime agent: when a model compiled with an older version of the DeepX model compiler is loaded for inference, inference hangs in the DeepX runtime. Now the model version is validated against the supported versions, and an error message is generated if the version is not supported.

  3. Fixed bugs in Audio pre-processor for YAMNET models.


Version 1.1.0 (2/17/2026)

New Features and Modifications

  1. ONNX runtime plugin is redesigned to improve inference performance.

  2. DEEPX runtime version 3.1.0 is supported.

  3. The --token parameter is added to the degirum server start command so that a token can be installed before the server launches, which enables proper license activation.

  4. The degirum machine-id CLI command is added to print the host machine-id.

  5. The degirum token create command now generates a meaningful token description, such as Created by PySDK for <username>@<hostname> (<host IP>).

  6. The degirum download-llm CLI command is added to download and install LLM models from the DeGirum model zoo for the OpenVINO GenAI plugin.

Bug Fixes

  1. Automatic renewal of an expired token is now performed before PySDK license verification. Previously, the absence of such renewal caused a license verification error when the installed token had a finite duration and had expired.

  2. Proper import of pyseccomp is implemented in Python post-processor engine: plain import pyseccomp fails on some systems causing ModuleNotFoundError: No module named 'pyseccomp' error.

Version 1.0.0 (1/26/2026)

This is the first production release of PySDK.

⚠️ IMPORTANT:

We have updated the PySDK release distribution process. Pre-production releases (prior to ver. 1.0.0) were published on PyPI (pypi.org). Production releases are distributed through the DeGirum Package Service and pypi.org.

To install the production version of PySDK from DeGirum Package Service, use the following command:

⚠️ IMPORTANT:

Starting from ver. 1.0.0, the usage of premium runtime plugins requires license. The license is obtained automatically by PySDK from DeGirum AI Hub if you have AI Hub token installed on your system. Once requested, the plugin license is stored locally and automatically renewed on expiration. Default expiration period is 10 days. The license is node-locked.

The Free plan allows you to use PySDK premium runtimes on one host. If you need to use PySDK premium runtimes on more than one host, you need to upgrade your AI Hub workspace to Professional or Enterprise plans.

To install an existing AI Hub token, run the degirum CLI command degirum token install <TOKEN>, where <TOKEN> is the AI Hub token string that you generate on AI Hub.

To create a new token and install it, run the degirum CLI command degirum token create. If you run this command on a system with a graphical desktop, it opens the token generation page in your default browser. Otherwise, it prints a URL that you need to paste into any browser.

To upgrade your plan follow these instructions.

The premium plugins include:

  • Akida (Brainchip)

  • Axelera

  • DeepX

  • EdgeCortix

  • Hailo

  • ONNX

  • OpenVINO (Intel)

  • Renesas

  • RKNN (RockChip)

  • TensorRT (nVidia)

Free plugins include DeGirum N2X Orca and Google TFLite.

New Features and Modifications

  1. YOLO26 models are initially supported by PySDK.

  2. Axelera runtime version 1.5.3 is supported. Please note that models compiled by Axelera compiler ver. prior to 1.5.x are not compatible with 1.5.x.

  3. Double-buffering and DMA buffers are supported for Axelera runtime. This is a performance optimization.

  4. degirum token install CLI command now accepts token string without --token keyword.

    Before: degirum token install --token <TOKEN>

    Now: degirum token install <TOKEN>

Bug Fixes

  1. PythonFile model parameter is not assigned properly on model loading: the value from model JSON file always overwrites the value assigned in run-time.


Version 0.20.0 (12/16/2025)

New Features and Modifications

  1. Minimum supported Ubuntu version for ARM64 platforms is now 22.04 (bumped from 20.04). For x86-64 platforms it is still 20.04.

  2. Python 3.13 is supported.

  3. Renesas AI accelerators are initially supported for Linux OS. The runtime/device designator for these devices is "RENESAS/RZ-V2*" where * is the particular device model (L, M, MA, H, and N). The supported runtime version is 2.5.1.

  4. model.predict_batch() latency is improved. Previous versions yielded only one ready result for each consumed input frame, which led to accumulated latency that never decreased. Now the code keeps yielding results until no more ready results are available.

  5. Audio pre-processor quality and performance for Whisper models is improved.

  6. Cloud model zoo cache is now shared between all processes which use PySDK on a given system. In previous versions for @local inferences temporary non-persistent cloud zoo cache was created for each process.

  7. The PythonFile model parameter can now be modified at runtime. For example, to temporarily disable the Python post-processor, assign model._model_parameters.PythonFile = ""

  8. Startup time of Python postprocessor worker processes is significantly improved due to parallelization.

  9. TensorRT runtime: added JIT compilation of pre-quantized ONNX models.

  10. TensorRT runtime: implemented loading .engine files supplied by the model JSON. This bypasses the JIT entirely.

  11. TensorRT runtime: new extra device parameters are supported for TensorRT runtime models: UseFP16 and UseINT8. You may set them by assigning model.extra_device_params.UseFP16 and model.extra_device_params.UseINT8 respectively.

  12. Token creation procedure via degirum token create is improved for headless systems (systems without GUI browser).

  13. Empty zoo_url for @local inference type is now treated as default public cloud zoo.

  14. degirum.get_supported_devices(): zoo_url and token parameters are deprecated as extraneous - you may now omit them.
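The predict_batch() latency change (item 4 above) can be pictured with a toy generator. This is a minimal sketch, not PySDK code: BurstyPipeline, stream, and the drain_all flag are hypothetical names, and the pipeline is assumed to complete results two at a time so that a backlog can form.

```python
from collections import deque


class BurstyPipeline:
    """Toy pipeline: results become ready two at a time, on every second frame."""

    def __init__(self):
        self._pending = []
        self.ready = deque()

    def submit(self, frame):
        self._pending.append(frame)
        if len(self._pending) == 2:
            self.flush()

    def flush(self):
        self.ready.extend(f"result({f})" for f in self._pending)
        self._pending.clear()


def stream(frames, pipeline, drain_all):
    """Yield (frames_consumed, result) pairs.

    drain_all=False mimics the older behavior (one ready result per frame);
    drain_all=True mimics the newer one (yield until nothing is ready).
    """
    consumed = 0
    for frame in frames:
        pipeline.submit(frame)
        consumed += 1
        while pipeline.ready:
            yield consumed, pipeline.ready.popleft()
            if not drain_all:
                break
    pipeline.flush()  # end of input: complete whatever is still pending
    while pipeline.ready:
        yield consumed, pipeline.ready.popleft()


old = list(stream(range(4), BurstyPipeline(), drain_all=False))
new = list(stream(range(4), BurstyPipeline(), drain_all=True))
print(old)
print(new)
```

In the first list, result(1) is delivered only after a third frame has been consumed. In the second list, it is delivered right after the second frame, which is the kind of backlog the change removes.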

Bug Fixes

  1. Timing information was stripped from inference results only after the inference result object was created, which exposed that timing data to custom post-processor constructors. As a result, a single fake result containing only timing information appeared when the model reported no actual results. Now timing information is stripped before the result object is created.

  2. YOLO detection result processors incorrectly quantized the confidence threshold when the output score tensor quantization parameters produced out-of-range results. Now clamping is applied to prevent wraparound.

  3. Proper detection of device count is implemented for Windows platform. In previous versions it was just hardcoded to one device.
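The clamping fix for the quantized confidence threshold (item 2 above) can be illustrated with a few lines of arithmetic. The sketch assumes a common affine quantization formula, q = round(x / scale) + zero_point, and a uint8 score tensor. The release note states neither, so treat the formula, the function name quantize_threshold, and the numeric values as assumptions.

```python
def quantize_threshold(threshold, scale, zero_point, qmin=0, qmax=255):
    """Quantize a float confidence threshold, clamping it to [qmin, qmax]."""
    q = round(threshold / scale) + zero_point  # assumed affine quantization
    return max(qmin, min(qmax, q))


scale = 1 / 512  # illustrative parameters that overflow the uint8 range
unclamped = round(0.9 / scale) + 0
print(unclamped)                          # raw quantized value, above 255
print(unclamped & 0xFF)                   # what a wrapping cast to uint8 would keep
print(quantize_threshold(0.9, scale, 0))  # clamped value
```

Without clamping, the wrapped threshold ends up lower than intended, so detections with low scores could pass the filter.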


Version 0.19.2 (11/01/2025)

ATTENTION: this release contains critical bug fixes. We strongly recommend upgrading to this release from versions 0.19.0 and 0.19.1.

Bug Fixes

  1. Critical bug fix: YOLOv8 pose detection and object detection+segmentation models produce incorrect results.

  2. Critical bug fix: when Numpy package version 2.0 or newer is installed, all models reporting raw tensors, such as ReID embedding models, pure segmentation models, or models with "None" postprocessor type, produce incorrect results. This bug is due to an incompatibility between pybind11 library ver. 2.10.3, used in the PySDK build, and Numpy ver. 2.0 and above.

  3. When Hailo runtime ver. 4.20.1 is installed and Hailo multiprocess service is not running, then PySDK produces HAILO_INVALID_OPERATION error.

  4. When running inferences of models for runtime, switching models causes segmentation fault.


Version 0.19.1 (10/28/2025)

ATTENTION: this release has critical bugs. Please upgrade to newer release 0.19.2 or above!

New Features and Modifications

  1. Improved performance of Detection post-processors, especially on ARM hosts.

  2. Audio preprocessor improvements: now supports different tensor layouts and input dimensions for Whisper preprocessing.

  3. HailoRT versions 4.23.0 and 4.20.1 are supported.

  4. MemryX runtime version 2.0 is supported.

Bug Fixes

  1. Implemented a timeout bypass mechanism for the Axelera runtime plugin to prevent indefinite stalling when the Axelera API fails.

  2. Incorrect inference results appear intermittently when performing streaming predictions using TensorRT plugin due to race condition in TensorRT plugin implementation.


Version 0.19.0 (10/13/2025)

ATTENTION: this release has critical bugs. Please upgrade to newer release 0.19.2 or above!

New Features and Modifications

  1. EdgeCortix AI accelerators are initially supported for Linux OS. The runtime/device designator for these devices is "EDGECORTIX/SAKURA2".

  2. Axelera runtime version 1.4.1 is supported.

  3. Error handling in Axelera runtime plugin is improved:

    • Waits with infinite timeouts are replaced with finite ones.

    • Errors in the inference stream no longer disable device operations.

  4. PySDK image pre-processor throughput is significantly improved by reducing the amount of double-buffering along the processing pipeline. This improvement is more significant for slower hosts.

  5. Introduced a new model parameter: ExtraDeviceParams. This parameter is used for hardware-specific configurations.

    • Inside the model JSON, it is a JSON object under the key ExtraDeviceParams in the DEVICE section.

    • It is exposed as the degirum.model.Model.extra_device_params property in PySDK.

    • Use the model.extra_device_params.KEY = VALUE syntax to set these parameters.

  6. Hailo Runtime Agent no longer uses the degirum.model.Model.eager_batch_size property for internal Hailo batching control. Instead, it now uses HAILO_BATCH_SIZE inside the model's ExtraDeviceParams. To set a model's batch size for Hailo devices, use model.extra_device_params.HAILO_BATCH_SIZE = <desired-batch-size>

  7. The maximum numpy package version allowed by the PySDK package requirements is relaxed to < 3.0.

  8. The degirum.model.Model.model.output_class_set and degirum.model.Model.model.overlay_blur property setters now accept list, set, or string arguments interchangeably. Property getters now return lists.

  9. The degirum.get_supported_devices() function now requires only one argument: inference_host_address. The zoo_url and token parameters no longer need to be passed, but they are kept for backward compatibility.

  10. The degirum.model.Model.model.devices_selected property is now updated when the degirum.model.Model.model.device_type property changes, so that the selected device list remains valid for the newly selected device type. In particular, when the devices_selected list becomes empty after a device type change, it is automatically assigned the devices_available value.

Bug Fixes

  1. "Tensor" preprocessor type now allows data types other than DG_FLT or DG_UINT8 to pass through unmodified.

  2. ONNX runtime agent now correctly supports conversion for all input tensor data types, not only DG_FLT or DG_UINT8.

  3. Segmentation results renderer fails on bounding boxes with zero area in mask resize.


Version 0.18.3 (09/19/2025)

New Features and Modifications

  1. Axelera runtime version 1.4 is supported.

  2. OpenVINO GenAI runtime agent is initially supported. This runtime agent allows running inferences of LLM models in PySDK using OpenVINO GenAI runtime.

  3. Integrated Intel GPU is now available as inference device for OPENVINO runtime, when it is the only GPU in the system (if you have both discrete and integrated GPUs, the integrated GPU is still not available). The device designator is "OPENVINO/GPU" as for discrete GPU.

  4. Timeouts are implemented for all AI server commands. Before this modification, some commands, such as the system information request, could wait indefinitely for an AI server response when the AI server process had already exited for whatever reason.

  5. The degirum install-runtime CLI commands have been updated to better support operation in root environments.

  6. The performance of rendering of segmentation results is greatly improved. The memory consumption to store segmentation masks is also reduced.

  7. Tensor Preprocessor: quantization/dequantization functionality is implemented. When input tensor data type (as specified by InputRawDataType model parameter) does not match model input data type (as deduced from InputQuantEn model parameter), either quantization or dequantization of the input tensor happens. When InputRawDataType is DG_UINT8 and InputQuantEn is false, then dequantization is performed. When InputRawDataType is DG_FLT and InputQuantEn is true, then quantization is performed. Quantization/dequantization parameters are specified by InputQuantScale and InputQuantOffset model parameters.
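The decision rule in the Tensor Preprocessor item above can be sketched on a flat list of values. The parameter names mirror InputRawDataType, InputQuantEn, InputQuantScale, and InputQuantOffset from the note. The affine formulas, the clamping to the 0..255 range, and the function name convert_input are assumptions, since the note does not spell out the arithmetic.

```python
def convert_input(values, raw_type, quant_enabled, scale, offset):
    """Quantize, dequantize, or pass through input values per the rule above."""
    if raw_type == "DG_FLT" and quant_enabled:
        # float input, quantized model: quantize (assumed affine formula, clamped to uint8)
        return [max(0, min(255, round(v / scale) + offset)) for v in values]
    if raw_type == "DG_UINT8" and not quant_enabled:
        # uint8 input, float model: dequantize (inverse of the assumed formula)
        return [(v - offset) * scale for v in values]
    return list(values)  # data types already match: pass through unchanged


quantized = convert_input([0.0, 0.5, 1.0], "DG_FLT", True, scale=0.25, offset=10)
restored = convert_input(quantized, "DG_UINT8", False, scale=0.25, offset=10)
print(quantized)
print(restored)
```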

Bug Fixes

  1. Cloud inference started from a separate thread may produce the error "got Future <Future pending> attached to a different loop".

  2. The degirum server cache-dump command now always outputs valid JSON.

  3. Floating point overflow in bounding box coordinate decoding computation is fixed in YOLOv8 Detection post-processor.


Version 0.18.2 (08/12/2025)

New Features and Modifications

Axelera runtime is now supported on ARM64 platforms.


Version 0.18.1 (08/11/2025)

New Features and Modifications

  1. New post-processor types are supported:

    • Null post-processor. This post-processor effectively disables all post-processing, which is useful for model benchmarking. The post-processor tag is "Null".

    • Dequantization post-processor. This post-processor performs dequantization of all integer-type output tensor results (compare it with None post-processor, which just returns all output tensor results as-is). The post-processor tag is "Dequantization".

  2. Verbosity of degirum token CLI commands is improved. Now all commands produce some confirmation message on completion.

Bug Fixes

  1. When a cloud API token with unlimited lifetime is installed in PySDK using the degirum token install CLI command, a subsequent dg.connect() call raises the error "Invalid isoformat string".


Version 0.18.0 (08/05/2025)

New Features and Modifications

  1. The degirum token command tree has been added to the PySDK CLI. These commands manage cloud access API tokens in PySDK: you can install an existing token into the PySDK internal storage, create and install a new token, query the status of the currently installed token, renew the installed token when it has expired, and delete the installed token from the PySDK internal storage. The PySDK internal storage is a JSON file in a user-specific directory: %APPDATA%\DeGirum on Windows and ~/.local/share/DeGirum on Linux/macOS. The following commands are supported:

    • degirum token status: prints the status of the currently installed token. If Internet connection is available, it queries the most current token status from the DeGirum AI Hub otherwise it prints latest known token status.

    • degirum token install --token <token string>: installs the provided token into the PySDK internal storage. You need to obtain the token string from DeGirum AI Hub Web GUI.

    • degirum token create: opens automatic token creation URL in your default browser (when available) or prints this URL into console. Using provided URL you need to authenticate in DeGirum AI Hub. Meanwhile, PySDK CLI will wait until you open provided URL and authenticate in DeGirum AI Hub. Once authenticated, the new token with 2-week expiration will be created automatically, and PySDK CLI will pick it up and install into PySDK internal storage.

    • degirum token renew: renews currently installed token. Renewal happens even if the token is already expired. Renewed token is then installed into the PySDK internal storage replacing the currently installed token.

    • degirum token clear: removes the currently installed token from the PySDK internal storage. It does not delete the token from the AI Hub.

  2. The degirum.connect function now uses the currently installed token if no token is provided as the token argument (see the degirum token CLI description above). It tries to renew this token automatically if it is about to expire or has already expired. This feature simplifies token handling in PySDK: you install a time-limited token once using the PySDK CLI and then use it indefinitely on this system from all your scripts.

  3. degirum install-runtime command tree has been added to PySDK CLI. Using these commands you can install third-party AI accelerator runtime libraries on your system greatly simplifying system housekeeping. The following commands are supported:

    • degirum install-runtime --list: list all runtimes available for installation. Using this command you may determine arguments for the following command.

    • degirum install-runtime <runtime> <version>: install particular version of particular runtime. You may specify ALL instead of the version number to install all supported versions of that runtime.

    • The following runtimes are supported for Linux Ubuntu/Debian OS in this release (more runtimes will be supported in the future releases):

      Designator   Runtime                    Versions
      akida        Brainchip Akida runtime    2.11.0
      axelera      Axelera Voyager runtime    1.3.3
      memryx       MemryX runtime             1.2.3-1
      onnx         ONNX runtime               1.20.1
      openvino     Intel OpenVINO runtime     2024.6.0, 2024.2.0, 2023.3.0
      rknn         Rockchip RKNN runtime      2.3.0

  4. HAILO runtime agent now supports HAILORT runtime version 4.22.

  5. Axelera runtime agent now supports Voyager SDK runtime version 1.3.3.

  6. Preprocessing functionality for Whisper models is implemented for audio preprocessor. The InputType model parameter value for such models is Audio. The audio preprocessor distinguishes Whisper models by combination of InputFrameSize equal to 400 and InputFrameHopStepSize equal to 160.

  7. degirum.connect() now checks the validity of the cloud token and the cloud zoo for non-public zoo access. If you pass an empty or incorrect token, you get the error "Unable to connect to server hub.degirum.com: your cloud API access token is not valid". If you pass a correct token, but the cloud zoo does not exist or is not accessible to you, you get the error "Cloud model zoo '<space>/<zoo>' either does not exist or you do not have access to it."

  8. Calling degirum.connect(degirum.LOCAL) for local inference with an empty string passed as the zoo_url argument now raises the error "ZooManager: incorrect local model zoo URL". In previous releases, an empty zoo_url argument selected the local zoo in the current working directory, which caused a lot of confusion.
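
The Whisper detection rule from item 6 can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the PySDK implementation: the function name looks_like_whisper and the idea of passing model parameters as a plain dictionary are assumptions made for the example.

```python
# Values taken from the release note; the dict layout is an assumption.
WHISPER_FRAME_SIZE = 400
WHISPER_HOP_STEP_SIZE = 160


def looks_like_whisper(model_params: dict) -> bool:
    """Hypothetical helper: True when the parameters match the rule in item 6."""
    return (
        model_params.get("InputType") == "Audio"
        and model_params.get("InputFrameSize") == WHISPER_FRAME_SIZE
        and model_params.get("InputFrameHopStepSize") == WHISPER_HOP_STEP_SIZE
    )
```

An audio model with any other frame size or hop step is not treated as a Whisper model by this check.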

Bug Fixes

  1. Fixed multiple intermittent failures of ORCA USB accelerator inferences.


Version 0.17.3 (07/15/2025)

Bug Fixes

The automatic reconnect-on-critical-error functionality in the cloud inference protocol was broken by changes to this protocol made in the 0.17.2 release.


Version 0.17.2 (07/14/2025)

New Features and Modifications

  1. Axelera AI accelerators are initially supported for Linux OS. The runtime/device designator for these devices is "AXELERA/METIS".

  2. DEEPX runtime inference performance is improved by avoiding extra memory copy of inference result tensors.

Bug Fixes

  1. Frequent websocket disconnects happened during streaming cloud inferences in PySDK, significantly reducing the streaming frame rate. This bug was caused by a race condition in the websocket-client package used by the python-socketio synchronous client. PySDK now uses the python-socketio asynchronous client, which does not use the websocket-client package.


Version 0.17.1 (06/25/2025)

New Features and Modifications

  1. ONNX runtime agent now supports ONNX runtime version 1.20.1.

Bug Fixes

  1. Hailo models with hardware accelerated NMS layer did not work with PySDK and HailoRT ver. 4.21. Now this bug is fixed.

  2. Once a model was loaded by the Hailo agent, the agent remembered the model batch size, and any attempt to change the batch size of the already loaded model had no effect.


Version 0.17.0 (06/20/2025)

Known Issues

Hailo models with a hardware-accelerated NMS layer do not work with PySDK and HailoRT ver. 4.21. We recommend using HailoRT ver. 4.20 for such models until the fix is released in the next PySDK version.

New Features and Modifications

  1. Batching is supported for the HailoRT runtime. You control the batch size via the degirum.model.Model.eager_batch_size property. For single-context models (models that fit fully into accelerator memory), the Hailo runtime batch size is set to auto to improve performance, because such models do not benefit from batching. For multi-context models, which do benefit from batching, the Hailo runtime batch size is set equal to degirum.model.Model.eager_batch_size.

  2. DEEPX runtime version 2.9.5 is supported. This version supports production version of DEEPX M.2 accelerator card.

  3. Multi-device support was enabled for DEEPX runtime. Now PySDK can operate with multiple DEEPX M.2 accelerator cards.

  4. Models with built-in NMS layer are supported for HailoRT runtime.

  5. Maximum supported model parameters configuration version is increased from 10 to 11.

  6. The post-processor for DamoYolo models is implemented. The post-processor tag is "DetectionDamoYolo".

  7. Custom landmark shapes (other than 2) are supported in the PoseDetectionYoloV8 post-processor. The landmark key in detection results may now contain a coordinate list with more than 2 elements, for example [x,y,score].

  8. AI annotation renderers now support zero-size line width. You set it via degirum.model.Model.overlay_line_width property. When it is set to zero, no lines are drawn.

  9. The URL parsing logic of degirum.connect() is now more reliable. For local inference or AI server inference, the zoo URL is first checked to see whether it is a cloud zoo URL. A URL is considered a cloud zoo URL if it starts with the http:// or https:// scheme, or if it contains exactly one slash, like workspace/zoo. Only if the provided URL does not look like a cloud zoo URL is it then checked as a local zoo URL.

    • For local inference, the URL is checked to see whether it is a valid local path; if it is not, the error "incorrect local model zoo URL: path does not exist" is raised. You may use the explicit file:// scheme for local paths.

    • For AI server inference, the URL is considered a local model zoo URL if it is an empty string or starts with the aiserver:// scheme. Otherwise, the error "incorrect cloud model zoo URL" is raised.

  10. A runtime plugin loading whitelist is implemented. The list of runtime plugins allowed to load can be specified in the DG_PLUGINS_ALLOWED environment variable as plugin prefixes separated by any non-alphanumeric separator. For example: DG_PLUGINS_ALLOWED=n2x_runtime_agent;onnx_runtime_agent. When DG_PLUGINS_ALLOWED is not defined, all plugins are loaded as before.
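
The zoo URL checks described in item 9 can be read as a small decision procedure. The sketch below restates those rules in Python for illustration only; it is not the PySDK code. The function names, the "local"/"aiserver" labels for the inference kind, and the use of ValueError are assumptions, while the error texts are quoted from the item above.

```python
import os


def looks_like_cloud_zoo_url(url: str) -> bool:
    """Cloud rule from item 9: http(s) scheme, or exactly one slash."""
    if url.startswith(("http://", "https://")):
        return True
    return url.count("/") == 1


def classify_zoo_url(url: str, inference: str) -> str:
    """Return "cloud" or "local"; `inference` is "local" or "aiserver".

    Both labels and the exception type are hypothetical.
    """
    if looks_like_cloud_zoo_url(url):
        return "cloud"
    if inference == "local":
        # The explicit file:// scheme is accepted for local paths.
        path = url[len("file://"):] if url.startswith("file://") else url
        if not os.path.exists(path):
            raise ValueError("incorrect local model zoo URL: path does not exist")
        return "local"
    if inference == "aiserver":
        if url == "" or url.startswith("aiserver://"):
            return "local"
        raise ValueError("incorrect cloud model zoo URL")
    raise ValueError(f"unknown inference kind: {inference!r}")
```

Because the cloud check runs first, a relative path with a single slash such as models/zoo is classified as a cloud zoo URL in this sketch; the explicit file:// scheme avoids that ambiguity.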
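
The DG_PLUGINS_ALLOWED syntax from item 10 can be illustrated with a short parser. This is a sketch under stated assumptions, not the actual loader: the helper names are hypothetical, the underscore is treated as part of a prefix (the example value above contains underscores inside plugin names), matching is done with a simple starts-with test, and an empty value is treated like an undefined variable because the release note does not cover that case.

```python
import re


def parse_allowed_plugins(value):
    """Split a DG_PLUGINS_ALLOWED value into a list of plugin prefixes.

    Returns None when the variable is undefined (or empty, by assumption),
    meaning that every plugin may load.
    """
    if not value:
        return None
    # Assumption: letters, digits and underscores belong to a prefix;
    # any other character acts as a separator.
    return [p for p in re.split(r"[^A-Za-z0-9_]+", value) if p]


def is_plugin_allowed(plugin_name, allowed):
    """Hypothetical check: allow everything when no whitelist is set."""
    if allowed is None:
        return True
    return any(plugin_name.startswith(prefix) for prefix in allowed)
```

In a real process the value would typically come from os.environ.get("DG_PLUGINS_ALLOWED"), which returns None when the variable is not defined.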

Bug Fixes

  1. When there is no connection to the cloud zoo, the model from AI server model zoo cache can be loaded without token authorization.


Version 0.16.2 (05/09/2025)

New Features and Modifications

  1. C++ SDK example has been modified to support inference of models from cloud zoos.

  2. HAILO runtime agent now supports HAILORT runtime version 4.21.

  3. MEMRYX runtime agent is supported on Windows 10/11 OS.

  4. TENSORRT runtime agent is supported on Windows 10/11 OS.

  5. Dequantization post-processor is implemented. This post-processor performs dequantization of model output tensors according to model dequantization settings for those tensors. The post-processor tag is "Dequantization".

  6. A new InferenceResultsType model parameter is added. It specifies which PySDK result class (a class derived from the InferenceResults base class) is used for the objects returned as inference results. When the InferenceResultsType model parameter is not defined, the OutputPostprocessType model parameter is used instead, as before. Note that in previous releases the OutputPostprocessType model parameter selected both the Core-level post-processor applied to the model output tensors and the PySDK result class of the objects returned as inference results. A separate parameter for selecting the PySDK result class makes it possible to match different Core-level post-processors with different PySDK result classes. For example, the "Dequantization" Core-level post-processor can be used with the "Classification" PySDK result class.

  7. The degirum.inference_results_type model property is added to control the InferenceResultsType model parameter at run time.

  8. The degirum.get_inference_results_class method is added to return the PySDK result class whose objects are returned as inference results for this model.

  9. The degirum.postprocessor.register_postprocessor function is added. It registers your own PySDK result class for a specific inference result type (as defined by the InferenceResultsType model parameter, see above). The function accepts two arguments: the inference result type string and the PySDK result class. The PySDK result class must inherit from the degirum.postprocessor.InferenceResults base class. Once your class is registered, objects of this class are returned as inference results for models whose InferenceResultsType model parameter equals the inference result type string you passed to register_postprocessor. This allows you to develop new custom PySDK result classes and integrate them seamlessly into PySDK.
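
The interplay between items 6 and 9 (the fallback from InferenceResultsType to OutputPostprocessType, and the lookup of a registered result class) can be pictured with a small stand-alone model. Everything in the sketch below is a stand-in written for illustration: it does not import PySDK, the InferenceResults class here is a placeholder for the real base class, and register_result_class, effective_results_type and results_class_for are hypothetical names, not PySDK functions.

```python
class InferenceResults:
    """Placeholder for the PySDK base result class (not the real one)."""

    def __init__(self, raw):
        self.raw = raw


# Hypothetical registry: inference result type string -> result class.
_registry = {}


def register_result_class(result_type, result_class):
    """Mimics the two-argument registration described in item 9."""
    if not issubclass(result_class, InferenceResults):
        raise TypeError("result class must inherit from InferenceResults")
    _registry[result_type] = result_class


def effective_results_type(model_params):
    """Item 6: use InferenceResultsType, else fall back to OutputPostprocessType."""
    return model_params.get("InferenceResultsType") or model_params["OutputPostprocessType"]


def results_class_for(model_params):
    """Pick the registered class; default to the base class (an assumption)."""
    return _registry.get(effective_results_type(model_params), InferenceResults)


class MyResults(InferenceResults):
    """Example custom result class."""


register_result_class("MyType", MyResults)
params = {"OutputPostprocessType": "Dequantization", "InferenceResultsType": "MyType"}
result = results_class_for(params)(raw=[0.1, 0.9])
```

In this model, the Core-level post-processor is still named by OutputPostprocessType, while the class of the returned object is chosen independently through InferenceResultsType.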

Bug Fixes

  1. Various error messages appear when trying to do inference of more than 32 different models using HAILORT runtime agent on a single Hailo accelerator device.

  2. Inference multiplexing on all available devices was not performed for the TFLITE and OPENVINO runtime agents: only single device with index 0 was always used. This bug was introduced in ver. 0.16.0.


Version 0.16.1 (04/29/2025)

New Features and Modifications

MemryX runtime agent now supports MemryX runtime version 1.2.

Bug Fixes

TensorRT runtime agent failed to run inferences on DLA devices producing "Cuda failure: invalid device ordinal" error.


Version 0.16.0 (04/17/2025)

New Features and Modifications

  1. DEEPX AI accelerators are initially supported for Linux OS. The runtime/device designator for these devices is "DEEPX/M1A".

    NOTE: due to current limitations of DEEPX runtime the number of supported devices is limited to one.

  2. Error handling is improved for Hailo runtime agent: now all errors reported by HailoRT runtime are treated as critical.

  3. TensorRT runtime agent performance is improved due to implementation of asynchronous pipelined inference.

  4. Hailo runtime agent is supported on Windows 10/11 OS.

  5. postprocess_type keyword argument is added to degirum.zoo_manager.ZooManager.list_models method. It allows filtering models by post-processor type. Possible values for this argument are: "Classification", "Detection", "Segmentation" etc. They correspond to OutputPostprocessType model parameter values.

Bug Fixes

  1. Oriented bounding box post-processor incorrectly interpreted rotation angle tensors for certain OBB models, which led to the error message "Execution failed. Condition '<N1> == <N2>' is not met".
