PySDK Release Notes

Version 0.12.2 (5/17/2024)

New Features and Modifications

  1. A new model parameter, InputShape, is supported for AI models with tensor input type (InputType == "Tensor"). This parameter specifies the input tensor shape and may have an arbitrary number of elements, which allows specifying tensor shapes with any number of dimensions. It supersedes the InputN, InputH, InputW, and InputC parameters, which serve the same purpose: if InputShape is specified for a model input, its value is used, and InputN, InputH, InputW, and InputC are ignored.

    The InputShape parameter value is a list of input shapes, one shape per model input. Each element of that list (which defines the shape for a particular input) is another list containing the input dimensions, slowest dimension first. For example, an NHWC tensor shape is represented as the list [N, H, W, C], where index zero contains the N value.

    The InputShape parameter is a runtime parameter, meaning that its value can be changed on the fly.
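    As an illustration of this encoding (plain Python, no PySDK calls; all shape values here are hypothetical), an InputShape value for a two-input model could look like:

```python
# Sketch of the InputShape encoding described above: one shape list per
# model input, slowest dimension first (the values are hypothetical).
input_shape = [
    [1, 224, 224, 3],  # input 0: NHWC shape with N=1, H=224, W=224, C=3
    [1, 77],           # input 1: a 2-D tensor shape
]

# Index zero of each per-input shape holds the N (slowest) dimension.
n = input_shape[0][0]
print(n)  # 1
```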

  2. The model parameters InputN, InputH, InputW, and InputC are converted to runtime parameters, so they can be changed on the fly. This allows more effective use of AI models with so-called dynamic inputs, which are supported by the OpenVINO runtime (see the OpenVINO dynamic shapes documentation for details).

    In order to adjust the size of the input data accepted by the PySDK preprocessor, you need to assign the actual input data size/shape to be used for subsequent inferences before performing the inference.

    If your model has image input type (InputType == "Image"), you assign the InputN, InputH, InputW, and InputC model parameters to match the size of the images to be used for the inference. The PySDK preprocessor will resize input images to the assigned size; if an input image already has that size, the resizing step is skipped. In either case, the inference runtime receives an image of that size.

    If your model has tensor input type (InputType == "Tensor"), you assign the InputShape model parameter to match the shape of the tensors to be used for the inference. Since PySDK does not do any resizing for tensor inputs, all tensors you pass for inference must have the specified shape, so the inference runtime receives tensors of that shape.

    Not all inference runtimes support dynamic inputs. At the time of this release, only OpenVINO runtime supports them.

    Currently, PySDK does not support batch sizes other than 1 for image input types, so the InputN model parameter should not be changed.
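    The resize rule above can be sketched as a plain function (a simplification for illustration; the real PySDK preprocessor performs actual image operations):

```python
def preprocess_size(image_hw, assigned_hw):
    """Sketch of the dynamic-input resize rule described above: the image
    is resized only when its size differs from the assigned InputH/InputW;
    either way the runtime receives the assigned size."""
    resized = image_hw != assigned_hw
    return assigned_hw, resized

# An image already at the assigned size skips the resizing step.
print(preprocess_size((480, 640), (480, 640)))   # ((480, 640), False)
print(preprocess_size((720, 1280), (480, 640)))  # ((480, 640), True)
```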

  3. A new property, degirum.model.Model.input_shape, is added to the Model class. This property provides unified access to the model input size/shape parameters InputN, InputH, InputW, InputC, and InputShape regardless of the input type (image or tensor).

    The getter returns, and the setter accepts, a list of input shapes, one shape per model input. Each element of that list (which defines the shape for a particular input) is another list containing the input dimensions, slowest dimension first.

    For each input, the getter returns the InputShape value if the InputShape model parameter is specified for that input; otherwise it returns [InputN, InputH, InputW, InputC].

    The setter works symmetrically: it assigns the provided list to the InputShape parameter if it is specified for the model input; otherwise it assigns the provided list to the InputN, InputH, InputW, and InputC parameters in that order (i.e. index zero to InputN and so forth).

    This property can be used in conjunction with the dynamic inputs feature to simplify setting input shapes.
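    The getter/setter behavior described above can be sketched in plain Python (the per-input parameter dictionaries are a hypothetical stand-in for the real ModelParams structure):

```python
def get_input_shape(inputs):
    # Per input: return InputShape when specified, else [N, H, W, C].
    return [
        p["InputShape"] if p.get("InputShape") is not None
        else [p["InputN"], p["InputH"], p["InputW"], p["InputC"]]
        for p in inputs
    ]

def set_input_shape(inputs, shapes):
    # Symmetric setter: assign to InputShape when specified,
    # else to InputN/InputH/InputW/InputC in that order.
    for p, shape in zip(inputs, shapes):
        if p.get("InputShape") is not None:
            p["InputShape"] = shape
        else:
            p["InputN"], p["InputH"], p["InputW"], p["InputC"] = shape

inputs = [
    {"InputShape": [1, 77]},                                   # tensor input
    {"InputN": 1, "InputH": 224, "InputW": 224, "InputC": 3},  # image input
]
print(get_input_shape(inputs))  # [[1, 77], [1, 224, 224, 3]]
```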

  4. If the DG_CPU_LIMIT_CORES environment variable is defined, its value is used by the AI server to limit the number of virtual CPU inference devices, such as N2X/CPU or OPENVINO/CPU. When it is not defined, one half of the total physical CPU cores is used, as in previous versions. This feature is useful when the AI server is running in a Docker container and you want to limit the number of virtual CPU inference devices to reduce the CPU load.
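    The limit rule can be sketched as follows (a plain-Python illustration, not the AI server's actual implementation):

```python
import os

def cpu_device_limit(physical_cores):
    """Sketch of the rule described above: DG_CPU_LIMIT_CORES, when set,
    limits the virtual CPU device count; otherwise half of the physical
    cores is used, as in previous versions."""
    value = os.environ.get("DG_CPU_LIMIT_CORES")
    if value is not None:
        return int(value)
    return max(1, physical_cores // 2)

os.environ["DG_CPU_LIMIT_CORES"] = "2"
print(cpu_device_limit(16))  # 2
del os.environ["DG_CPU_LIMIT_CORES"]
print(cpu_device_limit(16))  # 8
```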

Bug Fixes

  1. OpenVINO CPU model inferences fail intermittently when running many models on the same node, with the following error message: "CompiledModel was not initialized."

  2. The model filtering functionality of the degirum.zoo_manager.ZooManager.list_models method was broken:

    • it did not filter out models that are not supported by the inference engine attached to the zoo manager object,
    • it did not filter out models that have an empty SupportedDeviceTypes model parameter.
  3. Model fallback parameter support is broken for the AI server inference mode.

Version 0.12.1 (4/25/2024)

New Features and Modifications

  1. A new property, degirum.model.Model.supported_device_types, is added to the Model class. This read-only property returns the list of runtime/device types supported simultaneously by the model and by the connected inference engine. Each runtime/device type in the list is represented by a string in the format "RUNTIME/DEVICE".

    For example, the list ["OPENVINO/CPU", "ONNX/CPU"] means that the model can be run on both the Intel OpenVINO and Microsoft ONNX runtimes using the CPU as a hardware device.

  2. The degirum.model.Model.device_type property now accepts a list of desired "RUNTIME/DEVICE" pairs. The first supported pair from that list will be set. This simplifies inference device assignment for multi-device models on a variety of systems with different sets of inference devices.

    For example, suppose you have a model that supports all devices of the OpenVINO runtime (NPU, GPU, and CPU) and you want to run it on the NPU when available, otherwise on the GPU when available, and fall back to the CPU if neither the NPU nor the GPU is available. In this case you may do the following assignment:

    model.device_type = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]

    Reading the device_type property back after the list assignment gives you the actual device type assigned for the inference.
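    The selection rule can be sketched as a plain function (an illustration of the behavior described above, not PySDK's implementation):

```python
def select_device_type(desired, supported):
    """Sketch of the list-assignment rule described above: the first
    desired "RUNTIME/DEVICE" pair supported by the system wins."""
    for pair in desired:
        if pair in supported:
            return pair
    raise ValueError("none of the desired device types is supported")

# On a system without an NPU, the assignment falls through to the GPU.
supported = {"OPENVINO/GPU", "OPENVINO/CPU"}
print(select_device_type(
    ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"], supported))
# OPENVINO/GPU
```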

Bug Fixes

  1. Variable tensor shape support is fixed in PySDK for "Tensor" input types in multi-input models, when an input tensor with a shape other than 4-D has an index other than zero.

  2. Very intermittently, models are not fully downloaded from a cloud model zoo for AI server-based and local inference types, and there is no error diagnostic for that. As a result, corrupted models are used for inference, which leads to unclear/unrelated error messages. The correction measures include analyzing the "Content-Length" HTTP header when downloading a model archive from a cloud model zoo, with retries if the actual downloaded file size is less than expected. Also, the zip archive CRC is checked for each file when unpacking model assets.
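    The two integrity checks described above can be sketched with the standard library (zipfile.testzip() performs the per-file CRC pass; the archive content here is hypothetical):

```python
import io
import zipfile

def download_is_valid(payload, expected_length):
    """Sketch of the correction measures described above: compare the
    downloaded size to the Content-Length value, then verify the CRC of
    every file in the zip archive."""
    if len(payload) < expected_length:
        return False  # short download: the caller should retry
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        return zf.testzip() is None  # testzip() checks per-file CRCs

# Build a tiny in-memory archive to exercise both checks.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model.json", '{"ConfigVersion": 1}')
data = buf.getvalue()
print(download_is_valid(data, len(data)))       # True
print(download_is_valid(data[:10], len(data)))  # False (truncated)
```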

  3. In case of inference errors, the AI server ASIO protocol closes the client socket too soon, causing error message packet loss on the client side, which in turn leads to an incorrect error report: instead of the actual error, generic socket errors like "Broken pipe" or "Operation aborted" are reported.

  4. When the AI server scans a local model zoo and finds a multi-device model whose default runtime/device combination (as specified in the RuntimeAgent and DeviceType model parameters) is not supported by the system, it discards that model, even though the model supports other runtime/device combinations available on the system. This happens because the SupportedDeviceTypes model parameter is not analyzed when scanning local zoos.

Version 0.12.0 (4/8/2024)

New Features and Modifications

  1. Multi-device/multi-runtime models are supported in PySDK and in the cloud zoo.

    Such models have an additional model parameter, SupportedDeviceTypes, which defines a comma-separated list of runtime/device combinations supported by the model. Each element of this list is a "RUNTIME/DEVICE" pair.

    The RUNTIME part specifies the runtime, while the DEVICE part specifies the device type. The following runtime/device combinations are supported as of PySDK version 0.12.0:

    Runtime    Devices
    N2X        CPU, ORCA1

    New runtimes and devices may be supported in future versions of PySDK.

    You may specify "*" as a wildcard in either part of the RUNTIME/DEVICE pair: it will match any supported runtime or device type. For example, "N2X/*" defines a model that supports all devices of the N2X runtime (that would be N2X/CPU and N2X/ORCA1), and "*/GPU" defines a model that supports the GPU devices of all runtimes (that would be OPENVINO/GPU and TENSORRT/GPU).
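    The wildcard matching described above behaves like shell-style pattern matching, which the standard fnmatch module can sketch (an illustration, not PySDK's implementation):

```python
from fnmatch import fnmatch

def device_type_matches(supported_entry, device_type):
    """Sketch of SupportedDeviceTypes wildcard matching: "*" in either
    part of a "RUNTIME/DEVICE" pair matches any runtime or device."""
    return fnmatch(device_type, supported_entry)

print(device_type_matches("N2X/*", "N2X/ORCA1"))     # True
print(device_type_matches("*/GPU", "OPENVINO/GPU"))  # True
print(device_type_matches("*/GPU", "N2X/CPU"))       # False
```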

    For multi-device models you may select on the fly which runtime/device combination to use for the model inference, assuming the desired combination is supported by the model. You assign the runtime/device combination to the degirum.model.Model.device_type property as a string in the format "RUNTIME/DEVICE", exactly as it is defined in the SupportedDeviceTypes list.

    You can reassign device_type property multiple times for the same model object. For example:

    model = zoo.load_model(model_name)
    model.device_type = "N2X/ORCA1"
    result1 = model.predict(data)
    model.device_type = "TFLITE/CPU"
    result2 = model.predict(data)
  2. Just-in-time (JIT) compilation is introduced for DeGirum N2X models for ORCA devices. Now you may create ORCA models specifying either an ONNX or a TFLITE binary model file in the ModelPath model parameter: you do not need to pre-compile your model into the .n2x file format. This significantly simplifies model development for DeGirum ORCA devices. When the N2X runtime discovers a .onnx or .tflite binary model file extension, it automatically invokes the N2X compiler, compiles the model into the .n2x format, and saves the compiled model in the local cache for future use. Cached models are identified in the cache by the Checksum model parameter: two models with the same name but different checksums are cached into two different files.

    A new model parameter, CompilerOptions, is introduced to pass options to the JIT compiler. The parameter type is a JSON dictionary, where the key is a runtime/device pair and the value is the compiler options applicable to that pair. For example, { "N2X/ORCA1": "--no-software-layers" } will pass the --no-software-layers compiler option string when compiling models for the ORCA1 device and N2X runtime.

  3. degirum.connect() now supports a new mode of local inference in which models are served from a local model zoo directory instead of serving just a single model file. To use this mode, call degirum.connect() passing dg.LOCAL as the first argument and the path to the local model zoo directory as the second argument:

    zoo = dg.connect(dg.LOCAL, "/path/to/local/zoo/dir")

    You may download models to the local model zoo directory in the same way as for the AI server, using the degirum download-zoo command.

  4. A new "auto" value is introduced for the InputTensorLayout model parameter: when it is set to "auto", the input tensor layout is selected as "NCHW" for the "OPENVINO", "ONNX", and "TENSORRT" runtimes, and "NHWC" otherwise. This feature facilitates the creation of multi-runtime models, where the input tensor layout should be "NCHW" for some runtimes and "NHWC" for others.

  5. A new "auto" value is introduced for the degirum.model.Model.overlay_alpha property: when it is set to "auto", PySDK uses overlay_alpha = 0.5 for segmentation models and overlay_alpha = 1.0 otherwise. This is now the default value of the overlay_alpha property.

  6. The AI server will try to serve a cloud model from the local cache even if the model checksum request fails due to a poor or absent Internet connection. This allows you to continue using the AI server with a poor or absent Internet connection, assuming all required cloud models are already downloaded to the local cache.

  7. The model download timeout from the cloud zoo is increased from 10 to 40 seconds.

  8. When running inside a Docker container, the number of CPU devices reported by runtimes that support CPU inference (such as OpenVINO) now takes into account Docker-imposed CPU quotas. For example, if the AI server Docker container is started with --cpus=4, the number of virtual CPU devices reported by the runtime will be half of that amount, i.e. 2 CPUs.
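    The arithmetic can be sketched in terms of the Linux CFS quota values that Docker's --cpus option sets (an illustration of the rule above, not the runtime's actual code):

```python
def quota_aware_cpu_devices(cfs_quota_us, cfs_period_us):
    """Sketch of the quota rule described above: --cpus=N gives a CFS
    quota of N periods; the runtime reports half of that CPU count."""
    if cfs_quota_us <= 0:
        return None  # no quota imposed: fall back to physical core count
    cpus = cfs_quota_us // cfs_period_us
    return max(1, cpus // 2)

# --cpus=4 maps to a 400000us quota over a 100000us period -> 2 devices.
print(quota_aware_cpu_devices(400_000, 100_000))  # 2
```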

  9. ORCA firmware is now forcefully reset on each AI Server start to ensure clean recovery from previous failures.

Bug Fixes

  1. Model filtering in degirum.zoo_manager.ZooManager.list_models method does not accept "NPU" device type.

  2. Support of dynamically-sized output tensors does not work for OpenVINO runtime.

  3. OpenVINO runtime reports single CPU device in system info, while actual number of virtual devices is more than one.

  4. TensorRT runtime fails with error when quantized model does not specify CalibrationFilePath model parameter.

  5. Variable tensor shape support is fixed in PySDK for "Tensor" input types. In previous versions, for input tensors having a number of dimensions other than four, the following error was raised: "Shape of tensor passed as the input #<n> does not match to model parameters. Expected tensor shape is (<x>, <y>, <z>, <t>)."

Version 0.11.1 (3/13/2024)

New Features and Modifications

  1. NPU device support is implemented for the OpenVINO runtime. To make a model for the NPU device, specify "DeviceType": "NPU" in the model JSON file. OpenVINO runtime version 2023.3.0 is required for NPU support.

  2. Python version 3.12 is initially supported by PySDK for Linux and Windows platforms.

  3. Improvements for "Tensor" input type (InputType model parameter equal to "Tensor"):

    • The following tensor element types are supported:

      • "DG_FLT"
      • "DG_UINT8"
      • "DG_INT8"
      • "DG_UINT16"
      • "DG_INT16"
      • "DG_INT32"
      • "DG_INT64"
      • "DG_DBL"
      • "DG_UINT32"
      • "DG_UINT64"

      You assign these type strings to the InputRawDataType model parameter.
      In previous versions, only the "DG_FLT" and "DG_UINT8" types were supported.

    • Variable tensor shapes are supported. Now you can specify any combination of the InputN, InputH, InputW, and InputC model parameters; they define the input tensor shape in that order. For example, if you specify InputN=1 and InputC=77 and omit InputH and InputW, this gives a 2-D tensor of shape [1, 77]. In previous versions you had to specify all four of them, which always gave 4-D tensor shapes.

    • The InputQuantEn model parameter is now ignored by the tensor pre-processor: you now have to specify InputRawDataType to match the actual model input tensor data type and provide tensor data already converted to this data type.

  4. The InputN, InputH, InputW, and InputC model parameters are no longer mandatory: you may specify any subset of them.
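    The shape-building rule from items 3 and 4 can be sketched as a plain function (an illustration of the described behavior, not PySDK's implementation):

```python
def tensor_shape(**params):
    """Sketch of the variable-shape rule described above: any subset of
    InputN, InputH, InputW, InputC may be given; the values present form
    the tensor shape in that fixed order."""
    order = ("InputN", "InputH", "InputW", "InputC")
    return [params[name] for name in order if name in params]

print(tensor_shape(InputN=1, InputC=77))                         # [1, 77]
print(tensor_shape(InputN=1, InputH=224, InputW=224, InputC=3))  # [1, 224, 224, 3]
```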

  5. The PySDK InferenceResults.image_overlay() method now returns a copy of the input image instead of raising an exception. This makes it safe to call this method in case of the "None" postprocessor type (when the OutputPostprocessType model parameter is set to "None").

  6. The ModelParams class __str__() operator now prints all model parameters, including those not specified in the model JSON file. For such parameters, their default values are printed.

  7. If the DG_MEMORY_LIMIT_BYTES environment variable is defined, its value is used as the AI server in-memory model cache size limit. When it is not defined, one half of the physical memory is used as the cache size limit, as in previous versions. This feature is useful when the AI server is running in a Docker container and you want to further limit the AI server cache memory size.

Bug Fixes

  1. The PostProcessorInputs model parameter presence is now checked only for detection post-processor types, to avoid unnecessary errors for post-processor types that do not use this parameter, such as "None".

Version 0.11.0 (2/10/2024)

New Features and Modifications

  1. Support for different OpenVINO versions is implemented. Now PySDK can work with the following OpenVINO versions:

    • 2022.1.1
    • 2023.2.0
    • 2023.3.0

    When two or more OpenVINO installations are present on a system, the newest version will be used.

  2. Results filtering by class labels and category IDs is implemented: new output_class_set property is added to degirum.model.Model class for this purpose.

    By default, all results are reported by the model predict methods. However, you may want to include only results that belong to certain categories: either having certain class labels or certain category IDs. To achieve that, you can specify a set of class labels (or, alternatively, category IDs), so only inference results whose class labels (or category IDs) are found in that set are reported, and all other results are discarded. You assign such a set to the degirum.model.Model.output_class_set property.

    For example, you may want to include only results with class labels "car" and "truck":

    # allow only results with "car" and "truck" class labels
    model.output_class_set = {"car", "truck"}

    Or you may want to include only results with category IDs 1 and 3:

    # allow only results with 1 and 3 category IDs
    model.output_class_set = {1, 3}

    This category filtering is applicable only to models that have "label" (or "category_id") keys in their result dictionaries. For all other models, this category filter is ignored.
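    The filtering behavior can be sketched as a plain function over result dictionaries (an illustration of the described behavior, not PySDK's implementation):

```python
def filter_by_class_set(results, class_set):
    """Sketch of output_class_set filtering described above: keep only
    results whose 'label' or 'category_id' value is in the set; results
    carrying neither key pass through, since the filter is ignored for
    models without those keys."""
    if not class_set:
        return results
    return [
        r for r in results
        if ("label" not in r and "category_id" not in r)
        or r.get("label") in class_set
        or r.get("category_id") in class_set
    ]

results = [
    {"label": "car", "category_id": 1},
    {"label": "person", "category_id": 0},
    {"label": "truck", "category_id": 3},
]
print(filter_by_class_set(results, {"car", "truck"}))
# [{'label': 'car', 'category_id': 1}, {'label': 'truck', 'category_id': 3}]
```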

Bug Fixes

  1. When two different models have two different Python postprocessor implementations saved in files with the same name, only the first Python postprocessor module gets loaded on the AI server. This happens because the module is loaded into the Python global sys.modules collection under a name derived from the file name, and if two files have the same name, they collide.

  2. When the implementation of a Python postprocessor in a model gets changed, and that model was already loaded on the AI server, the Python postprocessor module is not reloaded on the next model load. This is because once a Python module is loaded into the Python interpreter, it is saved in the sys.modules collection, and any attempt to load it again just takes it from there.

  3. Performing inferences with ONNX runtime agent (degirum.model.Model.model_info.RuntimeAgent equal to "ONNX") may cause AI server to crash.

Version 0.10.4 (1/24/2024)

New Features and Modifications

The dependency on the CoreClient PySDK module is now on-demand: the CoreClient module is loaded only when local inference is invoked or when a local AI server is started from PySDK. This allows using the cloud and AI server client functionality of PySDK on systems missing the CoreClient module dependencies.

Bug Fixes

Fixed a bug in the YOLOv8 post-processor affecting models with non-square input tensors. Previously, the Y-coordinate (height) of all detections coming from YOLOv8 models with input image resolutions whose width is not equal to height was misinterpreted; now the behavior is correct.

Version 0.10.3 (1/17/2024)

New Features and Modifications

  1. ORCA1 firmware version 1.1.9 is included in this release. This firmware implements measures to improve data integrity of DDR4 external memory when entering/leaving low-power mode.

  2. To avoid possible future incompatibilities, the PySDK package requirements now explicitly limit the upper version of each dependency to one major revision above the corresponding lower version. For example, requests >= 2.30.0 becomes requests >= 2.30.0, < 3.0.
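    The pinning rule can be sketched as a small string transformation (an illustration of the rule above, not the actual packaging tooling):

```python
def add_upper_bound(requirement):
    """Sketch of the pinning rule described above: the upper bound is one
    major revision above the lower bound."""
    name, _, version = requirement.partition(" >= ")
    major = int(version.split(".")[0])
    return f"{name} >= {version}, < {major + 1}.0"

print(add_upper_bound("requests >= 2.30.0"))  # requests >= 2.30.0, < 3.0
```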

  3. AI annotations drawing performance is greatly improved for object detection annotations.

  4. The default value of the alpha blending coefficient is set to 1.0, which disables blending. This is a performance-improvement measure.

  5. Color selection for different classes, in the case when a list of colors is assigned to the degirum.model.Model.overlay_color property, is improved. Selection is performed based on the class ID if the object class ID is in the model dictionary; otherwise, a new unique color is assigned to the class and associated with its class label. This mechanism produces a stable color-to-class assignment from frame to frame and also allows combining the results of multiple different models on a single annotation, assigning different colors to classes that may have the same class IDs but different class labels.

  6. Printing scores on AI annotations is now performed with a type-dependent format: if the score is of integer type, there is no fractional part. This improves readability for regression models producing integer results.
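    The type-dependent formatting can be sketched as follows (the float precision chosen here is an assumption for illustration):

```python
def format_score(score):
    """Sketch of type-dependent score formatting described above: integer
    scores print without a fractional part; floats keep two decimals
    (the precision is an assumption, not PySDK's documented value)."""
    if isinstance(score, int):
        return str(score)
    return f"{score:.2f}"

print(format_score(42))     # 42
print(format_score(0.987))  # 0.99
```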

  7. Quality of OpenCV font used for AI annotations is improved.

  8. Model statistics formatting now uses wider columns to accommodate long statistics.

Version 0.10.2 (12/1/2023)

Discontinued Functionality

The N2X compiler support for DeGirum Orca 1.0 devices is discontinued. Starting from this version, N2X compiler cannot compile models for Orca 1.0 devices: only Orca 1.1 devices are supported.

However, runtime operations for Orca 1.0 devices are still fully supported: you can continue to use Orca 1.0 devices with already compiled models.

Bug Fixes

The degirum server rescan-zoo and degirum server shutdown CLI commands do not work with the new HTTP AI server protocol. An attempt to execute such commands for AI servers launched with the HTTP protocol option causes error messages.

Version 0.10.1 (11/2/2023)

New Features and Modifications

  1. The HTTP+WebSocket AI server protocol is initially supported for DeGirum AI Server.

    Starting from PySDK version 0.10.0, the AI server supports two protocols: asio and http. The asio protocol is DeGirum's custom socket-based AI server protocol, supported by all previous PySDK versions. The http protocol is a new protocol based on REST HTTP requests and WebSocket streaming. The http protocol allows the AI server to be used from any programming language that supports HTTP requests and WebSockets, such as browser-based JavaScript, which does not support native sockets and thus precludes the use of the asio protocol.

    When you start the AI server by executing the degirum server start command, you specify the protocol using the --protocol parameter, which can be asio, http, or both.

    If you omit this parameter, the asio protocol is used by default, to provide behavior compatible with previous PySDK versions.

    You select the http protocol by specifying --protocol http.

    You may select both protocols by specifying --protocol both. In this case, the AI server listens on two consecutive TCP ports: the first port is used for the asio protocol, and the second port is used for the http protocol.

    For example, to start the AI server to serve models from the ./my-zoo directory, using the asio protocol on port 12345 and the http protocol on port 12346:

    degirum server start --zoo ./my-zoo --port 12345 --protocol both

    On the client side, when you connect to the AI server with the http protocol, you have to prefix the AI server hostname with the http:// prefix, for example:

    zoo = dg.connect("http://localhost")

    To connect to the AI server with the asio protocol, you simply omit the protocol prefix.

  2. Now you may pass arbitrary model properties (properties of the degirum.model.Model class) as keyword arguments to the degirum.zoo_manager.ZooManager.load_model method. In this case, these properties will be assigned to the model object.

    For example:

    model = zoo.load_model(model_name, output_confidence_threshold=0.5, input_pad_method="letterbox")
  3. Multi-classifier (or multi-label) classification models are initially supported. The post-processor type string assigned to the OutputPostprocessType model parameter is "MultiLabelClassification". Each inference result dictionary contains the following keys:

    • classifier: object class string.
    • results: a list of class labels and their scores. Scores are optional.

    The results list elements are dictionaries with the following keys:

    • label: class label string.
    • score: optional class label probability.


    For example:

        [
            {
                'classifier': 'vehicle color',
                'results': [
                    {'label': 'red', 'score': 0.99},
                    {'label': 'blue', 'score': 0.01}
                ]
            },
            {
                'classifier': 'vehicle type',
                'results': [
                    {'label': 'car', 'score': 0.99},
                    {'label': 'truck', 'score': 0.01}
                ]
            }
        ]

Bug Fixes

  1. An unclear error message, 'NoneType' object has no attribute 'shape', appears when supplying a non-existent file for model inference.

  2. Local AI inference of a model with Python post-processor hangs on model destruction due to Python GIL deadlock.

  3. The degirum sys-info command re-initializes the DeGirum Orca AI accelerator hardware in a non-interprocess-safe way, disrupting the operation of other processes using the same Orca accelerator hardware. The first attempt to fix this bug was in PySDK version 0.9.6; this release finally fixes it.