
PySDK Release Notes

Version 0.13.2 (7/26/2024)

Bug Fixes

  1. The N2X runtime agent fails to load on Linux systems when the /dev/bus/usb device is not available, making the N2X/ORCA1 and N2X/CPU inference devices unusable on such systems. This problem affects PySDK installations running on virtual machines and inside Docker containers started in non-privileged mode.

Version 0.13.1 (7/17/2024)

New Features and Modifications

  1. Added support for OpenVINO version 2024.2.0.

  2. YOLO segmentation model postprocessing support is implemented in the degirum.postprocessor.DetectionResults class.

  3. The degirum version command is added to the PySDK CLI. Use this command to obtain the PySDK version.

  4. The degirum.zoo_manager.ZooManager.system_info() method is added. This method queries the system info dictionary of the attached inference engine. The format of this dictionary is the same as the output of the degirum sys-info command.

  5. Accessing the DeGirum public cloud model zoo no longer requires a cloud API token, so the following code just works:

    import degirum as dg
    zoo = dg.connect(dg.CLOUD)
    zoo.list_models()
    
  6. ORCA1 firmware version 1.1.15 is included in this release. This firmware implements measures to reinitialize the DDR4 external memory link in case of failures, which reduces the probability of runtime errors such as "Timeout waiting for RPC EXEC completion".

  7. ORCA1 firmware is now loaded on AI server startup only in case of a version mismatch or a previously detected critical hardware error. In previous AI server versions it was reloaded unconditionally on every start.

  8. The degirum.model.Model.device_type property can now be assigned for single-device models (models for which the SupportedDeviceTypes model parameter is not defined). In previous PySDK versions such an assignment always generated the error "Model does not support dynamic device type selection: model property SupportedDeviceTypes is not defined".

Bug Fixes

  1. Google EdgeTPU AI accelerator support was broken in PySDK ver. 0.13.0; it is now restored.

Version 0.13.0 (6/21/2024)

New Features and Modifications

  1. A plugin for the RKNN runtime is initially supported. This plugin allows performing inferences of .rknn AI models on Rockchip AI accelerators, including:

    • RK3588
    • RK3568
  2. The TFLite plugin now supports the following inference delegates:

    • NXP VX
    • NXP Ethos-U
    • ArmNN
  3. The device_type keyword argument is added to the degirum.zoo_manager.ZooManager.list_models method. It specifies a filter for target runtime/device combinations: a string or list of strings of full device type names in "RUNTIME/DEVICE" format. For example, the following code returns the list of models for the N2X/ORCA1 runtime/device pair:

    model_list = zoo.list_models(device_type = "N2X/ORCA1")
    
  4. New functions have been added to PySDK top-level API:

    • degirum.list_models()
    • degirum.load_model()
    • degirum.get_supported_devices()

    These functions are intended to further simplify PySDK API.

    The function degirum.list_models() allows you to request the list of models without explicitly obtaining a ZooManager object via a degirum.connect() call. It combines the arguments of degirum.connect() and degirum.zoo_manager.ZooManager.list_models(), passed one after another, for example:

    model_list = degirum.list_models(
        degirum.CLOUD, 
        "https://cs.degirum.com", 
        "<token>", 
        device_type="N2X/ORCA1"
    )
    

    The function degirum.load_model() allows you to load a model without explicitly obtaining a ZooManager object via a degirum.connect() call. It combines the arguments of degirum.connect() and degirum.zoo_manager.ZooManager.load_model(), with the model name going first. For example:

    model = degirum.load_model(
        "mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1", 
        degirum.CLOUD, 
        "https://cs.degirum.com", 
        "<token>",
        output_confidence_threshold=0.5, 
    )
    

    The function degirum.get_supported_devices() allows you to obtain the list of runtime/device combinations supported by the inference engine of your choice. It accepts the inference engine designator as the first argument and returns the list of supported device type strings in "RUNTIME/DEVICE" format. For example, the following call requests the list of runtime/device combinations supported by the AI server on localhost:

    supported_device_types = degirum.get_supported_devices("localhost")
    
  5. The post-processor for YOLOv8 pose detection models is implemented. The post-processor tag is "PoseDetectionYoloV8".

  6. The pre-processor letter-boxing implementation is changed to match the Ultralytics implementation for a better mAP match.

  7. ORCA firmware loading time is reduced by 3 seconds.

Bug Fixes

  1. "Timeout 10000 ms waiting for response from AI server" error may happen intermittently at the inference start of a cloud model on AI server, when AI server has unreliable connection to the Internet due to incorrect timeouts on the client side.

  2. The model filtering functionality of the degirum.zoo_manager.ZooManager.list_models method worked incorrectly with multi-device models having device wildcards in SupportedDeviceTypes. For example, if a model has SupportedDeviceTypes: "OPENVINO/*", then the call zoo.list_models(device="ORCA1") returned that model even though the "ORCA1" device is not supported by the "OPENVINO" runtime.


Version 0.12.3 (6/3/2024)

Bug Fixes

  1. AI server protocol backward compatibility was broken in PySDK version 0.12.2, which prevented older clients from communicating with newer AI servers, producing cryptic error messages like "RuntimeError: [json.exception.type_error.302] type must be number, but is binary".

  2. Model input parameters for inputs other than input zero did not propagate to the AI server. For dynamic-input models this could cause errors like "Incorrect input tensor size: the model configuration file defines input tensor to be <X> elements, while the size of supplied tensor is <Y> elements".

  3. An attempt to set the degirum.model.Model.input_shape property for the "Image" input type failed: it assigned the InputShape model parameter instead of the InputN/W/H/C model parameters.

  4. Cloud inference for multi-input models was not supported: it led to inference timeout error messages like "Timeout <X> ms waiting for response from AI server".

  5. N2X JIT compilation failed with a segmentation fault when a corrupted ONNX file was supplied for compilation. Now it produces the error message "Unknown model format".

  6. License files for third-party libraries distributed in PySDK can potentially create very long file paths which may lead to PySDK installation failure on Windows OS with errors like "Could not install packages due to OSError: [WinError 206] The filename or extension is too long".


Version 0.12.2 (5/17/2024)

New Features and Modifications

  1. A new model parameter, InputShape, is supported for AI models with the tensor input type (InputType == "Tensor"). This parameter specifies the input tensor shape. It may have an arbitrary number of elements, which allows specifying tensor shapes with any number of dimensions. It supersedes the InputN, InputH, InputW, and InputC parameters, which are used for the same purpose: if the InputShape parameter is specified for a model input, its value is used, and the InputN, InputH, InputW, and InputC parameters are ignored.

    The InputShape parameter value is a list of input shapes, one shape per each model input. Each element of that list (which defines a shape for particular input) is another list containing input dimensions, slowest dimension first. For example, NHWC tensor shape is represented as [N, H, W, C] list, where zero index contains N value.

    The InputShape parameter is a runtime parameter, meaning that its value can be changed on the fly.

  2. The model parameters InputN, InputH, InputW, and InputC are converted to runtime parameters, so they can be changed on the fly. This allows more effective use of AI models with so-called dynamic inputs, which are supported by the OpenVINO runtime.

    In order to adjust the size of the input data, accepted by PySDK preprocessor, you need to assign the actual input data size/shape to be used for consecutive inferences before performing the inference.

    If your model has the image input type (InputType == "Image"), you assign the InputN, InputH, InputW, and InputC model parameters to match the size of the images to be used for the inference. The PySDK preprocessor resizes the input images to the assigned size; if the input images already have that size, the resizing step is skipped. In either case, the inference runtime receives an image of the assigned size.

    If your model has the tensor input type (InputType == "Tensor"), you assign the InputShape model parameter to match the shape of the tensors to be used for the inference. Since PySDK does not do any resizing for tensor inputs, all tensors you pass for inference must have the specified shape, so the inference runtime receives tensors of that shape.

    Not all inference runtimes support dynamic inputs. At the time of this release, only OpenVINO runtime supports them.

    Currently, PySDK does not support batch size other than 1 for image input types, so the InputN model parameter should not be changed.

  3. New property degirum.model.Model.input_shape is added to the Model class. This property allows unified access to model input size/shape parameters: InputN, InputH, InputW, InputC, and InputShape regardless of the input type (image or tensor).

    The getter returns and the setter accepts a list of input shapes, one shape per model input. Each element of that list (which defines the shape for a particular input) is another list containing the input dimensions, slowest dimension first.

    For each input, the getter returns the InputShape value if the InputShape model parameter is specified for that input; otherwise it returns [InputN, InputH, InputW, InputC].

    The setter works symmetrically: it assigns the provided list to the InputShape parameter if it is specified for the model input; otherwise it assigns the provided list to the InputN, InputH, InputW, and InputC parameters in that order (i.e., index zero to InputN and so forth).

    This property can be used in conjunction with the dynamic inputs feature to simplify setting input shapes.
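    As an illustration, the per-input mapping described above can be sketched in plain Python. This is a simplified sketch of the documented getter/setter behavior, not the actual PySDK implementation; the dictionary stands in for one input's model parameters:

```python
def get_input_shape(params: dict) -> list:
    # Prefer InputShape when it is specified for the input; otherwise fall
    # back to the [InputN, InputH, InputW, InputC] quadruple.
    if "InputShape" in params:
        return params["InputShape"]
    return [params["InputN"], params["InputH"], params["InputW"], params["InputC"]]


def set_input_shape(params: dict, shape: list) -> None:
    # Symmetric setter: assign to InputShape when present; otherwise spread
    # the list over InputN, InputH, InputW, InputC in that order.
    if "InputShape" in params:
        params["InputShape"] = shape
    else:
        params["InputN"], params["InputH"], params["InputW"], params["InputC"] = shape
```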

  4. If the DG_CPU_LIMIT_CORES environment variable is defined, its value is used by the AI server to limit the number of virtual CPU inference devices, such as N2X/CPU or OPENVINO/CPU. When it is not defined, one half of the total physical CPU cores is used, as in previous versions. This feature is useful when the AI server is running in a Docker container and you want to limit the number of virtual CPU inference devices to reduce the CPU load.
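    The resulting device limit can be sketched as follows. This is an illustrative sketch of the rule above, not AI server code, and the function name is hypothetical:

```python
import os


def cpu_device_limit(physical_cores: int) -> int:
    # DG_CPU_LIMIT_CORES, when defined, overrides the default of one half
    # of the total physical CPU cores (the previous-version behavior).
    env = os.environ.get("DG_CPU_LIMIT_CORES")
    if env is not None:
        return int(env)
    return max(1, physical_cores // 2)
```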

Bug Fixes

  1. OpenVINO CPU model inferences fail intermittently when running many models on the same node with the following error message: "CompiledModel was not initialized."

  2. The model filtering functionality of the degirum.zoo_manager.ZooManager.list_models method was broken:

    • it did not filter out models that are not supported by the inference engine attached to the zoo manager object,
    • it did not filter out models that have an empty SupportedDeviceTypes model parameter.
  3. Model fallback parameter support was broken for the AI server inference mode.


Version 0.12.1 (4/25/2024)

New Features and Modifications

  1. A new property, degirum.model.Model.supported_device_types, is added to the Model class. This read-only property returns the list of runtime/device types supported simultaneously by the model and by the connected inference engine. Each runtime/device type in the list is represented by a string in the "RUNTIME/DEVICE" format.

    For example, the list ["OPENVINO/CPU", "ONNX/CPU"] means that the model can run on both the Intel OpenVINO and Microsoft ONNX runtimes using the CPU as the hardware device.

  2. The degirum.model.Model.device_type property now accepts a list of desired "RUNTIME/DEVICE" pairs. The first supported pair from that list will be set. This simplifies inference device assignment for multi-device models on a variety of systems with different sets of inference devices.

    For example, suppose you have a model that supports all devices of the OpenVINO runtime (NPU, GPU, and CPU), and you want to run it on the NPU when available, otherwise on the GPU when available, and fall back to the CPU when neither the NPU nor the GPU is available. In this case you may do the following assignment:

    model.device_type = ["OPENVINO/NPU", "OPENVINO/GPU", "OPENVINO/CPU"]
    

    Reading the device_type property back after a list assignment gives you the actual device type assigned for the inference.
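    The "first supported pair wins" rule can be sketched as follows (an illustrative sketch, not PySDK internals):

```python
def select_device_type(preferences: list, supported: set) -> str:
    # Return the first preferred "RUNTIME/DEVICE" pair that is actually
    # supported by the model and the connected inference engine.
    for pair in preferences:
        if pair in supported:
            return pair
    raise RuntimeError("none of the preferred device types is supported")
```

    For instance, on a system where only CPU devices are available, the assignment from the example above would resolve to "OPENVINO/CPU".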

Bug Fixes

  1. Variable tensor shape support is fixed for "Tensor" input types in multi-input models, for the case when an input tensor with a shape other than 4-D has an index other than zero.

  2. Very intermittently, models were not fully downloaded from a cloud model zoo for the AI server-based and local inference types, with no error diagnostics. As a result, corrupted models were used for the inference, which led to unclear and unrelated error messages. Corrective measures include analyzing the "Content-Length" HTTP header when downloading a model archive from a cloud model zoo, with retries if the actual downloaded file size is less than expected. Also, the zip archive CRC is now checked for each file when unpacking model assets.

  3. In case of inference errors, the AI server ASIO protocol closed the client socket too soon, which caused error message packet loss on the client side and, in turn, an incorrect error report: instead of the actual error, generic socket errors like "Broken pipe" or "Operation aborted" were reported.

  4. When the AI server scanned a local model zoo and found a multi-device model whose default runtime/device combination (as specified in the RuntimeAgent and DeviceType model parameters) is not supported by the system, it discarded that model, even though the model supports other runtime/device combinations available on the system. This happened because the SupportedDeviceTypes model parameter was not analyzed when scanning local zoos.


Version 0.12.0 (4/8/2024)

New Features and Modifications

  1. Multi-device/multi-runtime models are supported in PySDK and in the cloud zoo.

    Such models have additional model parameter SupportedDeviceTypes, which defines a comma-separated list of runtime/device combinations supported by the model. Each element of this list is "RUNTIME/DEVICE" pair.

    The RUNTIME part specifies the runtime, while the DEVICE part specifies the device type. The following runtime/device combinations are supported as of PySDK version 0.12.0:

    Runtime  | Devices
    -------- | -----------------------
    N2X      | CPU, ORCA1
    OPENVINO | CPU, GPU, NPU, MYRIAD
    ONNX     | CPU
    TFLITE   | CPU, EDGETPU
    TENSORRT | GPU, DLA, DLA_FALLBACK

    New runtimes and devices can be supported in the future versions of PySDK.

    You may specify "*" as a wildcard in either part of the RUNTIME/DEVICE pair: it matches any supported runtime or device type. For example, "N2X/*" defines a model that supports all devices of the N2X runtime (that is, N2X/CPU and N2X/ORCA1), and "*/GPU" defines a model that supports the GPU devices of all runtimes (that is, OPENVINO/GPU and TENSORRT/GPU).

    For multi-device models you may select on the fly which runtime/device combination to use for the model inference, assuming the desired combination is supported by the model. You assign the runtime/device combination to the degirum.model.Model.device_type property as a string in the "RUNTIME/DEVICE" format, exactly as it is defined in the SupportedDeviceTypes list.

    You can reassign device_type property multiple times for the same model object. For example:

    model = zoo.load_model(model_name)
    
    model.device_type = "N2X/ORCA"
    result1 = model.predict(data)
    
    model.device_type = "TFLITE/CPU"
    result2 = model.predict(data)
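
    The wildcard matching described above follows ordinary glob semantics and can be sketched as follows (an illustrative sketch, not PySDK internals; the pair list is taken from the table above):

```python
import fnmatch

# All runtime/device pairs supported as of PySDK 0.12.0 (from the table above).
ALL_PAIRS = [
    "N2X/CPU", "N2X/ORCA1",
    "OPENVINO/CPU", "OPENVINO/GPU", "OPENVINO/NPU", "OPENVINO/MYRIAD",
    "ONNX/CPU",
    "TFLITE/CPU", "TFLITE/EDGETPU",
    "TENSORRT/GPU", "TENSORRT/DLA", "TENSORRT/DLA_FALLBACK",
]


def expand_device_types(pattern: str, all_pairs: list = ALL_PAIRS) -> list:
    # "N2X/*" expands to all N2X devices; "*/GPU" to GPU devices of all runtimes.
    return [p for p in all_pairs if fnmatch.fnmatchcase(p, pattern)]
```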
    
  2. Just-in-time (JIT) compilation is introduced for DeGirum N2X models for ORCA devices. Now you may create ORCA models specifying either an ONNX or TFLite binary model file in the ModelPath model parameter: you do not need to pre-compile your model into the .n2x file format. This significantly simplifies model development for DeGirum ORCA devices. When the N2X runtime discovers an .onnx or .tflite binary model file extension, it automatically invokes the N2X compiler and compiles the model into the .n2x format, saving the compiled model in the local cache for future use. Cached models are identified in the cache by the Checksum model parameter: two models with the same name but different checksums are cached into two different files.

    A new model parameter, CompilerOptions, is introduced to pass options to the JIT compiler. The parameter type is a JSON dictionary, where the key is the runtime/device pair and the value is the compiler options string applicable to that runtime/device pair. For example, { "N2X/ORCA1": "--no-software-layers" } passes the --no-software-layers compiler option string when compiling models for the ORCA1 device and N2X runtime.

  3. degirum.connect() now supports a new mode of local inference in which models are served from a local model zoo directory instead of a single model file. To use this mode, call degirum.connect(), passing dg.LOCAL as the first argument and the path to the local model zoo directory as the second argument:

    zoo = dg.connect(dg.LOCAL, "/path/to/local/zoo/dir")
    

    You may download models to the local model zoo directory in the same way as for the AI server, using the degirum download-zoo command.

  4. New "auto" value is introduced for InputTensorLayout model parameter: when it is set to "auto" then input tensor layout will be selected as "NCHW" for "OPENVINO", "ONNX", and "TENSORRT" runtimes, and "NHWC" otherwise. This feature facilitate creation of multi-runtime models, when input tensor layout should be set to "NCHW" for some runtimes, and "NHWC" for some other runtimes.

  5. New "auto" value is introduced for degirum.model.Model.overlay_alpha property: when it is set to "auto" PySDK will use overlay_alpha = 0.5 for segmentation models and overlay_alpha = 1.0 otherwise. This is now the default value for overlay_alpha property.

  6. The AI server now tries to serve a cloud model from the local cache even if the model checksum request fails due to a poor or absent Internet connection. This allows continued use of the AI server in such conditions, assuming all required cloud models are already downloaded to the local cache.

  7. The model download timeout from the cloud zoo is increased from 10 to 40 seconds.

  8. When running inside a Docker container, the number of CPU devices reported by runtimes that support CPU inference (such as OpenVINO) now takes Docker-imposed CPU quotas into account. For example, if the AI server Docker container is started with --cpus=4, the number of virtual CPU devices reported by the runtime is half of that amount, i.e. 2 CPUs.

  9. ORCA firmware is now forcefully reset on each AI Server start to ensure clean recovery from previous failures.

Bug Fixes

  1. Model filtering in the degirum.zoo_manager.ZooManager.list_models method did not accept the "NPU" device type.

  2. Support for dynamically-sized output tensors did not work for the OpenVINO runtime.

  3. The OpenVINO runtime reported a single CPU device in the system info, while the actual number of virtual devices was more than one.

  4. The TensorRT runtime failed with an error when a quantized model did not specify the CalibrationFilePath model parameter.

  5. Variable tensor shape support is fixed for "Tensor" input types. In previous versions, for input tensors having other than four dimensions, the following error was raised: "Shape of tensor passed as the input #<n> does not match to model parameters. Expected tensor shape is (<x>, <y>, <z>, <t>)."


Version 0.11.1 (3/13/2024)

New Features and Modifications

  1. NPU device support is implemented for the OpenVINO runtime. To make a model for the NPU device, specify "DeviceType": "NPU" in the model JSON file. OpenVINO runtime version 2023.3.0 is required for NPU support.

  2. Python version 3.12 is initially supported by PySDK for Linux and Windows platforms.

  3. Improvements for "Tensor" input type (InputType model parameter equal to "Tensor"):

    • The following tensor element types are supported:

      • "DG_FLT"
      • "DG_UINT8"
      • "DG_INT8"
      • "DG_UINT16"
      • "DG_INT16"
      • "DG_INT32"
      • "DG_INT64"
      • "DG_DBL"
      • "DG_UINT32"
      • "DG_UINT64"

      You assign these type strings to the InputRawDataType model parameter.
      In previous versions, only the "DG_FLT" and "DG_UINT8" types were supported.

    • Variable tensor shapes are supported. Now you can specify any combination of the InputN, InputH, InputW, and InputC model parameters. They define the input tensor shape in that order. For example, if you specify InputN=1 and InputC=77 and omit InputH and InputW, you get a 2-D tensor of shape [1,77]. In previous versions you had to specify all four of them, which always gave 4-D tensor shapes.

    • The InputQuantEn model parameter is now ignored by the tensor pre-processor: you have to specify InputRawDataType to match the actual model input tensor data type and provide tensor data already converted to this data type.
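    The shape-forming rule for variable tensor shapes described above can be sketched as follows (illustrative only, not PySDK internals):

```python
def tensor_input_shape(params: dict) -> list:
    # The input tensor shape is formed from whichever of InputN, InputH,
    # InputW, and InputC are specified, in that fixed order.
    return [params[k] for k in ("InputN", "InputH", "InputW", "InputC") if k in params]
```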

  4. The InputN, InputH, InputW, and InputC model parameters are no longer mandatory: you may specify any subset of them.

  5. The PySDK InferenceResults.image_overlay() method now returns a copy of the input image instead of raising an exception. This makes it possible to safely call this method with the "None" postprocessor type (the OutputPostprocessType model parameter set to "None").

  6. The ModelParams class __str__() operator now prints all model parameters, including those not specified in the model JSON file. For such parameters, their default values are printed.

  7. If the DG_MEMORY_LIMIT_BYTES environment variable is defined, its value is used as the AI server in-memory model cache size limit. When it is not defined, one half of the physical memory is used as the cache size limit, as in previous versions. This feature is useful when the AI server is running in a Docker container and you want to further limit the AI server cache memory size.

Bug Fixes

  1. The PostProcessorInputs model parameter presence is now checked only for detection post-processor types, to avoid unnecessary errors for post-processor types that do not use this parameter, such as "None".

Version 0.11.0 (2/10/2024)

New Features and Modifications

  1. Support for different OpenVINO versions is implemented. Now PySDK can work with the following OpenVINO versions:

    • 2022.1.1
    • 2023.2.0
    • 2023.3.0

    When two or more OpenVINO installations are present on a system, the newest version will be used.

  2. Results filtering by class labels and category IDs is implemented: new output_class_set property is added to degirum.model.Model class for this purpose.

    By default, all results are reported by the model predict methods. However, you may want to include only results that belong to certain categories: either having certain class labels or certain category IDs. To achieve that, you can specify a set of class labels (or, alternatively, category IDs), so that only inference results whose class labels (or category IDs) are found in that set are reported, and all other results are discarded. You assign such a set to the degirum.model.Model.output_class_set property.

    For example, you may want to include only results with class labels "car" and "truck":

    # allow only results with "car" and "truck" class labels
    model.output_class_set = {"car", "truck"}
    

    Or you may want to include only results with category IDs 1 and 3:

    # allow only results with 1 and 3 category IDs
    model.output_class_set = {1, 3}
    

    This category filtering is applicable only to models that have "label" (or "category_id") keys in their result dictionaries. For all other models this category filter is ignored.
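    The filtering behavior can be sketched as follows (a simplified illustration, not PySDK internals; results are shown as plain dictionaries):

```python
def filter_results(results: list, allowed: set) -> list:
    # Keep only results whose "label" or "category_id" value appears in the
    # allowed set assigned to output_class_set.
    return [
        r for r in results
        if r.get("label") in allowed or r.get("category_id") in allowed
    ]
```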

Bug Fixes

  1. When two different models have two different Python postprocessor implementations saved into files with the same name, only the first Python postprocessor module gets loaded on the AI server. This happens because the module is loaded into the global Python `sys.modules` collection under a name derived from the file name, so two files with the same name collide.

  2. When the implementation of a Python postprocessor in a model changes, and that model was already loaded on the AI server, the Python postprocessor module is not reloaded on the next model load. This is because once a Python module is loaded into the Python interpreter, it is saved in the `sys.modules` collection, and any attempt to load it again just takes it from there.

  3. Performing inferences with the ONNX runtime agent (degirum.model.Model.model_info.RuntimeAgent equal to "ONNX") could cause the AI server to crash.