Model Configuration JSON File Parameters
Parameter Table
The Model Parameter table below summarizes the parameters of the model configuration JSON file.
PySDK supports models of the following categories:
- Image Object Detection
- Image Classification
- Image Semantic Segmentation
- Image Pose Detection
- Sound Classification
The Models column in the table below contains the model category to which the corresponding parameter applies.
General Parameters
At the top level, outside of any section
Parameter Name | Type | Default | Mandatory | Models |
ConfigVersion | int | 0 | yes | All |
Version of the JSON configuration file. This version is checked against the minimum compatible and current framework software versions. If it is not within that range, a version-check runtime exception is raised when the model is loaded. | ||||
Checksum | string | "" | yes | All |
Checksum of model binary file |
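The ConfigVersion range check described above can be sketched as follows. This is only an illustration: the function name, the exception class, and the version bounds are hypothetical and are not part of PySDK.

```python
class ConfigVersionError(RuntimeError):
    """Hypothetical exception for a ConfigVersion outside the supported range."""


def check_config_version(config, min_compatible, current):
    # ConfigVersion is a top-level key, outside of any section; table default is 0.
    version = int(config.get("ConfigVersion", 0))
    if not (min_compatible <= version <= current):
        raise ConfigVersionError(
            f"ConfigVersion {version} is outside [{min_compatible}, {current}]"
        )
    return version


# Made-up version bounds, for illustration only.
print(check_config_version({"ConfigVersion": 5, "Checksum": "abc"}, 3, 7))  # 5
```

A configuration whose version falls outside the range raises the exception instead of returning.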
Target Device Parameters
Section "DEVICE"
Parameter Name | Type | Default | Mandatory | Models |
RuntimeAgent | string | "Default" | no | All |
Defines the runtime agent to use.
Supported values:
| ||||
DeviceType | string | "CPU" | no | All |
Defines the device on which inference is executed. Valid device types depend on the selected runtime agent.
Supported values:
| ||||
SupportedDeviceTypes | string | "" | no | All |
Comma-separated list of runtime agent/device type combinations supported by the model.
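As a rough illustration of how a client might read SupportedDeviceTypes, the sketch below splits the comma-separated list into (agent, device) pairs. It assumes each combination is written as "AGENT/DEVICE", the same form used by the CompilerOptions keys in this document; the helper name is hypothetical.

```python
def parse_supported_device_types(value):
    """Split 'AGENT/DEVICE, AGENT/DEVICE' into a list of (agent, device) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # the default "" yields an empty list
        agent, _, device = item.partition("/")
        pairs.append((agent, device))
    return pairs


# Example DEVICE section; the values are illustrative.
device_section = {
    "RuntimeAgent": "N2X",
    "DeviceType": "ORCA1",
    "SupportedDeviceTypes": "N2X/ORCA1, N2X/CPU",
}
supported = parse_supported_device_types(device_section["SupportedDeviceTypes"])
selected = (device_section["RuntimeAgent"], device_section["DeviceType"])
print(selected in supported)  # True
```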
Model Parameters
Parameter Name | Type | Default | Mandatory | Models |
ModelPath | string | - | no | All |
Path to the model file. | ||||
CalibrationFilePath | string | - | no | All |
Path to the coefficient calibration file required when using TensorRT models with post-training quantization. | ||||
UseFloat16 | bool | false | no | All |
Enables the use of the special Float16 type in TensorRT runtime. | ||||
CompilerOptions | JSON | {} | no | All |
Model compiler options, keyed by runtime agent type. For example:
{ "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }
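Since CompilerOptions is keyed by runtime agent type, selecting the options for a given target is a dictionary lookup. The sketch below assumes the key is formed as "RuntimeAgent/DeviceType", as in the example above; the helper name and the empty-string fallback are assumptions.

```python
import json

# The CompilerOptions example from this section, wrapped in a JSON object.
model_params = json.loads(
    '{"CompilerOptions": {"N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW"}}'
)


def compiler_options_for(params, runtime_agent, device_type):
    options = params.get("CompilerOptions", {})  # table default is {}
    # Assumed fallback: no options when the target has no entry.
    return options.get(f"{runtime_agent}/{device_type}", "")


print(compiler_options_for(model_params, "N2X", "ORCA1"))  # --device HW
```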
Preprocessing Parameters
This section may have more than one element. For multi-input networks, each element describes one input tensor of the network.
Parameter Name | Type | Default | Mandatory | Models |
InputType | string | "Image" | no | All |
The model input kind: image, sound, etc.
Supported values:
| ||||
InputN | int | - | yes | All |
Input data tensor batch size. | ||||
InputH | int | - | yes | All |
Input data tensor height. | ||||
InputW | int | - | yes | All |
Input data tensor width. | ||||
InputC | int | - | yes | All |
Input data tensor number of channels. | ||||
InputTensorLayout | string | "NHWC" | no | Image |
[For inputs of raw image type and raw tensor type] defines the dimensional layout of raw binary tensor.
Supported values:
| ||||
InputQuantEn | bool | false | no | All |
[For inputs of image type and raw tensor type] enables input quantization. This parameter defines the actual model input requirement (uint8 or flt32). It is not a runtime parameter, so it cannot be changed on the fly (compare with InputRawDataType). When InputQuantEn is true (quantization is enabled), the pre-processor converts input data to the uint8 data type; otherwise it converts it to the flt32 data type. | ||||
InputQuantOffset | float | 0 | no | All |
[For inputs of image type and raw tensor type] defines the image quantization zero offset (see InputQuantScale for quantization formula) | ||||
InputQuantScale | float | 1 | no | All |
[For inputs of image type and raw tensor type] defines the image quantization scale. When model quantization is enabled (InputQuantEn is true), the data going to the model input is scaled before quantization by applying the formula:
out = quantize(in / InputQuantScale) + InputQuantOffset
| ||||
InputRawDataType | string | "DG_UINT8" | no | All |
[For inputs of raw image type and raw tensor type] defines the data type of a raw binary tensor element, that is, how the pre-processor will treat client data. It is a runtime parameter, meaning that the client can change it on the fly to better suit its data (compare with InputQuantEn).
Supported values:
| ||||
InputImgFmt | string | "JPEG" | no | Image |
[For inputs of image type] defines the image format.
Supported values:
| ||||
InputImgRotation | int | 0 | no | Image |
[For inputs of image type] defines input image rotation angle in degrees, clockwise.
Supported values: 0, 90, 180, and 270 degrees. Other values will be coerced to the nearest supported value. | ||||
InputColorSpace | string | "RGB" | no | Image |
[For inputs of image type] defines the color space required by the model.
Supported values:
| ||||
InputScaleEn | bool | false | no | Image |
Enables global data normalization. If true, InputScaleCoeff is used as the scale when preparing input data: input_data = input_data * scale. If false, scale = 1.
| ||||
InputScaleCoeff | double | 1./255. | no | Image |
Defines a scale for global data normalization. | ||||
InputNormMean | float array | [] | no | Image |
[For inputs of image type] defines mean values for per-channel normalization, e.g.: "InputNormMean": [0.485,0.456,0.406]
| ||||
InputNormStd | float array | [] | no | Image |
[For inputs of image type] defines standard deviation values for per-channel normalization, e.g.: "InputNormStd": [0.229,0.224,0.225]
| ||||
InputImgSliceType | string | "None" | | Image (YOLO) |
[For inputs of image type] defines the slicing algorithm to use.
Supported values:
| ||||
InputWaveformSize | int | 15600 | no | Sound |
Input frame size in samples for input audio. | ||||
InputSamplingRate | double | 16000 | no | Sound |
Input audio sampling rate in Hz. | ||||
InputFrameSize | int | 400 | no | Sound |
Fourier Transform window size in samples for input audio. The input waveform is divided into overlapping windows of this size, with the step specified by the InputFrameHopStepSize parameter. | ||||
InputFrameHopStepSize | int | 160 | no | Sound |
Fourier Transform window hop step size in samples for input audio. | ||||
InputMelFrequencyRange | double array | [] | no | Sound |
Mel spectrogram frequency range for input audio processing. When not empty, it should contain two elements: the lower and upper frequency in Hz. | ||||
InputResizeMethod | string | "bilinear" | no | Image |
Interpolation algorithm for image resizing.
Supported values:
| ||||
InputPadMethod | string | "letterbox" | no | Image |
Defines how the input image is padded or cropped when resized.
Supported values:
| ||||
InputCropPercentage | double | 1.0 | no | Image |
Percentage value for cropping the image when InputPadMethod is "crop-first" or "crop-last". | ||||
ImageBackend | string | "auto" | no | Image |
Python package to be used for image processing.
Supported values:
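
For the numeric preprocessing parameters in the table above (InputScaleEn, InputScaleCoeff, InputNormMean, InputNormStd, InputQuantScale, InputQuantOffset), the arithmetic can be sketched in plain Python. This is an illustration under assumptions, not the PySDK implementation: the order of operations (scale, then mean, then standard deviation), rounding as the `quantize` step, and clipping to the uint8 range are all assumed, and the function names are hypothetical.

```python
def scale_and_normalize(pixel, scale_en=False, scale_coeff=1.0 / 255.0, mean=(), std=()):
    """Apply global scaling and per-channel normalization to one pixel.

    pixel holds one value per channel. The operation order is an assumption.
    """
    scale = scale_coeff if scale_en else 1.0  # InputScaleEn false -> scale = 1
    out = [c * scale for c in pixel]
    if mean:  # InputNormMean, empty by default
        out = [c - m for c, m in zip(out, mean)]
    if std:  # InputNormStd, empty by default
        out = [c / s for c, s in zip(out, std)]
    return out


def quantize_input(values, quant_scale=1.0, quant_offset=0.0):
    """out = quantize(in / InputQuantScale) + InputQuantOffset.

    quantize() is assumed to be rounding; the result is clipped to 0..255.
    """
    result = []
    for v in values:
        q = round(v / quant_scale) + quant_offset
        result.append(int(min(255, max(0, q))))
    return result


print(quantize_input([0.0, 1.0, 2.2, 200.0], quant_scale=0.5, quant_offset=10))
# [10, 12, 14, 255]
```

With the default arguments, `scale_and_normalize` leaves the values unchanged, matching the table defaults (scaling disabled, empty mean and standard deviation arrays).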
Postprocessing Parameters
Parameter Name | Type | Default | Mandatory | Models |
OutputPostprocessType | string | "None" | no | All |
The type of output post-processing algorithm.
Supported values:
| ||||
PostProcessorInputs | int array | [] | yes | All |
Specifies the output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. See the "Post Processor Inputs" section below. | ||||
PythonFile | string | - | no | All |
The name of the Python file containing the Python post-processor code. This post-processor runs server-side. Developing such a post-processor is an advanced topic that is not covered in this document. | ||||
OutputNumClasses | int | - | no | Image Detection |
Number of output classes for certain detection models. | ||||
OutputSoftmaxEn | bool | false | no | Image or Sound Classification |
Enables/disables softmax in postprocessing. | ||||
OutputClassIDAdjustment | int | 0 | no | Image Classification or Detection |
Adjusts the index of the first non-background class. | ||||
OutputYoloAnchors | int array, 3-dimensional | [] | no | Image Detection (YOLO) |
Anchors used for YOLO post-processing. This field must be specified for YOLO models that rely on anchors for post-processing. The array contains one array of anchors per prediction layer; each per-layer array contains two-element arrays, each of which holds two anchor values. For example, for a model with three prediction layers, the JSON may specify: "OutputYoloAnchors": [[[10, 13], [16, 30], [33, 23]], [[30, 61], [62, 45], [59, 119]], [[116, 90], [156, 198], [373, 326]]]
| ||||
OutputYoloStrides | int array | [8, 16, 32] | no | Image Detection (YOLO) |
Strides used for YOLO post-processing. Array containing stride values, one per prediction layer. Default value is for a standard YOLOv5 model with three prediction layers. | ||||
OutputConfThreshold | double | 0.1 | no | Image or Sound Classification, Image or Pose Detection |
Filters out all the results below the threshold | ||||
OutputNMSThreshold | double | 0.6 | no | Image Detection |
A threshold for Non-Max Suppression algorithm | ||||
OutputTopK | size_t | 0 | no | Image or Sound Classification |
Number of classes to include in the classification result. If zero, all classes above OutputConfThreshold are reported. | ||||
PoseThreshold | double | 0.8 | no | Pose Detection |
Pose score threshold to filter whole pose | ||||
MaxDetections | int | 20 | no | Image or Pose Detection |
Maximum number of total object detection results to report | ||||
MaxDetectionsPerClass | int | 100 | no | Image Detection |
Maximum number of per-class object detection results to report | ||||
MaxClassesPerDetection | int | 30 | no | Image Detection |
Maximum number of classes to report | ||||
UseRegularNMS | bool | true | no | Image Detection |
Flag that enables the regular NMS algorithm for object detection. | ||||
NMSRadius | double | 10 | no | Pose Detection |
A keypoint candidate is rejected if it is within NMSRadius pixels from the corresponding part of a previously detected instance. | ||||
XScale | double | 1 | conditional | Image Detection |
X scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks. | ||||
YScale | double | 1 | conditional | Image Detection |
Y scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks. | ||||
HScale | double | 1 | conditional | Image Detection |
Height scale coefficient to convert box size coordinates to anchor-based coordinate system. | ||||
WScale | double | 1 | conditional | Image Detection |
Width scale coefficient to convert box size coordinates to anchor-based coordinate system. | ||||
Stride | int | 16 | no | Pose Detection |
Stride scale coefficient for pose detection | ||||
LabelsPath | string | "" | no | Image or Sound Classification, Image Detection |
Path to label dictionary file |
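To make the interaction of OutputSoftmaxEn, OutputConfThreshold, and OutputTopK concrete, here is a minimal classification post-processing sketch. It is not the PySDK post-processor: the function names are hypothetical, and applying the confidence threshold before the top-K cut is an assumption.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels, softmax_en=False, conf_threshold=0.1, top_k=0):
    """Return (label, score) pairs, highest score first."""
    probs = softmax(scores) if softmax_en else list(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    # Filter out results below the threshold (OutputConfThreshold).
    kept = [(label, p) for label, p in ranked if p >= conf_threshold]
    # OutputTopK == 0 means "report all classes above the threshold".
    return kept[:top_k] if top_k > 0 else kept


print(classify([0.05, 0.7, 0.25], ["cat", "dog", "bird"], top_k=1))
# [('dog', 0.7)]
```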
Post Processor Inputs
The "PostProcessorInputs" parameter specifies the indexes of the model output tensors that the postprocessor operates on, in the order declared by the postprocessor implementation. For N2X and TensorFlow Lite runtime agents, each element of the "PostProcessorInputs" array should contain the numeric ID of the output in the model file that holds the described data. For OpenVINO, TensorRT, or ONNX runtime agents, each element of the "PostProcessorInputs" array should contain the ordinal of the input/output in the model file that holds the described data (the first input or output listed in Netron has ordinal 0, the second has ordinal 1, and so on).
Example: if the documentation for some postprocessor gives the following input tensor order:
- 0 - probabilities
- 1 - bounding boxes
then the resulting line in the model's JSON file should look like:
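The concrete IDs depend on the model file, so the following is only a sketch with made-up output IDs (7 for probabilities, 4 for bounding boxes). It shows how the required order maps to a line of the expected shape.

```python
import json

# Order required by the postprocessor documentation in the example above.
required_order = ["probabilities", "bounding boxes"]

# Made-up tensor IDs; real values come from the model file (e.g. as shown in Netron).
output_ids = {"bounding boxes": 4, "probabilities": 7}

post_processor_inputs = [output_ids[name] for name in required_order]
line = f'"PostProcessorInputs": {json.dumps(post_processor_inputs)}'
print(line)  # "PostProcessorInputs": [7, 4]
```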
The requirements of various post-processors are outlined in the table below:
Post-processing Type | "PostProcessorInputs" Value |
Classification | 0 - class probabilities |
Segmentation | 0 - pixel class matrix |
Detection | 0 - anchors; 1 - box regressors; 2 - class probabilities |
DetectionYolo, DetectionYoloPlates | 0 - probabilities/box regressors tensor (main model output); 1 - xy_mul_concat tensor; 2 - grid_concat tensor; 3 - anchor_grid_concat tensor |
DetectionYoloV8 | 0 - box regressors tensor [1, 6400, 64]; 1 - box regressors tensor [1, 1600, 64]; 2 - box regressors tensor [1, 400, 64]; 3 - probabilities regressors tensor [1, 6400, number of classes]; 4 - probabilities regressors tensor [1, 1600, number of classes]; 5 - probabilities regressors tensor [1, 400, number of classes] |
FaceDetection | 0 - box regressors; 1 - face probabilities |
HandDetection | 0 - Identity - frame coordinates; 1 - Identity_1 - score for whole hand; 2 - Identity_2 - handedness score [0..1] -> [left..right]; 3 - Identity_3 - world metric coordinates |
PoseDetection | 0 - heatmaps; 1 - shorts; 2 - mids |
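The table can be turned into a small lookup that labels each entry of a "PostProcessorInputs" array with its role. The sketch below covers a few rows only. The helper name, the tensor IDs in the usage line, and the treatment of a length mismatch as an error are assumptions for illustration.

```python
# Roles per post-processing type, copied from a few rows of the table above.
POSTPROCESSOR_INPUT_ROLES = {
    "Classification": ["class probabilities"],
    "Segmentation": ["pixel class matrix"],
    "Detection": ["anchors", "box regressors", "class probabilities"],
    "FaceDetection": ["box regressors", "face probabilities"],
    "PoseDetection": ["heatmaps", "shorts", "mids"],
}


def describe_inputs(postprocess_type, post_processor_inputs):
    """Map each role to the tensor ID/ordinal at the same position."""
    roles = POSTPROCESSOR_INPUT_ROLES[postprocess_type]
    if len(post_processor_inputs) != len(roles):
        raise ValueError(
            f"{postprocess_type} expects {len(roles)} inputs, "
            f"got {len(post_processor_inputs)}"
        )
    return dict(zip(roles, post_processor_inputs))


# Made-up tensor IDs for a detection model.
print(describe_inputs("Detection", [2, 0, 1]))
# {'anchors': 2, 'box regressors': 0, 'class probabilities': 1}
```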