Model Configuration JSON File Parameters
Parameter Table
The Model Parameter table below summarizes the parameters that can appear in a model configuration JSON file.
PySDK supports models of the following categories:
- Image Object Detection
- Image Classification
- Image Semantic Segmentation
- Image Pose Detection
- Sound Classification
The Models column in the table below contains the model category to which the corresponding parameter applies.
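The sketch below shows how the sections named in the table ("DEVICE", "MODEL_PARAMETERS", "PRE_PROCESS", "POST_PROCESS") might fit together in one file. It builds a configuration as a Python dict and serializes it with the standard json module. The layout is an assumption: every section is written as a list of objects, which matches the table's note that "PRE_PROCESS" may hold several elements. The values, the file name my_model.n2x, and the helper missing_mandatory are hypothetical and do not come from a real model. The helper checks only the parameters marked "yes" in the Mandatory column.

```python
import json

# Hypothetical configuration for an SSD-style image detection model.
# Assumption: each section ("DEVICE", "MODEL_PARAMETERS", "PRE_PROCESS",
# "POST_PROCESS") is a list of objects; "PRE_PROCESS" holds one object per input.
config = {
    "ConfigVersion": 1,  # hypothetical value
    "Checksum": "",      # hypothetical placeholder for the model binary checksum
    "DEVICE": [
        {"DeviceType": "CPU", "RuntimeAgent": "N2X", "SupportedDeviceTypes": "N2X/CPU"}
    ],
    "MODEL_PARAMETERS": [{"ModelPath": "my_model.n2x"}],  # hypothetical file name
    "PRE_PROCESS": [
        {
            "InputType": "Image",
            "InputN": 1, "InputH": 300, "InputW": 300, "InputC": 3,
            "InputTensorLayout": "NHWC",
            "InputColorSpace": "RGB",
        }
    ],
    "POST_PROCESS": [
        {
            "OutputPostprocessType": "Detection",
            "PostProcessorInputs": [0, 1, 2],  # hypothetical output IDs
            "OutputConfThreshold": 0.3,
            "OutputNMSThreshold": 0.6,
            "XScale": 10, "YScale": 10, "HScale": 5, "WScale": 5,  # hypothetical
        }
    ],
}

# Parameters marked "yes" in the Mandatory column of the table.
MANDATORY_TOP = ("ConfigVersion", "Checksum")
MANDATORY_PRE = ("InputN", "InputH", "InputW", "InputC")
MANDATORY_POST = ("PostProcessorInputs",)


def missing_mandatory(cfg: dict) -> list:
    """Return the paths of mandatory parameters that are absent from cfg."""
    missing = [key for key in MANDATORY_TOP if key not in cfg]
    for i, pre in enumerate(cfg.get("PRE_PROCESS", [])):
        missing += [f"PRE_PROCESS[{i}].{k}" for k in MANDATORY_PRE if k not in pre]
    for i, post in enumerate(cfg.get("POST_PROCESS", [])):
        missing += [f"POST_PROCESS[{i}].{k}" for k in MANDATORY_POST if k not in post]
    return missing


text = json.dumps(config, indent=2)  # contents of the JSON file
```

Only top-level keys and keys inside list elements are checked. Conditional parameters such as XScale are not checked, because whether they are required depends on the model category.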
Parameter Name | Type | Default | Mandatory | Models | Description |
---|---|---|---|---|---|
General Parameters: at the top level, outside of any section | | | | | |
ConfigVersion | int | 0 | yes | All | Version of the JSON configuration file. This version is checked against the minimum compatible and current framework software versions. If it is outside that range, a version-check runtime exception is raised when the model is loaded. |
Checksum | string | "" | yes | All | Checksum of the model binary file |
Target Device Parameters: section "DEVICE" | |||||||
DeviceType | string | "CPU" | no | All | Defines the device on which inference is executed. Supported values: |
RuntimeAgent | string | "Default" | no | All | Defines the runtime agent to use. Supported values: "N2X" - N2X runtime agent |
SupportedDeviceTypes | string | "" | no | All | Comma-separated list of the runtime agent/device type combinations supported by the model, for example: "OPENVINO/CPU,ONNX/CPU" |
Model Parameters: section "MODEL_PARAMETERS" | |||||||
ModelPath | string | - | no | All | A path to a model file | ||
CalibrationFilePath | string | - | no | All | Path to the coefficient calibration file. This file is required when using TensorRT models with post-training quantization. |
UseFloat16 | bool | false | no | All | Enables the Float16 (half-precision) data type in TensorRT. |
CompilerOptions | JSON | {} | no | All | Model compiler options, keyed by runtime agent type. For example: { "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" } | ||
Preprocessing Parameters: section "PRE_PROCESS". This section may contain more than one element. For multi-input networks, each element describes one input tensor of the network. | | | | | |
InputType | string | "Image" | no | All | The kind of model input: image, sound, etc. Note: the order of dimensions is defined by the InputTensorLayout parameter. Supported values: |
InputN | int | - | yes | All | Input data tensor batch size. | ||
InputH | int | - | yes | All | Input data tensor height. | ||
InputW | int | - | yes | All | Input data tensor width. | ||
InputC | int | - | yes | All | Input data tensor number of channels. | ||
InputTensorLayout | string | "NHWC" | no | Image | [For inputs of raw image type and raw tensor type] defines the dimension layout of the raw binary tensor. Supported values: |
InputQuantEn | bool | false | no | All | [For inputs of image type and raw tensor type] enables input quantization. This parameter states the actual input requirement of the model, either uint8 or float32. It is not a runtime parameter, so it cannot be changed on the fly (compare with InputRawDataType). When InputQuantEn is true (quantization is enabled), the pre-processor converts input data to the uint8 data type. Otherwise, it converts input data to the float32 data type. |
InputQuantOffset | float | 0 | no | All | [For inputs of image type and raw tensor type] defines the image quantization zero offset (see InputQuantScale for the quantization formula) |
InputQuantScale | float | 1 | no | All | [For inputs of image type and raw tensor type] defines the image quantization scale. When input quantization is enabled (InputQuantEn is true), the data going to the model input is scaled before quantization by applying the formula: |
InputRawDataType | string | "DG_UINT8" | no | All | [For inputs of raw image type and raw tensor type] defines the element data type of the raw binary tensor, that is, how the pre-processor interprets client data. It is a runtime parameter, so the client can change it on the fly to suit its data (compare with InputQuantEn). Supported values: "DG_UINT8" - 8-bit unsigned integer; "DG_FLT" - 32-bit floating point; "DG_INT16" - 16-bit signed integer |
InputImgFmt | string | "JPEG" | no | Image | [For inputs of image type] defines the image format. Supported values: "JPEG" - input is a JPEG file; "RAW" - input is a raw binary tensor whose data type is defined by InputRawDataType |
InputImgRotation | int | 0 | no | Image | [For inputs of image type] defines input image rotation angle in degrees, clockwise. Supported values: 0, 90, 180, and 270 degrees. Other values will be coerced to the nearest supported value. | ||
InputColorSpace | string | "RGB" | no | Image | [For inputs of image type] defines the color space required by the model. Supported values: "RGB" - red->green->blue layout; "BGR" - blue->green->red layout. For JPEG images, the pre-processor performs the required conversion. For raw binary images, the caller must arrange the raw binary tensor accordingly. |
InputScaleEn | bool | false | no | Image | Enables global data normalization. If true, InputScaleCoeff is used as the scale when preparing input data: input_data = input_data * scale. If false, scale = 1. |
InputScaleCoeff | double | 1./255. | no | Image | Defines the scale for global data normalization. |
InputNormMean | float array | [] | no | Image | [For inputs of image type] defines mean values for per-channel normalization, e.g.: "InputNormMean": [0.485, 0.456, 0.406] |
InputNormStd | float array | [] | no | Image | [For inputs of image type] defines standard deviation values for per-channel normalization, e.g.: "InputNormStd": [0.229, 0.224, 0.225] |
InputImgSliceType | string | "None" | no | Image (YOLO) | [For inputs of image type] defines the slicing algorithm to use. Supported values: |
InputWaveformSize | int | 15600 | no | Sound | Size of the input audio frame, in samples |
InputSamplingRate | double | 16000 | no | Sound | Input audio sampling rate, in Hz |
InputFrameSize | int | 400 | no | Sound | Fourier Transform window size for input audio, in samples. The input waveform is divided into overlapping windows of this size, with the step specified by the InputFrameHopStepSize parameter. |
InputFrameHopStepSize | int | 160 | no | Sound | Fourier Transform window hop step size for input audio, in samples. |
InputMelFrequencyRange | double array | [] | no | Sound | Mel spectrogram frequency range for input audio processing. If not empty, it must contain two elements: the lower and upper frequencies in Hz. |
InputResizeMethod | string | "bilinear" | no | Image | Interpolation algorithm for image resizing. Supported values: |
InputPadMethod | string | "letterbox" | no | Image | How the input image is padded or cropped when resized. Supported values: |
InputCropPercentage | double | 1.0 | no | Image | Crop percentage used when InputPadMethod is "crop-first" or "crop-last" |
ImageBackend | string | "auto" | no | Image | Python package used for image processing. Supported values: |
Postprocessing Parameters: section "POST_PROCESS" | |||||||
OutputPostprocessType | string | "None" | no | All | Selects the postprocessor type. The following options can be set: |
PostProcessorInputs | int array | [] | yes | All | Specifies the output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. See "Post Processor Inputs" below. |
PythonFile | string | - | no | All | Name of the Python file containing the Python postprocessor code. Must be specified when OutputPostprocessType is set to "Python" |
OutputNumClasses | int | - | no | Image Detection | Number of output classes | ||
OutputSoftmaxEn | bool | false | no | Image or Sound Classification | Enables/disables softmax in postprocessing. | ||
OutputClassIDAdjustment | int | 0 | no | Image Classification or Detection | Adjust the index of the first non-background class. | ||
OutputConfThreshold | double | 0.1 | no | Image or Sound Classification, Image or Pose Detection | Filters out all the results below the threshold | ||
OutputNMSThreshold | double | 0.6 | no | Image Detection | A threshold for Non-Max Suppression algorithm | ||
OutputTopK | size_t | 0 | no | Image or Sound Classification | Number of classes to include in the classification result. If zero, all classes above OutputConfThreshold are reported. |
PoseThreshold | double | 0.8 | no | Pose Detection | Pose score threshold to filter whole pose | ||
MaxDetections | int | 20 | no | Image or Pose Detection | Maximum number of total object detection results to report | ||
MaxDetectionsPerClass | int | 100 | no | Image Detection | Maximum number of per-class object detection results to report | ||
MaxClassesPerDetection | int | 30 | no | Image Detection | Maximum number of classes to report per detection |
UseRegularNMS | bool | true | no | Image Detection | If true, the regular NMS algorithm is used for object detection |
NMSRadius | double | 10 | no | Pose Detection | A keypoint candidate is rejected if it is within NMSRadius pixels from the corresponding part of a previously detected instance. | ||
XScale | double | 1 | conditional | Image Detection | X scale coefficient to convert box center coordinates to the anchor-based coordinate system. Mandatory for object detection networks. |
YScale | double | 1 | conditional | Image Detection | Y scale coefficient to convert box center coordinates to the anchor-based coordinate system. Mandatory for object detection networks. |
HScale | double | 1 | conditional | Image Detection | Height scale coefficient to convert box size coordinates to anchor-based coordinate system. | ||
WScale | double | 1 | conditional | Image Detection | Width scale coefficient to convert box size coordinates to anchor-based coordinate system. | ||
Stride | int | 16 | no | Pose Detection | Stride scale coefficient for pose detection | ||
LabelsPath | string | "" | no | Image or Sound Classification, Image Detection | Path to label dictionary file |
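A few of the PRE_PROCESS parameters involve simple arithmetic, sketched below. The helper names are hypothetical, and two behaviours are assumptions. First, global scaling (InputScaleEn, InputScaleCoeff) is assumed to run before per-channel normalization, and that normalization is taken to be the usual (x - mean) / std using InputNormMean and InputNormStd. Second, audio windows (InputFrameSize, InputFrameHopStepSize) are assumed to be placed without padding. The pre-processor may differ in either respect.

```python
def normalize_pixel(pixel, scale_en=False, scale_coeff=1.0 / 255.0, mean=(), std=()):
    """Apply global scaling, then per-channel normalization, to one pixel.

    Assumption: the pre-processor scales first (InputScaleEn/InputScaleCoeff)
    and then applies (x - mean) / std per channel (InputNormMean/InputNormStd).
    """
    scale = scale_coeff if scale_en else 1.0  # scale = 1 when InputScaleEn is false
    values = [v * scale for v in pixel]
    if mean and std:  # empty arrays (the default) are treated here as "no normalization"
        values = [(v - m) / s for v, m, s in zip(values, mean, std)]
    return values


def num_audio_frames(waveform_size=15600, frame_size=400, hop=160):
    """Count the Fourier Transform windows that fit in the waveform.

    Assumption: no padding, so the last window must lie entirely inside the waveform.
    """
    if waveform_size < frame_size:
        return 0
    return 1 + (waveform_size - frame_size) // hop
```

With the table's defaults, num_audio_frames() gives 1 + (15600 - 400) // 160 = 96 windows. These cover 15600 / 16000 = 0.975 s of audio.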
Post Processor Inputs
The "PostProcessorInputs" parameter specifies the model output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. For the N2X and TensorFlow Lite runtime agents, each element of the "PostProcessorInputs" array should contain the numeric ID of the model output that holds the described data. For the OpenVINO, TensorRT, or ONNX runtime agents, each element should contain the ordinal of the input/output in the model file that holds the described data. The first input listed in Netron has ordinal 0, the second input or output has ordinal 1, and so on.
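Example: suppose the documentation for some postprocessor gives the following input tensor order:
- 0 - probabilities
- 1 - bounding boxes
The resulting line in the model's JSON file should then list the ID (or ordinal) of the probabilities output first and the ID of the bounding boxes output second:
"PostProcessorInputs": [<probabilities output ID>, <bounding boxes output ID>]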
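Mapping a postprocessor's declared input order to a model's output IDs can be sketched as below. The role names, the IDs, and the helper build_postprocessor_inputs are hypothetical. For a real model, you would read the IDs or ordinals from the model file itself (for example, in Netron), following the rules described above.

```python
import json


def build_postprocessor_inputs(required_order, output_ids):
    """Return the "PostProcessorInputs" list for a model.

    required_order: tensor roles in the order the postprocessor declares them.
    output_ids: role -> output ID (N2X, TensorFlow Lite) or ordinal
                (OpenVINO, TensorRT, ONNX) as read from the model file.
    """
    missing = [role for role in required_order if role not in output_ids]
    if missing:
        raise KeyError("no output ID for: " + ", ".join(missing))
    return [output_ids[role] for role in required_order]


# Hypothetical model: the bounding boxes output has ID 0
# and the probabilities output has ID 1.
ids = {"bounding boxes": 0, "probabilities": 1}
inputs = build_postprocessor_inputs(["probabilities", "bounding boxes"], ids)
line = json.dumps({"PostProcessorInputs": inputs})
```

Here line is '{"PostProcessorInputs": [1, 0]}'. The postprocessor's declared order decides the position in the array, and the model file decides the values.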
The input requirements of the various postprocessors are outlined below.
Classification
- 0 - class probabilities
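The sketch below is a rough guess at what a classification postprocessor might do with this single tensor, using OutputSoftmaxEn, OutputConfThreshold, and OutputTopK from the table. The order of operations is an assumption: softmax first, then the confidence filter, then sorting and top-K. The (class_id, score) result format and the helper name classify are hypothetical, and this is not a description of the PySDK implementation.

```python
import math


def classify(scores, softmax_en=False, conf_threshold=0.1, top_k=0):
    """Turn a class probabilities tensor into (class_id, score) pairs.

    Assumed order: optional softmax (OutputSoftmaxEn), then the confidence
    filter (OutputConfThreshold), then sorting and top-K (OutputTopK).
    """
    if softmax_en:
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
    else:
        probs = list(scores)
    # Keep only results at or above the threshold.
    results = [(i, p) for i, p in enumerate(probs) if p >= conf_threshold]
    results.sort(key=lambda r: r[1], reverse=True)  # highest score first
    return results[:top_k] if top_k > 0 else results  # 0 means "all above threshold"
```

For example, classify([0.05, 0.7, 0.25]) keeps classes 1 and 2, and adding top_k=1 keeps only class 1.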
Segmentation
- 0 - pixel class matrix
Detection
- 0 - anchors
- 1 - box regressors
- 2 - class probabilities
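The table says only that XScale, YScale, HScale, and WScale convert box coordinates to the anchor-based coordinate system. This sketch assumes the Detection postprocessor uses the center-size box coding of SSD models in the TensorFlow Object Detection API, and shows how a box regressor could be decoded against its anchor under that assumption. The field order (y before x) follows that convention, not this document. The Anchor and Regressor types and the decode_box helper are hypothetical.

```python
import math
from typing import NamedTuple


class Anchor(NamedTuple):
    y_center: float
    x_center: float
    height: float
    width: float


class Regressor(NamedTuple):
    ty: float
    tx: float
    th: float
    tw: float


def decode_box(reg, anchor, x_scale=1.0, y_scale=1.0, h_scale=1.0, w_scale=1.0):
    """Decode one box regressor against its anchor.

    Assumption: center-size coding as in the TensorFlow Object Detection API,
    with XScale/YScale/HScale/WScale dividing the regressor terms.
    Returns (xmin, ymin, xmax, ymax) in the anchors' coordinate units.
    """
    y_center = reg.ty / y_scale * anchor.height + anchor.y_center
    x_center = reg.tx / x_scale * anchor.width + anchor.x_center
    height = math.exp(reg.th / h_scale) * anchor.height
    width = math.exp(reg.tw / w_scale) * anchor.width
    return (
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )
```

An all-zero regressor reproduces its anchor exactly. This is why the scale coefficients must match the values the model was trained with: otherwise, every offset is stretched or shrunk.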
DetectionYolo
- 0 - probabilities/box regressors tensor (main model output)
- 1 - xy_mul_concat tensor
- 2 - grid_concat tensor
- 3 - anchor_grid_concat tensor
DetectionYoloPlates
This post processor uses the same scheme as DetectionYolo.
DetectionYoloV8
- 0 - box regressors tensor [1, 6400, 64]
- 1 - box regressors tensor [1, 1600, 64]
- 2 - box regressors tensor [1, 400, 64]
- 3 - class probabilities tensor [1, 6400, number of classes]
- 4 - class probabilities tensor [1, 1600, number of classes]
- 5 - class probabilities tensor [1, 400, number of classes]
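The sizes 6400, 1600, and 400 fit a 640x640 input sampled at strides 8, 16, and 32 (80x80, 40x40, and 20x20 cells). The width 64 fits 4 box sides times 16 distribution bins, as in Ultralytics YOLOv8. Both readings are assumptions, since this page does not state the input size. Under those assumptions, the hypothetical helpers below compute the expected shapes. They then use the shapes to put a model's outputs in the order listed above for "PostProcessorInputs".

```python
def expected_yolov8_shapes(input_h=640, input_w=640, num_classes=80,
                           strides=(8, 16, 32), reg_max=16):
    """Expected shapes of DetectionYoloV8 inputs 0..5.

    Assumptions: strides 8/16/32 and 4 * reg_max box-distribution bins,
    which gives 6400/1600/400 cells and width 64 for a 640x640 input.
    """
    cells = [(input_h // s) * (input_w // s) for s in strides]
    boxes = [[1, n, 4 * reg_max] for n in cells]
    probs = [[1, n, num_classes] for n in cells]
    return boxes + probs  # inputs 0-2 are box regressors, 3-5 class probabilities


def order_outputs(output_shapes, **kwargs):
    """Map {output ID: shape} to a PostProcessorInputs list for DetectionYoloV8.

    If num_classes equals 4 * reg_max, box and probability shapes coincide and
    this shape-based matching becomes ambiguous.
    """
    order = []
    for shape in expected_yolov8_shapes(**kwargs):
        matches = [oid for oid, s in output_shapes.items()
                   if list(s) == shape and oid not in order]
        if not matches:
            raise ValueError(f"no output with shape {shape}")
        order.append(matches[0])
    return order
```

For a model whose outputs appear in a different order in the model file, order_outputs returns the IDs rearranged into the 0..5 sequence above.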
FaceDetection
- 0 - box regressors
- 1 - face probabilities
HandDetection
- 0 - Identity - frame coordinates
- 1 - Identity_1 - score for whole hand
- 2 - Identity_2 - handedness score [0..1] -> [left..right]
- 3 - Identity_3 - world metric coordinates
PoseDetection
- 0 - heatmaps
- 1 - shorts
- 2 - mids