Model Configuration JSON File Parameters

Parameter Table

The parameter table below summarizes all parameters that can appear in the model configuration JSON file.

PySDK supports models of the following categories:

  • Image Object Detection
  • Image Classification
  • Image Semantic Segmentation
  • Image Pose Detection
  • Sound Classification

The Models column in the table below contains the model category to which the corresponding parameter applies.

Parameter Name Type Default Mandatory Models Description
General Parameters: at the top level, outside of any section
ConfigVersion int 0 yes All Version of the JSON configuration file. This version is checked against the minimum compatible and current framework software versions; if it falls outside that range, a version-check runtime exception is raised when the model is loaded.
Checksum string "" yes All Checksum of model binary file
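
For illustration, a top-level skeleton of a model configuration file might look like the following sketch. Section bodies are elided; each section is shown as a JSON array, since a section such as PRE_PROCESS may contain more than one element, and the ConfigVersion and Checksum values are placeholders:

{
    "ConfigVersion": 10,
    "Checksum": "0123456789abcdef0123456789abcdef",
    "DEVICE": [ { ... } ],
    "MODEL_PARAMETERS": [ { ... } ],
    "PRE_PROCESS": [ { ... } ],
    "POST_PROCESS": [ { ... } ]
}
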
Target Device Parameters: section "DEVICE"
DeviceType string "CPU" no All

This field defines on which device the inference should be executed. Supported values:
"ORCA" - run on Orca
"CPU" - run on CPU
"EDGETPU" - run on Google EdgeTPU
"GPU" - run on GPU via Nvidia Tensor RT
"DLA_ONLY" - run on DLA via Nvidia Tensor RT. No GPU use is allowed, presence of any layers not supported by DLA will cause just in time compilation fail.
"DLA_FALLBACK" - run on DLA via Nvidia Tensor RT. Unsupported layers will be executed on GPU.
"NPU" - run on NPU via OpenVino.
"RK####" run on Rockchip NPU systems.

RuntimeAgent string "Default" no All

Defines the runtime agent to use. Supported values:

"N2X" - N2X runtime agent
"TFLITE" - TensorFlow Lite runtime agent
"OPENVINO" - OpenVINO runtime agent
"ONNX" - ONNX runtime agent
"RKNN" - RKNN runtime agent
"TENSORRT" - Tensor RT runtime agent

SupportedDeviceTypes string "" no All Comma-separated list of runtime agent/device type combinations supported by the model. For example: "OPENVINO/CPU,ONNX/CPU"
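
For example, a DEVICE section for a hypothetical model that can run through either the OpenVINO or ONNX runtime agent on a CPU might look like this sketch (all values illustrative):

"DEVICE": [
    {
        "DeviceType": "CPU",
        "RuntimeAgent": "OPENVINO",
        "SupportedDeviceTypes": "OPENVINO/CPU,ONNX/CPU"
    }
]
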
Model Parameters: section "MODEL_PARAMETERS"
ModelPath string - no All A path to a model file
CalibrationFilePath string - no All Path to coefficient calibration file required when using Tensor RT models with post-training quantization.
UseFloat16 bool false no All Enables the use of the special Float16 type in Tensor RT.
CompilerOptions JSON {} no All Model compiler options, keyed by runtime agent type. For example: { "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }
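
As a sketch, a MODEL_PARAMETERS section for a hypothetical N2X model, reusing the compiler options example above, might look like the following (the file name model.n2x is a placeholder):

"MODEL_PARAMETERS": [
    {
        "ModelPath": "model.n2x",
        "CompilerOptions": { "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }
    }
]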

Preprocessing Parameters: section "PRE_PROCESS".

This section may have more than one element. For multi-input networks, each element describes one input tensor of the network.

InputType string "Image" no All

The model input kind: image, sound, etc. Supported values:

  1. "Image": input is an image with dimensions InputW x InputH x InputC

  2. "Tensor": input is raw binary tensor with dimensions InputN x InputW x InputH x InputC

  3. "Audio": input is an array with InputWaveformSize elements.

Note: order of dimensions is defined by InputTensorLayout parameter.

InputN int - yes All Input data tensor batch size.
InputH int - yes All Input data tensor height.
InputW int - yes All Input data tensor width.
InputC int - yes All Input data tensor number of channels.
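
For example, a model taking a single 224x224 RGB image would declare "InputN": 1, "InputH": 224, "InputW": 224, "InputC": 3 (illustrative values).
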
InputTensorLayout string "NHWC" no Image

[For inputs of raw image type and raw tensor type] defines the dimensional layout of raw binary tensor.

Supported values:
"auto" - deduce tensor layout automatically
"NHWC" - N->Height->Width->Color layout
"NCHW" - N->Color->Height->Width layout

InputQuantEn bool false no All

[For inputs of image type and raw tensor type] enables input quantization.

This parameter defines the actual model input data type requirement: uint8 or flt32. It is not a runtime parameter, so it cannot be changed on the fly (compare with InputRawDataType).

When InputQuantEn is true (quantization is enabled), the pre-processor converts input data to the uint8 data type; otherwise it converts it to the flt32 data type.

InputQuantOffset float 0 no All

[For inputs of image type and raw tensor type] defines the image quantization zero offset (see InputQuantScale for quantization formula)

InputQuantScale float 1 no All

[For inputs of image type and raw tensor type] defines the image quantization scale. When model quantization is enabled (InputQuantEn is true), then the data going to the model input will be scaled before the quantization by applying the formula:
out = quantize( in / InputQuantScale ) + InputQuantOffset
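
For example, with "InputQuantScale" set to 1/255 (about 0.00392) and "InputQuantOffset" set to 0, a floating-point input value of 0.4 maps to quantize(0.4 / (1/255)) + 0 = quantize(102.0) = 102 in the uint8 input tensor (illustrative values).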

InputRawDataType string "DG_UINT8" no All

[For inputs of raw image type and raw tensor type] defines how the pre-processor treats client data, i.e. the data type of the raw binary tensor elements. It is a runtime parameter, meaning the client can change it on the fly to better suit their data requirements (compare with InputQuantEn).

Supported values:

"DG_UINT8" - 8-bit unsigned integer

"DG_FLT" - 32-bit floating point

"DG_INT16" - 16-bit signed integer

InputImgFmt string "JPEG" no Image

[For inputs of image type] defines the image format. Supported values:

"JPEG": input is a JPEG file;

"RAW": input is raw binary tensor. Its data type is defined by InputRawDataType

InputImgRotation int 0 no Image [For inputs of image type] defines input image rotation angle in degrees, clockwise. Supported values: 0, 90, 180, and 270 degrees. Other values will be coerced to the nearest supported value.
InputColorSpace string "RGB" no Image

[For inputs of image type] defines the color space required by the model. Supported values:

"RGB" - red->green->blue layout

"BGR" - blue->green->red layout

In the case of the JPEG image type, the pre-processor performs the proper conversion. In the case of the raw binary image type, the caller must arrange the raw binary tensor accordingly.

InputScaleEn bool false no Image Enables global data normalization. If true, InputScaleCoeff is used as the scale when preparing input data: input_data = input_data * scale; if false, scale = 1.
InputScaleCoeff double 1./255. no Image Defines a scale for global data normalization.
InputNormMean float array [] no Image [For inputs of image type] defines mean values for per-channel normalization, e.g.: "InputNormMean": [0.485, 0.456, 0.406]
InputNormStd float array [] no Image [For inputs of image type] defines StDev values for per-channel normalization, e.g.: "InputNormStd": [0.229, 0.224, 0.225]
InputImgSliceType string "None" no Image (YOLO) [For inputs of image type] defines the slicing algorithm to use. Supported values:
  1. "None" - do not use slicing

  2. "SLICE2" - implements x(b,w,h,c) -> y(b,w/2,h/2,4c) slicing algorithm. The procedure related to Focus Layer implementation of YOLO family models. See SpaceToDepth module description in TResNet: High Performance GPU-Dedicated Architecture section 3.2.1.

InputWaveformSize int 15600 no Sound Input waveform size in samples for input audio
InputSamplingRate double 16000 no Sound Input audio sampling rate in Hz
InputFrameSize int 400 no Sound Fourier Transform window size in samples for input audio. The input waveform is divided into overlapping windows of this size with the step specified by the InputFrameHopStepSize parameter.
InputFrameHopStepSize int 160 no Sound Fourier Transform window hop step size in samples for input audio.
InputMelFrequencyRange double array [] no Sound Mel spectrogram frequency range for input audio processing. When not empty, it should contain two elements: the lower and upper frequency in Hz.
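
Putting the audio parameters together, a PRE_PROCESS element for a hypothetical sound classification model might look like this sketch (values illustrative, matching the defaults above where applicable):

"PRE_PROCESS": [
    {
        "InputType": "Audio",
        "InputWaveformSize": 15600,
        "InputSamplingRate": 16000,
        "InputFrameSize": 400,
        "InputFrameHopStepSize": 160,
        "InputMelFrequencyRange": [125, 7500]
    }
]
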
InputResizeMethod string "bilinear" no Image

Interpolation algorithm for image resizing. Supported values:

  1. "nearest"

  2. "bilinear"

  3. "area"

  4. "bicubic"

  5. "lanczos"

InputPadMethod string "letterbox" no Image

How input image will be padded or cropped when resized. Supported values:

  1. "stretch"

  2. "letterbox"

  3. "crop-first"

  4. "crop-last"

InputCropPercentage double 1.0 no Image Percentage value for cropping the image with the "crop-first" or "crop-last" InputPadMethod
ImageBackend string "auto" no Image

Python package to be used for image processing. Supported values:

  1. "auto" - tries "pil" first

  2. "pil"

  3. "opencv"

Postprocessing Parameters: section "POST_PROCESS"
OutputPostprocessType string "None" no All

The following options can be set:

  1. "Classification" for Image or Sound Classification

  2. "Detection" for Image Detection

  3. "FaceDetection" for Image Detection

  4. "DetectionYolo" for Image Detection

  5. "DetectionYoloPlates" for Image Detection

  6. "DetectionYoloV8" for Image Detection

  7. "DetectionYoloV10" for Image Detection

  8. "PoseDetection" for Pose Detection

  9. "HandDetection" for Pose Detection

  10. "Segmentation" for Image Semantic Segmentation

  11. "None" (pass-through post processor)

  12. "Python" (Python post processor)

PostProcessorInputs int array [] yes All Specifies the output tensors the postprocessor operates on, in the order declared by the implementation of the postprocessor. See "Post Processor Inputs" below.
PythonFile string - no All The name of the Python file containing the Python post processor code. Must be specified when OutputPostprocessType is set to "Python".
OutputNumClasses int - no Image Detection Number of output classes
OutputSoftmaxEn bool false no Image or Sound Classification Enables/disables softmax in postprocessing.
OutputClassIDAdjustment int 0 no Image Classification or Detection Adjust the index of the first non-background class.
OutputConfThreshold double 0.1 no Image or Sound Classification, Image or Pose Detection Filters out all the results below the threshold
OutputNMSThreshold double 0.6 no Image Detection A threshold for Non-Max Suppression algorithm
OutputTopK size_t 0 no Image or Sound Classification Number of classes to include in classification result. If zero - report all classes above OutputConfThreshold.
PoseThreshold double 0.8 no Pose Detection Pose score threshold to filter whole pose
MaxDetections int 20 no Image or Pose Detection Maximum number of total object detection results to report
MaxDetectionsPerClass int 100 no Image Detection Maximum number of per-class object detection results to report
MaxClassesPerDetection int 30 no Image Detection Maximum number of classes to report
UseRegularNMS bool true no Image Detection Flag to use the regular NMS algorithm for object detection
NMSRadius double 10 no Pose Detection A keypoint candidate is rejected if it is within NMSRadius pixels from the corresponding part of a previously detected instance.
XScale double 1 conditional Image Detection

X scale coefficient to convert box center coordinates to anchor-based coordinate system.

Mandatory for object detection networks.

YScale double 1 conditional Image Detection

Y scale coefficient to convert box center coordinates to anchor-based coordinate system.

Mandatory for object detection networks.

HScale double 1 conditional Image Detection Height scale coefficient to convert box size coordinates to anchor-based coordinate system.
WScale double 1 conditional Image Detection Width scale coefficient to convert box size coordinates to anchor-based coordinate system.
Stride int 16 no Pose Detection Stride scale coefficient for pose detection
LabelsPath string "" no Image or Sound Classification, Image Detection Path to label dictionary file
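
As an example, a POST_PROCESS section for a hypothetical 80-class SSD-style detection model might look like this sketch (the tensor ordinals, thresholds, scale coefficients, and labels file name are all illustrative):

"POST_PROCESS": [
    {
        "OutputPostprocessType": "Detection",
        "PostProcessorInputs": [0, 1, 2],
        "OutputNumClasses": 80,
        "OutputConfThreshold": 0.3,
        "OutputNMSThreshold": 0.6,
        "MaxDetections": 20,
        "XScale": 10,
        "YScale": 10,
        "HScale": 5,
        "WScale": 5,
        "LabelsPath": "labels.json"
    }
]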

Post Processor Inputs

The "PostProcessorInputs" parameter specifies the model output tensors the postprocessor operates on, in the order declared by the implementation of a postprocessor. For N2X and TensorFlow Lite runtime agents, each element of the "PostProcessorInputs" array should contain a numeric ID of an output in the model file that cosists of described data. For OpenVINO, Tensor RT or ONNX runtime agents, each element of the "PostProcessorInputs" array should contain the ordinal of an input/output in the model file that cosists of described data (the first input listed in Netron has ordinal 0, the second input or output, ordinal 1, et cetera).

Example: if the documentation for some postprocessor gives the following input tensor order:

  • 0 - probabilities
  • 1 - bounding boxes

the resulting line in the model's JSON file should look like:

"PostProcessorInputs": [probabilities_tensor_id, boxes_tensor_id],
The requirements of various post-processors are outlined below.

Classification

  • 0 - class probabilities

Segmentation

  • 0 - pixel class matrix

Detection

  • 0 - anchors
  • 1 - box regressors
  • 2 - class probabilities

DetectionYolo

  • 0 - probabilities/box regressors tensor (main model output)
  • 1 - xy_mul_concat tensor
  • 2 - grid_concat tensor
  • 3 - anchor_grid_concat tensor

DetectionYoloPlates

This post processor uses the same scheme as DetectionYolo.

DetectionYoloV8

  • 0 - box regressors tensor [1, 6400, 64]
  • 1 - box regressors tensor [1, 1600, 64]
  • 2 - box regressors tensor [1, 400, 64]
  • 3 - probabilities regressors tensor [1, 6400, number of classes]
  • 4 - probabilities regressors tensor [1, 1600, number of classes]
  • 5 - probabilities regressors tensor [1, 400, number of classes]

FaceDetection

  • 0 - box regressors
  • 1 - face probabilities

HandDetection

  • 0 - Identity - frame coordinates
  • 1 - Identity_1 - score for whole hand
  • 2 - Identity_2 - handedness score [0..1] -> [left..right]
  • 3 - Identity_3 - world metric coordinates

PoseDetection

  • 0 - heatmaps
  • 1 - shorts (short-range offsets)
  • 2 - mids (mid-range offsets)