
Model Configuration JSON File Parameters

Parameter Table

The parameter tables below summarize the parameters that can appear in a model configuration JSON file.

PySDK supports models of the following categories:

  • Image Object Detection
  • Image Classification
  • Image Semantic Segmentation
  • Image Pose Detection
  • Sound Classification

The Models column in the tables below contains the model category to which the corresponding parameter applies.

General Parameters

These parameters appear at the top level of the JSON file, outside of any section.

Parameter Name | Type | Default | Mandatory | Models
ConfigVersion int 0 yes All
Version of the JSON configuration file. This version is checked against the minimum compatible and current framework software versions. If it is not within that range, a version-check runtime exception is raised when the model is loaded.
Checksum string "" yes All
Checksum of the model binary file
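
For illustration, the top level of a model JSON file might begin like this (the version number is illustrative, and the placeholder stands in for the real checksum value); the "DEVICE", "MODEL_PARAMETERS", "PRE_PROCESS", and "POST_PROCESS" sections described below appear alongside these keys:

{
    "ConfigVersion": 10,
    "Checksum": "<checksum of the model binary file>"
}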

Target Device Parameters

Section "DEVICE"

Parameter Name | Type | Default | Mandatory | Models
DeviceType string "CPU" no All
This field defines on which device the inference should be executed.
Supported values:
  • "ORCA" - run on Orca
  • "CPU" - run on CPU
  • "EDGETPU" - run on Google EdgeTPU
  • "GPU" - run on GPU via Nvidia Tensor RT
  • "DLA_ONLY" - run on DLA via Nvidia Tensor RT. No GPU use is allowed, presence of any layers not supported by DLA will cause just in time compilation fail.
  • "DLA_FALLBACK" - run on DLA via Nvidia Tensor RT. Unsupported layers will be executed on GPU.
  • "NPU" - run on NPU via OpenVino.
  • "RK####" run on Rockchip NPU systems.
  • "VITIS_NPU" run on Ryzen NPU via ONNX.
RuntimeAgent string "Default" no All
Defines the runtime agent to use.
Supported values:
  • "N2X" - N2X runtime agent
  • "TFLITE" - TensorFlow Lite runtime agent
  • "OPENVINO" - OpenVINO runtime agent
  • "ONNX" - ONNX runtime agent
  • "RKNN" - RKNN runtime agent
  • "TENSORRT" - Tensor RT runtime agent
SupportedDeviceTypes string "" no All
Comma-separated list of runtime agent/device type combinations, supported by the model. For example: "OPENVINO/CPU,ONNX/CPU"
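
A sketch of a "DEVICE" section for a model served by the OpenVINO runtime agent on CPU (values are illustrative; the single-element array layout follows the multi-element convention described for PRE_PROCESS below):

"DEVICE": [
    {
        "DeviceType": "CPU",
        "RuntimeAgent": "OPENVINO",
        "SupportedDeviceTypes": "OPENVINO/CPU,ONNX/CPU"
    }
],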

Model Parameters

Section "MODEL_PARAMETERS"

Parameter Name | Type | Default | Mandatory | Models
ModelPath string - no All
A path to a model file
CalibrationFilePath string - no All
Path to coefficient calibration file required when using Tensor RT models with post-training quantization.
UseFloat16 bool false no All
Enables the use of the special Float16 type in TensorRT runtime.
CompilerOptions JSON {} no All
Model compiler options, keyed by runtime agent type. For example: { "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }
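
A sketch of a "MODEL_PARAMETERS" section (the "model.n2x" file name is hypothetical; the compiler options mirror the example above):

"MODEL_PARAMETERS": [
    {
        "ModelPath": "model.n2x",
        "CompilerOptions": { "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }
    }
],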

Preprocessing Parameters

Section "PRE_PROCESS"

This section may contain more than one element: for multi-input networks, each element describes one input tensor of the network.

Parameter Name | Type | Default | Mandatory | Models
InputType string "Image" no All
The model input kind: image, sound, etc.
Supported values:
  • "Image": input is an image with dimensions InputW x InputH x InputC
  • "Tensor": input is raw binary tensor with dimensions InputN x InputW x InputH x InputC
  • "Audio": input is an array with InputWaveformSize elements.
Note: order of dimensions is defined by InputTensorLayout parameter.
InputN int - yes All
Input data tensor batch size.
InputH int - yes All
Input data tensor height.
InputW int - yes All
Input data tensor width.
InputC int - yes All
Input data tensor number of channels.
InputTensorLayout string "NHWC" no Image
[For inputs of raw image type and raw tensor type] defines the dimensional layout of raw binary tensor.
Supported values:
  • "auto" - deduce tensor layout automatically
  • "NHWC" - N->Height->Width->Color layout
  • "NCHW" - N->Color->Height->Width layout
InputQuantEn bool false no All
[For inputs of image type and raw tensor type] enables input quantization. This parameter defines the actual model input requirement: whether it is uint8 or flt32. It is not a runtime parameter, so it cannot be changed on the fly (compare with InputRawDataType). When InputQuantEn is true (quantization is enabled), input data is converted by the pre-processor to the uint8 data type; otherwise it is converted to the flt32 data type.
InputQuantOffset float 0 no All
[For inputs of image type and raw tensor type] defines the image quantization zero offset (see InputQuantScale for quantization formula)
InputQuantScale float 1 no All
[For inputs of image type and raw tensor type] defines the image quantization scale. When model quantization is enabled (InputQuantEn is true), the data going to the model input is scaled before quantization by applying the formula:
out = quantize(in / InputQuantScale) + InputQuantOffset
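For example, assuming quantize rounds to the nearest integer, with InputQuantScale = 1/255 and InputQuantOffset = 0 an input value of 0.5 maps to quantize(0.5 / (1/255)) = quantize(127.5) = 128.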
InputRawDataType string "DG_UINT8" no All
[For inputs of raw image type and raw tensor type] defines the data type of the raw binary tensor elements, i.e. how the pre-processor will treat client data. It is a runtime parameter, meaning that the client can change it on the fly to better suit their data requirements (compare with InputQuantEn).
Supported values:
  • "DG_UINT8" - 8-bit unsigned integer
  • "DG_FLT" - 32-bit floating point
  • "DG_INT16" - 16-bit signed integer
InputImgFmt string "JPEG" no Image
[For inputs of image type] defines the image format.
Supported values:
  • "JPEG": input is a JPEG file
  • "RAW": input is raw binary tensor. Its data type is defined by InputRawDataType
InputImgRotation int 0 no Image
[For inputs of image type] defines input image rotation angle in degrees, clockwise.
Supported values: 0, 90, 180, and 270 degrees. Other values will be coerced to the nearest supported value.
InputColorSpace string "RGB" no Image
[For inputs of image type] defines the color space required by the model.
Supported values:
  • "RGB" - red->green->blue layout
  • "BGR" - blue->green->red layout
In the case of the JPEG image type, the proper conversion will be done by the preprocessor. In the case of the raw binary image type, the raw binary tensor must be arranged accordingly by the caller.
InputScaleEn bool false no Image
Enables global data normalization. If true, InputScaleCoeff is used as the scale when preparing input data: input_data = input_data * scale; if false, scale = 1.
InputScaleCoeff double 1./255. no Image
Defines a scale for global data normalization.
InputNormMean float array [] no Image
[For inputs of image type] defines mean values for per-channel normalization, e.g.: "InputNormMean": [0.485,0.456,0.406]
InputNormStd float array [] no Image
[For inputs of image type] defines StDev values for per-channel normalization, e.g.: "InputNormStd": [0.229,0.224,0.225]
InputImgSliceType string "None" no Image (YOLO)
[For inputs of image type] defines the slicing algorithm to use.
Supported values:
  • "None" - do not use slicing
  • "SLICE2" - implements x(b,w,h,c) -> y(b,w/2,h/2,4c) slicing algorithm. The procedure related to Focus Layer implementation of YOLO family models. See SpaceToDepth module description in TResNet: High Performance GPU-Dedicated Architecture section 3.2.1.
InputWaveformSize int 15600 no Sound
Input frame size in samples for input audio
InputSamplingRate double 16000 no Sound
Input audio sampling rate in Hz
InputFrameSize int 400 no Sound
Fourier Transform window size in samples for input audio. The input waveform is divided into overlapping windows of this size with the step specified by the InputFrameHopStepSize parameter.
InputFrameHopStepSize int 160 no Sound
Fourier Transform window hop step size in samples for input audio.
InputMelFrequencyRange double array [] no Sound
Mel spectrogram frequency range for input audio processing. When not empty, it should contain two elements: the lower and upper frequency in Hz.
InputResizeMethod string "bilinear" no Image
Interpolation algorithm for image resizing.
Supported values:
  • "nearest"
  • "bilinear"
  • "area"
  • "bicubic"
  • "lanczos"
InputPadMethod string "letterbox" no Image
How the input image will be padded or cropped when resized.
Supported values:
  • "stretch"
  • "letterbox"
  • "crop-first"
  • "crop-last"
InputCropPercentage double 1.0 no Image
Percentage value for cropping the image when InputPadMethod is "crop-first" or "crop-last".
ImageBackend string "auto" no Image
Python package to be used for image processing.
Supported values:
  • "auto" - try PIL first, then fall back to OpenCV
  • "pil"
  • "opencv"

Postprocessing Parameters

Section "POST_PROCESS"

Parameter Name | Type | Default | Mandatory | Models
OutputPostprocessType string "None" no All
The type of output post-processing algorithm.
Supported values:
  • "Classification" for Image or Sound Classification
  • "Detection" for Image Detection
  • "FaceDetection" for Image Detection
  • "DetectionYolo" for Image Detection
  • "DetectionYoloPlates" for Image Detection
  • "DetectionYoloV8" for Image Detection
  • "DetectionYoloV10" for Image Detection
  • "PoseDetection" for Pose Detection
  • "HandDetection" for Pose Detection
  • "Segmentation" for Image Semantic Segmentation
  • "None" (pass-through post processor)
PostProcessorInputs int array [] yes All
Specifies the output tensors the postprocessor operates on, in the order declared by the implementation of the respective postprocessor. See the "Post Processor Inputs" section below.
PythonFile string - no All
The name of the Python file containing Python post-processor code. This post-processor runs server-side. Developing such a post-processor is an advanced topic, not covered in this document.
OutputNumClasses int - no Image Detection
Number of output classes for certain detection models.
OutputSoftmaxEn bool false no Image or Sound Classification
Enables/disables softmax in postprocessing.
OutputClassIDAdjustment int 0 no Image Classification or Detection
Adjust the index of the first non-background class.
OutputConfThreshold double 0.1 no Image or Sound Classification, Image or Pose Detection
Filters out all the results below the threshold
OutputNMSThreshold double 0.6 no Image Detection
A threshold for the Non-Max Suppression algorithm
OutputTopK size_t 0 no Image or Sound Classification
Number of classes to include in classification result. If zero - report all classes above OutputConfThreshold.
PoseThreshold double 0.8 no Pose Detection
Pose score threshold to filter whole pose
MaxDetections int 20 no Image or Pose Detection
Maximum number of total object detection results to report
MaxDetectionsPerClass int 100 no Image Detection
Maximum number of per-class object detection results to report
MaxClassesPerDetection int 30 no Image Detection
Maximum number of classes to report
UseRegularNMS bool true no Image Detection
Flag to use the regular NMS algorithm for object detection
NMSRadius double 10 no Pose Detection
A keypoint candidate is rejected if it is within NMSRadius pixels from the corresponding part of a previously detected instance.
XScale double 1 conditional Image Detection
X scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks.
YScale double 1 conditional Image Detection
Y scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks.
HScale double 1 conditional Image Detection
Height scale coefficient to convert box size coordinates to anchor-based coordinate system.
WScale double 1 conditional Image Detection
Width scale coefficient to convert box size coordinates to anchor-based coordinate system.
Stride int 16 no Pose Detection
Stride scale coefficient for pose detection
LabelsPath string "" no Image or Sound Classification, Image Detection
Path to label dictionary file
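
A sketch of a "POST_PROCESS" section for a classification model (the tensor index and the "labels.json" file name are hypothetical):

"POST_PROCESS": [
    {
        "OutputPostprocessType": "Classification",
        "PostProcessorInputs": [0],
        "OutputSoftmaxEn": false,
        "OutputConfThreshold": 0.1,
        "OutputTopK": 5,
        "LabelsPath": "labels.json"
    }
]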

Post Processor Inputs

The "PostProcessorInputs" parameter specifies the model output tensor indexes the postprocessor operates on, in the order declared by the implementation of a postprocessor. For N2X and TensorFlow Lite runtime agents, each element of the "PostProcessorInputs" array should contain a numeric ID of an output in the model file that consists of described data. For OpenVINO, Tensor RT or ONNX runtime agents, each element of the "PostProcessorInputs" array should contain the ordinal of an input/output in the model file that consists of described data (the first input listed in Netron has ordinal 0, the second input or output, ordinal 1, et cetera).

Example: if the documentation for some postprocessor gives the following input tensor order:

  • 0 - probabilities
  • 1 - bounding boxes

then the resulting line in the model's JSON file should look like:

"PostProcessorInputs": [<probabilities tensor id>, <bounding boxes tensor id>],

The requirements of various post-processors are outlined in the table below:

Post-processing Type | "PostProcessorInputs" Values

Classification
  0 - class probabilities

Segmentation
  0 - pixel class matrix

Detection
  0 - anchors
  1 - box regressors
  2 - class probabilities

DetectionYolo, DetectionYoloPlates
  0 - probabilities/box regressors tensor (main model output)
  1 - xy_mul_concat tensor
  2 - grid_concat tensor
  3 - anchor_grid_concat tensor

DetectionYoloV8
  0 - box regressors tensor [1, 6400, 64]
  1 - box regressors tensor [1, 1600, 64]
  2 - box regressors tensor [1, 400, 64]
  3 - probabilities regressors tensor [1, 6400, number of classes]
  4 - probabilities regressors tensor [1, 1600, number of classes]
  5 - probabilities regressors tensor [1, 400, number of classes]

FaceDetection
  0 - box regressors
  1 - face probabilities

HandDetection
  0 - Identity - frame coordinates
  1 - Identity_1 - score for whole hand
  2 - Identity_2 - handedness score [0..1] -> [left..right]
  3 - Identity_3 - world metric coordinates

PoseDetection
  0 - heatmaps
  1 - shorts
  2 - mids
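
For instance, the DetectionYoloV8 postprocessor takes six inputs in the order listed above. Assuming (hypothetically) that the model file exposes its outputs in exactly that order, the resulting entry would be:

"PostProcessorInputs": [0, 1, 2, 3, 4, 5],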