Model Configuration JSON File Parameters

Parameter Table

The parameter tables below summarize the parameters of the model configuration JSON file.

PySDK supports models of the following categories:

Image Object Detection
Image Classification
Image Semantic Segmentation
Image Pose Detection
Sound Classification

The Models column in the table below contains the model category to which the corresponding parameter applies.

General Parameters

At the top level, outside of any section

Parameter Name	Type	Default	Mandatory	Models
ConfigVersion	int	0	yes	All
Version of the JSON configuration file. This version is checked against the minimum compatible and current framework software versions. If it is not within that range, a version-check runtime exception is raised when the model is loaded.
Checksum	string	""	yes	All
Checksum of the model binary file
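
The version check described for ConfigVersion can be sketched as follows. This is only an illustration: the bounds `MIN_COMPATIBLE_VERSION` and `CURRENT_VERSION` and the function name `check_config_version` are hypothetical, since the actual range is defined by the framework and is not stated here.

```python
import json

# Hypothetical bounds: the real minimum-compatible and current versions
# are defined by the framework and are not given in this document.
MIN_COMPATIBLE_VERSION = 5
CURRENT_VERSION = 10


def check_config_version(config_text, min_version=MIN_COMPATIBLE_VERSION,
                         current_version=CURRENT_VERSION):
    """Return ConfigVersion if it lies in [min_version, current_version]."""
    config = json.loads(config_text)
    # ConfigVersion sits at the top level; the table lists 0 as its default.
    version = config.get("ConfigVersion", 0)
    if not min_version <= version <= current_version:
        raise RuntimeError(
            f"ConfigVersion {version} is outside the supported range "
            f"[{min_version}, {current_version}]"
        )
    return version


example = '{"ConfigVersion": 7, "Checksum": "<checksum of model binary>"}'
print(check_config_version(example))  # 7
```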

Target Device Parameters

Section "DEVICE"

Parameter Name	Type	Default	Mandatory	Models
DeviceType	string	"CPU"	no	All
Defines the device on which inference is executed. Supported values:
"ORCA" - run on Orca
"CPU" - run on CPU
"EDGETPU" - run on Google EdgeTPU
"GPU" - run on GPU via NVIDIA TensorRT
"DLA_ONLY" - run on DLA via NVIDIA TensorRT. GPU use is not allowed; any layer not supported by DLA causes just-in-time compilation to fail.
"DLA_FALLBACK" - run on DLA via NVIDIA TensorRT. Unsupported layers are executed on GPU.
"NPU" - run on NPU via OpenVINO
"RK####" - run on Rockchip NPU systems
"VITIS_NPU" - run on Ryzen NPU via ONNX
RuntimeAgent	string	"Default"	no	All
Defines the runtime agent to use. Supported values:
"N2X" - N2X runtime agent
"TFLITE" - TensorFlow Lite runtime agent
"OPENVINO" - OpenVINO runtime agent
"ONNX" - ONNX runtime agent
"RKNN" - RKNN runtime agent
"TENSORRT" - TensorRT runtime agent
SupportedDeviceTypes	string	""	no	All
Comma-separated list of runtime agent/device type combinations supported by the model. For example: `"OPENVINO/CPU,ONNX/CPU"`
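
As an illustration of the SupportedDeviceTypes format, the sketch below splits the string into (runtime agent, device type) pairs and checks one combination. The helper names are hypothetical, and exact string matching is an assumption; the framework may apply additional matching rules not described here.

```python
def parse_supported_device_types(value):
    """Split "AGENT/DEVICE,AGENT/DEVICE" into (agent, device) tuples."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate the empty default and stray commas
        agent, _, device = item.partition("/")
        pairs.append((agent, device))
    return pairs


def is_supported(value, agent, device):
    """Exact-match check (an assumption, not the framework's rule)."""
    return (agent, device) in parse_supported_device_types(value)


supported = "OPENVINO/CPU,ONNX/CPU"
print(parse_supported_device_types(supported))
# [('OPENVINO', 'CPU'), ('ONNX', 'CPU')]
print(is_supported(supported, "ONNX", "CPU"))  # True
```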

Model Parameters

Section "MODEL_PARAMETERS"

Parameter Name	Type	Default	Mandatory	Models
ModelPath	string	-	no	All
Path to the model file
CalibrationFilePath	string	-	no	All
Path to the coefficient calibration file required when using TensorRT models with post-training quantization.
UseFloat16	bool	false	no	All
Enables the use of the special Float16 type in TensorRT runtime.
CompilerOptions	JSON	{}	no	All
Model compiler options, keyed by runtime agent/device type combination. For example: `{ "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }`
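
A minimal sketch of how CompilerOptions might be looked up for a given runtime agent and device type. The function `compiler_args` is hypothetical, and splitting the option string into separate arguments is an assumption about how a compiler front end would consume it.

```python
import shlex


def compiler_args(model_parameters, agent, device):
    """Return the compiler options for "AGENT/DEVICE" as an argument list."""
    options = model_parameters.get("CompilerOptions", {})
    # A missing key yields an empty argument list (CompilerOptions defaults to {}).
    return shlex.split(options.get(f"{agent}/{device}", ""))


params = {
    "CompilerOptions": {"N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW"}
}
print(compiler_args(params, "N2X", "ORCA1"))  # ['--device', 'HW']
print(compiler_args(params, "ONNX", "CPU"))   # []
```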

Preprocessing Parameters

Section "PRE_PROCESS"

This section may contain more than one element. For multi-input networks, each element describes one input tensor of the network.
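
A minimal sketch of a two-input "PRE_PROCESS" section follows. It assumes that the section is a JSON array with one object per input tensor; the dimensions are hypothetical, the parameter names are taken from the table below, and the helper `missing_mandatory` exists only for this illustration.

```python
import json

# Hypothetical two-input network: one image input and one raw tensor input.
# Assumption: "PRE_PROCESS" is a JSON array with one object per input tensor.
pre_process = [
    {"InputType": "Image", "InputN": 1, "InputH": 224, "InputW": 224, "InputC": 3},
    {"InputType": "Tensor", "InputN": 1, "InputH": 1, "InputW": 16, "InputC": 1,
     "InputRawDataType": "DG_FLT"},
]

# Parameters marked as mandatory in the preprocessing parameter table.
MANDATORY = ("InputN", "InputH", "InputW", "InputC")


def missing_mandatory(section):
    """Return {element index: [missing mandatory parameter names]}."""
    problems = {}
    for index, element in enumerate(section):
        missing = [name for name in MANDATORY if name not in element]
        if missing:
            problems[index] = missing
    return problems


print(json.dumps({"PRE_PROCESS": pre_process}, indent=2))
print(missing_mandatory(pre_process))  # {}
```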

Parameter Name	Type	Default	Mandatory	Models
InputType	string	"Image"	no	All
The kind of model input: image, sound, etc. Supported values:
"Image": input is an image with dimensions InputW x InputH x InputC
"Tensor": input is a raw binary tensor with dimensions InputN x InputW x InputH x InputC
"Audio": input is an array with InputWaveformSize elements
Note: the order of dimensions is defined by the InputTensorLayout parameter.
InputN	int	-	yes	All
Input data tensor batch size.
InputH	int	-	yes	All
Input data tensor height.
InputW	int	-	yes	All
Input data tensor width.
InputC	int	-	yes	All
Input data tensor number of channels.
InputTensorLayout	string	"NHWC"	no	Image
[For inputs of raw image type and raw tensor type] defines the dimensional layout of the raw binary tensor. Supported values:
"auto" - deduce tensor layout automatically
"NHWC" - N->Height->Width->Color layout
"NCHW" - N->Color->Height->Width layout
InputQuantEn	bool	false	no	All
[For inputs of image type and raw tensor type] enables input quantization. This parameter defines the actual model input requirement, uint8 or flt32; it is not a runtime parameter, so it cannot be changed on the fly (compare with InputRawDataType). When InputQuantEn is true (quantization is enabled), the pre-processor converts input data to the uint8 data type; otherwise it converts input data to the flt32 data type.
InputQuantOffset	float	0	no	All
[For inputs of image type and raw tensor type] defines the image quantization zero offset (see InputQuantScale for quantization formula)
InputQuantScale	float	1	no	All
[For inputs of image type and raw tensor type] defines the image quantization scale. When input quantization is enabled (InputQuantEn is true), the data going to the model input is scaled before quantization according to the formula: `out = quantize(in / InputQuantScale) + InputQuantOffset`
InputRawDataType	string	"DG_UINT8"	no	All
[For inputs of raw image type and raw tensor type] defines the data type of the raw binary tensor elements, that is, how the pre-processor interprets client data. It is a runtime parameter, meaning that the client can change it on the fly to suit its data (compare with InputQuantEn). Supported values:
"DG_UINT8" - 8-bit unsigned integer
"DG_FLT" - 32-bit floating point
"DG_INT16" - 16-bit signed integer
InputImgFmt	string	"JPEG"	no	Image
[For inputs of image type] defines the image format. Supported values:
"JPEG": input is a JPEG file
"RAW": input is a raw binary tensor; its data type is defined by InputRawDataType
InputImgRotation	int	0	no	Image
[For inputs of image type] defines input image rotation angle in degrees, clockwise. Supported values: 0, 90, 180, and 270 degrees. Other values will be coerced to the nearest supported value.
InputColorSpace	string	"RGB"	no	Image
[For inputs of image type] defines the color space required by the model. Supported values:
"RGB" - red->green->blue layout
"BGR" - blue->green->red layout
For the JPEG image format, the pre-processor performs the proper conversion. For the raw binary image format, the caller must arrange the raw binary tensor accordingly.
InputScaleEn	bool	false	no	Image
Enables global data normalization. If true, InputScaleCoeff is used as the scale when preparing input data: `input_data = input_data * scale`. If false, scale = 1.
InputScaleCoeff	double	1./255.	no	Image
Defines a scale for global data normalization.
InputNormMean	float array	[]	no	Image
[For inputs of image type] defines mean values for per-channel normalization, e.g.: `"InputNormMean": [0.485,0.456,0.406]`
InputNormStd	float array	[]	no	Image
[For inputs of image type] defines standard deviation values for per-channel normalization, e.g.: `"InputNormStd": [0.229,0.224,0.225]`
InputImgSliceType	string	"None"	no	Image (YOLO)
[For inputs of image type] defines the slicing algorithm to use. Supported values:
"None" - do not use slicing
"SLICE2" - implements the x(b,w,h,c) -> y(b,w/2,h/2,4c) slicing algorithm. This procedure relates to the Focus layer implementation in YOLO family models. See the SpaceToDepth module description in section 3.2.1 of "TResNet: High Performance GPU-Dedicated Architecture".
InputWaveformSize	int	15600	no	Sound
Input frame size in samples for input audio
InputSamplingRate	double	16000	no	Sound
Input audio sampling rate in Hz
InputFrameSize	int	400	no	Sound
Fourier Transform window size in samples for input audio. The input waveform is divided into overlapping windows of this size, with the step specified by the InputFrameHopStepSize parameter.
InputFrameHopStepSize	int	160	no	Sound
Fourier Transform window hop step size in samples for input audio.
InputMelFrequencyRange	double array	[]	no	Sound
Mel spectrogram frequency range for input audio processing. When not empty, it should contain two elements: the lower and upper frequencies in Hz.
InputResizeMethod	string	"bilinear"	no	Image
Interpolation algorithm for image resizing. Supported values: "nearest", "bilinear", "area", "bicubic", "lanczos"
InputPadMethod	string	"letterbox"	no	Image
How the input image is padded or cropped when resized. Supported values: "stretch", "letterbox", "crop-first", "crop-last"
InputCropPercentage	double	1.0	no	Image
Percentage value for cropping the image when InputPadMethod is "crop-first" or "crop-last".
ImageBackend	string	"auto"	no	Image
Python package used for image processing. Supported values:
"auto" - tries "pil" first
"pil"
"opencv"
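
The quantization formula given for InputQuantScale, `out = quantize(in / InputQuantScale) + InputQuantOffset`, can be sketched as follows. The table does not define `quantize`, so this sketch assumes round-to-nearest followed by clamping to the uint8 range; the function name and the sample scale and offset are hypothetical.

```python
def quantize_input(values, scale=1.0, offset=0.0):
    """Apply out = quantize(in / scale) + offset to a flat list of floats."""
    result = []
    for value in values:
        # Assumption: quantize() means round-to-nearest.
        quantized = round(value / scale) + offset
        # Assumption: the result is clamped to the uint8 range [0, 255],
        # since a quantized model input is uint8.
        result.append(int(min(255, max(0, quantized))))
    return result


# Hypothetical parameters mapping roughly [-1, 1] onto [0, 255].
print(quantize_input([-1.0, 0.0, 0.5, 1.0], scale=0.0078125, offset=128))
# [0, 128, 192, 255]
```

With the default values (InputQuantScale = 1, InputQuantOffset = 0), the function only rounds and clamps its input.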

Postprocessing Parameters

Section "POST_PROCESS"

Parameter Name	Type	Default	Mandatory	Models
OutputPostprocessType	string	"None"	no	All
The type of output post-processing algorithm. Supported values:
"Classification" for Image or Sound Classification
"Detection" for Image Detection
"FaceDetection" for Image Detection
"DetectionYolo" for Image Detection
"DetectionYoloPlates" for Image Detection
"DetectionYoloV8" for Image Detection
"DetectionYoloV10" for Image Detection
"PoseDetection" for Pose Detection
"HandDetection" for Pose Detection
"Segmentation" for Image Semantic Segmentation
"None" (pass-through post-processor)
PostProcessorInputs	int array	[]	yes	All
Specifies the output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. See the "Post Processor Inputs" section below.
PythonFile	string	-	no	All
The name of the Python file containing the Python post-processor code. This post-processor runs server-side. Developing such a post-processor is an advanced topic not covered in this document.
OutputNumClasses	int	-	no	Image Detection
Number of output classes for certain detection models.
OutputSoftmaxEn	bool	false	no	Image or Sound Classification
Enables/disables softmax in postprocessing.
OutputClassIDAdjustment	int	0	no	Image Classification or Detection
Adjusts the index of the first non-background class.
OutputConfThreshold	double	0.1	no	Image or Sound Classification, Image or Pose Detection
Filters out all results with confidence below the threshold
OutputNMSThreshold	double	0.6	no	Image Detection
A threshold for Non-Max Suppression algorithm
OutputTopK	size_t	0	no	Image or Sound Classification
Number of classes to include in classification result. If zero - report all classes above OutputConfThreshold.
PoseThreshold	double	0.8	no	Pose Detection
Pose score threshold to filter whole pose
MaxDetections	int	20	no	Image or Pose Detection
Maximum number of total object detection results to report
MaxDetectionsPerClass	int	100	no	Image Detection
Maximum number of per-class object detection results to report
MaxClassesPerDetection	int	30	no	Image Detection
Maximum number of classes to report per detection
UseRegularNMS	bool	true	no	Image Detection
Flag that enables the regular NMS algorithm for object detection
NMSRadius	double	10	no	Pose Detection
A keypoint candidate is rejected if it is within NMSRadius pixels from the corresponding part of a previously detected instance.
XScale	double	1	conditional	Image Detection
X scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks.
YScale	double	1	conditional	Image Detection
Y scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks.
HScale	double	1	conditional	Image Detection
Height scale coefficient to convert box size coordinates to anchor-based coordinate system.
WScale	double	1	conditional	Image Detection
Width scale coefficient to convert box size coordinates to anchor-based coordinate system.
Stride	int	16	no	Pose Detection
Stride scale coefficient for pose detection
LabelsPath	string	""	no	Image or Sound Classification, Image Detection
Path to label dictionary file
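
The interaction of OutputSoftmaxEn, OutputConfThreshold, and OutputTopK for classification results can be sketched as below. This is an illustration, not the framework's implementation: the order of operations (softmax, then threshold, then top-K) and the choice to apply the threshold even when OutputTopK is non-zero are assumptions, and the function `classify` and its labels are hypothetical.

```python
import math


def classify(scores, labels, softmax_en=False, conf_threshold=0.1, top_k=0):
    """Return (label, score) pairs, highest score first."""
    if softmax_en:
        # Numerically stable softmax over the raw scores.
        peak = max(scores)
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        scores = [value / total for value in exps]
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    # Drop results below the confidence threshold.
    kept = [(label, score) for label, score in ranked if score >= conf_threshold]
    # top_k == 0 means "report all classes above the threshold".
    return kept[:top_k] if top_k > 0 else kept


labels = ["cat", "dog", "bird", "fish"]
scores = [0.7, 0.05, 0.2, 0.05]
print(classify(scores, labels))           # [('cat', 0.7), ('bird', 0.2)]
print(classify(scores, labels, top_k=1))  # [('cat', 0.7)]
```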

Post Processor Inputs

The "PostProcessorInputs" parameter specifies the indexes of the model output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. For the N2X and TensorFlow Lite runtime agents, each element of the "PostProcessorInputs" array should contain the numeric ID of the output in the model file that contains the described data. For the OpenVINO, TensorRT, or ONNX runtime agents, each element of the "PostProcessorInputs" array should contain the ordinal of the input or output in the model file that contains the described data (the first input or output listed in Netron has ordinal 0, the second has ordinal 1, and so on).

Example: if the documentation for a postprocessor gives the following input tensor order:

0 - probabilities
1 - bounding boxes

then, the resulting line in the model's JSON file should look like:

"PostProcessorInputs": [<probabilities tensor id>, <bounding boxes tensor id>],

The requirements of various post-processors are outlined in the table below:

Post-processing Type	"PostProcessorInputs" Value
Classification	0 - class probabilities
Segmentation	0 - pixel class matrix
Detection	0 - anchors; 1 - box regressors; 2 - class probabilities
DetectionYolo, DetectionYoloPlates	0 - probabilities/box regressors tensor (main model output); 1 - xy_mul_concat tensor; 2 - grid_concat tensor; 3 - anchor_grid_concat tensor
DetectionYoloV8	0 - box regressors tensor [1, 6400, 64]; 1 - box regressors tensor [1, 1600, 64]; 2 - box regressors tensor [1, 400, 64]; 3 - probabilities regressors tensor [1, 6400, `number of classes`]; 4 - probabilities regressors tensor [1, 1600, `number of classes`]; 5 - probabilities regressors tensor [1, 400, `number of classes`]
FaceDetection	0 - box regressors; 1 - face probabilities
HandDetection	0 - Identity - frame coordinates; 1 - Identity_1 - score for whole hand; 2 - Identity_2 - handedness score [0..1] -> [left..right]; 3 - Identity_3 - world metric coordinates
PoseDetection	0 - heatmaps; 1 - shorts; 2 - mids
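
The table above can be turned into a simple consistency check on a "POST_PROCESS" section, sketched below. The tensor counts are read off the table; the helper `check_postprocessor_inputs`, the sample tensor IDs, and the assumption that every listed tensor must be present are hypothetical.

```python
# Number of tensors listed for each post-processing type in the table above.
EXPECTED_INPUT_COUNT = {
    "Classification": 1,
    "Segmentation": 1,
    "Detection": 3,
    "DetectionYolo": 4,
    "DetectionYoloPlates": 4,
    "DetectionYoloV8": 6,
    "FaceDetection": 2,
    "HandDetection": 4,
    "PoseDetection": 3,
}


def check_postprocessor_inputs(post_process):
    """Return an error message, or None if the tensor count matches the table."""
    kind = post_process.get("OutputPostprocessType", "None")
    inputs = post_process.get("PostProcessorInputs", [])
    expected = EXPECTED_INPUT_COUNT.get(kind)
    if expected is None:
        return None  # type not listed in the table; nothing to check
    if len(inputs) != expected:
        return f"{kind} expects {expected} tensors, got {len(inputs)}"
    return None


# Hypothetical tensor IDs, ordered as anchors, box regressors, class probabilities.
section = {"OutputPostprocessType": "Detection", "PostProcessorInputs": [3, 1, 2]}
print(check_postprocessor_inputs(section))  # None
section["PostProcessorInputs"] = [3, 1]
print(check_postprocessor_inputs(section))  # Detection expects 3 tensors, got 2
```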