Model Configuration JSON File Parameters

Parameter Table

The Model Parameter table below summarizes the parameters that can be set in the model configuration JSON file.

PySDK supports models of the following categories:

Image Object Detection
Image Classification
Image Semantic Segmentation
Image Pose Detection
Sound Classification

The Models column in the table below contains the model category to which the corresponding parameter applies.

Parameter Name	Type	Default	Mandatory	Models	Description
General Parameters: on top level, outside of any section
ConfigVersion	int	0	yes	All	Version of JSON configuration file. This version is checked against minimum compatible and current framework software versions. If it is not within that range, version check runtime exception is generated on the model loading.
Checksum	string	""	yes	All	Checksum of model binary file
Target Device Parameters: section "DEVICE"
DeviceType	string	"CPU"	no	All	Defines the device on which inference is executed. Supported values: "ORCA" - run on Orca; "CPU" - run on CPU; "EDGETPU" - run on Google EdgeTPU; "GPU" - run on GPU via Nvidia Tensor RT; "DLA_ONLY" - run on DLA via Nvidia Tensor RT with no GPU use allowed (any layer not supported by DLA causes just-in-time compilation to fail); "DLA_FALLBACK" - run on DLA via Nvidia Tensor RT, with unsupported layers executed on GPU; "NPU" - run on NPU via OpenVINO; "RK####" - run on Rockchip NPU systems.
RuntimeAgent	string	"Default"	no	All	Defines the runtime agent to use. Supported values: "N2X" - N2X runtime agent "TFLITE" - TensorFlow Lite runtime agent "OPENVINO" - OpenVINO runtime agent "ONNX" - ONNX runtime agent "RKNN" - RKNN runtime agent "TENSORRT" - Tensor RT runtime agent
SupportedDeviceTypes	string	""	no	All	Comma-separated list of runtime agent/device type combinations, supported by the model. For example: "OPENVINO/CPU,ONNX/CPU"
Model Parameters: section "MODEL_PARAMETERS"
ModelPath	string	-	no	All	A path to a model file
CalibrationFilePath	string	-	no	All	Path to coefficient calibration file required when using Tensor RT models with post-training quantization.
UseFloat16	bool	false	no	All	Enables the use of the special Float16 type in Tensor RT.
CompilerOptions	JSON	{}	no	All	Model compiler options, keyed by runtime agent type. For example: { "N2X/CPU": "--device SW", "N2X/ORCA1": "--device HW" }
Preprocessing Parameters: section "PRE_PROCESS". This section may have more than one element. For multi-input networks each element describes one input tensor of such network.
InputType	string	"Image"	no	All	The model input kind: image, sound, etc. Supported values: "Image": input is an image with dimensions InputW x InputH x InputC "Tensor": input is raw binary tensor with dimensions InputN x InputW x InputH x InputC "Audio": input is an array with InputWaveformSize elements. Note: order of dimensions is defined by InputTensorLayout parameter.
InputN	int	-	yes	All	Input data tensor batch size.
InputH	int	-	yes	All	Input data tensor height.
InputW	int	-	yes	All	Input data tensor width.
InputC	int	-	yes	All	Input data tensor number of channels.
InputTensorLayout	string	"NHWC"	no	Image	[For inputs of raw image type and raw tensor type] defines the dimensional layout of raw binary tensor. Supported values: "auto" - deduce tensor layout automatically "NHWC" - N->Height->Width->Color layout "NCHW" - N->Color->Height->Width layout
InputQuantEn	bool	false	no	All	[For inputs of image type and raw tensor type] enables input quantization. This parameter defines the actual model input requirement, uint8 or flt32; it is not a runtime parameter, so it cannot be changed on the fly (compare with InputRawDataType). When InputQuantEn is true (quantization is enabled), the pre-processor converts input data to the uint8 data type; otherwise it converts input data to the flt32 data type.
InputQuantOffset	float	0	no	All	[For inputs of image type and raw tensor type] defines the image quantization zero offset (see InputQuantScale for quantization formula)
InputQuantScale	float	1	no	All	[For inputs of image type and raw tensor type] defines the image quantization scale. When model quantization is enabled (InputQuantEn is true), then the data going to the model input will be scaled before the quantization by applying the formula: out = quantize( in / InputQuantScale ) + InputQuantOffset
InputRawDataType	string	"DG_UINT8"	no	All	[For inputs of raw image type and raw tensor type] defines the data type of raw binary tensor elements, that is, how the pre-processor interprets client data. It is a runtime parameter, meaning that the client can change it on the fly to better suit its data (compare with InputQuantEn). Supported values: "DG_UINT8" - 8-bit unsigned integer "DG_FLT" - 32-bit floating point "DG_INT16" - 16-bit signed integer
InputImgFmt	string	"JPEG"	no	Image	[For inputs of image type] defines the image format. Supported values: "JPEG": input is a JPEG file; "RAW": input is raw binary tensor. Its data type is defined by InputRawDataType
InputImgRotation	int	0	no	Image	[For inputs of image type] defines input image rotation angle in degrees, clockwise. Supported values: 0, 90, 180, and 270 degrees. Other values will be coerced to the nearest supported value.
InputColorSpace	string	"RGB"	no	Image	[For inputs of image type] defines the color space required by the model. Supported values: "RGB" - red->green->blue layout "BGR" - blue->green->red layout In case of JPEG image type, the proper conversion will be done by preprocessor. In case of raw binary image type, the raw binary tensor must be arranged accordingly by caller.
InputScaleEn	bool	false	no	Image	Defines the type of global data normalization. If true, InputScaleCoeff will be used as scale, while preparing input data: input_data = input_data ∗ scale, if false, scale = 1.
InputScaleCoeff	double	1./255.	no	Image	Defines a scale for global data normalization.
InputNormMean	float array	[]	no	Image	[For inputs of image type] defines mean values for per-channel normalization, e.g.: "InputNormMean": [0.485, 0.456, 0.406]
InputNormStd	float array	[]	no	Image	[For inputs of image type] defines standard deviation values for per-channel normalization, e.g.: "InputNormStd": [0.229, 0.224, 0.225]
InputImgSliceType	string	"None"		Image (YOLO)	[For inputs of image type] defines the slicing algorithm to use. Supported values: "None" - do not use slicing "SLICE2" - implements the x(b,w,h,c) -> y(b,w/2,h/2,4c) slicing algorithm. This procedure corresponds to the Focus layer of YOLO family models. See the SpaceToDepth module description in section 3.2.1 of "TResNet: High Performance GPU-Dedicated Architecture".
InputWaveformSize	int	15600	no	Sound	Input frame size in samples for input audio
InputSamplingRate	double	16000	no	Sound	Input audio sampling rate in Hz
InputFrameSize	int	400	no	Sound	Fourier Transform window size in samples for input audio. The input waveform is divided into overlapping windows of this size with the step specified by the InputFrameHopStepSize parameter.
InputFrameHopStepSize	int	160	no	Sound	Fourier Transform window hop step size in samples for input audio.
InputMelFrequencyRange	double array	[]	no	Sound	Mel spectrogram frequency range for input audio processing. When not empty, should contain two elements: lower frequency and upper frequency in Hz.
InputResizeMethod	string	"bilinear"	no	Image	Interpolation algorithm for image resizing. Supported values: "nearest" "bilinear" "area" "bicubic" "lanczos"
InputPadMethod	string	"letterbox"	no	Image	How input image will be padded or cropped when resized. Supported values: "stretch" "letterbox" "crop-first" "crop-last"
InputCropPercentage	double	1.0	no	Image	Percentage value for cropping image with "crop-first" or "crop-last" InputPadMethod
ImageBackend	string	"auto"	no	Image	Python package to be used for image processing. Supported values: "auto" - tries pil first "pil" "opencv"
Postprocessing Parameters: section "POST_PROCESS"
OutputPostprocessType	string	"None"	no	All	The following options can be set: "Classification" for Image or Sound Classification "Detection" for Image Detection "FaceDetection" for Image Detection "DetectionYolo" for Image Detection "DetectionYoloPlates" for Image Detection "DetectionYoloV8" for Image Detection "DetectionYoloV10" for Image Detection "PoseDetection" for Pose Detection "HandDetection" for Pose Detection "Segmentation" for Image Semantic Segmentation "None" (pass-through post processor) "Python" (Python post processor)
PostProcessorInputs	int array	[]	yes	All	Specifies the output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. See "Post Processor Inputs", below.
PythonFile	string	-	no	All	The name of Python file containing Python post processor code. Must be specified when OutputPostprocessType is set to "Python"
OutputNumClasses	int	-	no	Image Detection	Number of output classes
OutputSoftmaxEn	bool	false	no	Image or Sound Classification	Enables/disables softmax in postprocessing.
OutputClassIDAdjustment	int	0	no	Image Classification or Detection	Adjust the index of the first non-background class.
OutputConfThreshold	double	0.1	no	Image or Sound Classification, Image or Pose Detection	Filters out all the results below the threshold
OutputNMSThreshold	double	0.6	no	Image Detection	A threshold for Non-Max Suppression algorithm
OutputTopK	size_t	0	no	Image or Sound Classification	Number of classes to include in classification result. If zero - report all classes above OutputConfThreshold.
PoseThreshold	double	0.8	no	Pose Detection	Pose score threshold to filter whole pose
MaxDetections	int	20	no	Image or Pose Detection	Maximum number of total object detection results to report
MaxDetectionsPerClass	int	100	no	Image Detection	Maximum number of per-class object detection results to report
MaxClassesPerDetection	int	30	no	Image Detection	Maximum number of classes to report
UseRegularNMS	bool	true	no	Image Detection	Flag that enables the regular NMS algorithm for object detection
NMSRadius	double	10	no	Pose Detection	A keypoint candidate is rejected if it is within NMSRadius pixels from the corresponding part of a previously detected instance.
XScale	double	1	conditional	Image Detection	X scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks.
YScale	double	1	conditional	Image Detection	Y scale coefficient to convert box center coordinates to anchor-based coordinate system. Mandatory for object detection networks.
HScale	double	1	conditional	Image Detection	Height scale coefficient to convert box size coordinates to anchor-based coordinate system.
WScale	double	1	conditional	Image Detection	Width scale coefficient to convert box size coordinates to anchor-based coordinate system.
Stride	int	16	no	Pose Detection	Stride scale coefficient for pose detection
LabelsPath	string	""	no	Image or Sound Classification, Image Detection	Path to label dictionary file
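The section layout described in the table can be sketched as follows. This is a minimal illustration, not a reference configuration: every value (version number, checksum placeholder, file names, tensor sizes) is hypothetical, and writing each section as a JSON array of objects is an assumption extrapolated from the note that "PRE_PROCESS" may have more than one element. The helper `missing_mandatory` is likewise hypothetical; it only checks the parameters the table marks as mandatory.

```python
import json

# Hypothetical image classification config; all values are placeholders.
config = {
    "ConfigVersion": 10,                       # assumed version number
    "Checksum": "<checksum of model binary file>",
    "DEVICE": [{"DeviceType": "CPU", "RuntimeAgent": "ONNX",
                "SupportedDeviceTypes": "ONNX/CPU"}],
    "MODEL_PARAMETERS": [{"ModelPath": "model.onnx"}],  # hypothetical file
    "PRE_PROCESS": [{
        "InputType": "Image",
        "InputN": 1, "InputH": 224, "InputW": 224, "InputC": 3,
        "InputImgFmt": "JPEG",
        "InputColorSpace": "RGB",
        "InputNormMean": [0.485, 0.456, 0.406],
        "InputNormStd": [0.229, 0.224, 0.225],
    }],
    "POST_PROCESS": [{
        "OutputPostprocessType": "Classification",
        "PostProcessorInputs": [0],
        "OutputSoftmaxEn": True,
        "OutputTopK": 5,
        "LabelsPath": "labels.json",           # hypothetical file
    }],
}


def missing_mandatory(cfg):
    """List the parameters marked mandatory ("yes") in the table that are absent."""
    missing = [k for k in ("ConfigVersion", "Checksum") if k not in cfg]
    for i, pre in enumerate(cfg.get("PRE_PROCESS", [])):
        for key in ("InputN", "InputH", "InputW", "InputC"):
            if key not in pre:
                missing.append(f"PRE_PROCESS[{i}].{key}")
    for i, post in enumerate(cfg.get("POST_PROCESS", [])):
        if "PostProcessorInputs" not in post:
            missing.append(f"POST_PROCESS[{i}].PostProcessorInputs")
    return missing


print(json.dumps(config, indent=2))
print("missing mandatory parameters:", missing_mandatory(config))
```

A multi-input network would add one more object to the "PRE_PROCESS" list for each additional input tensor.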

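The quantization formula given for InputQuantScale in the table above, out = quantize( in / InputQuantScale ) + InputQuantOffset, can be evaluated as in the sketch below. Two details are assumptions rather than documented behavior: quantize() is taken to mean rounding to the nearest integer, and the result is clamped to the uint8 range. The function name `quantize_input` is hypothetical.

```python
def quantize_input(values, scale=1.0, offset=0.0):
    """Apply out = quantize(in / scale) + offset to a flat list of floats.

    Assumptions (not stated in the parameter table):
    - quantize() rounds to the nearest integer;
    - the result is clamped to the uint8 range [0, 255].
    """
    out = []
    for value in values:
        q = round(value / scale) + offset
        out.append(int(min(255, max(0, q))))
    return out


# Example with InputQuantScale = 0.5 and InputQuantOffset = 10
print(quantize_input([0.0, 1.0, 2.2], scale=0.5, offset=10))
```

With the default values (InputQuantScale = 1, InputQuantOffset = 0) the function reduces to rounding and clamping.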
Post Processor Inputs

The "PostProcessorInputs" parameter specifies the model output tensors the postprocessor operates on, in the order declared by the postprocessor implementation. For N2X and TensorFlow Lite runtime agents, each element of the "PostProcessorInputs" array should contain the numeric ID of the output in the model file that contains the described data. For OpenVINO, Tensor RT, or ONNX runtime agents, each element of the "PostProcessorInputs" array should contain the ordinal of the input/output in the model file that contains the described data (the first input listed in Netron has ordinal 0, the second input or output has ordinal 1, et cetera).

Example: if the documentation for some postprocessor gives the following input tensor order:

0 - probabilities
1 - bounding boxes

the resulting line in the model's JSON file should look like:

"PostProcessorInputs": [probabilities_tensor_id, boxes_tensor_id],

The requirements of various post-processors are outlined below.

Classification

0 - class probabilities

Segmentation

0 - pixel class matrix

Detection

0 - anchors
1 - box regressors
2 - class probabilities
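As a sketch of how the Detection order above turns into a "PostProcessorInputs" value, the snippet below arranges tensor IDs in the declared order. The tensor IDs are hypothetical stand-ins for values read from the model file, and the helper `post_processor_inputs` is an illustration, not part of PySDK.

```python
import json

# Order the Detection postprocessor expects, as listed above.
DETECTION_ORDER = ["anchors", "box regressors", "class probabilities"]


def post_processor_inputs(expected_order, tensor_ids):
    """Return tensor IDs arranged in the order the postprocessor expects."""
    missing = [role for role in expected_order if role not in tensor_ids]
    if missing:
        raise KeyError(f"no tensor ID given for: {missing}")
    return [tensor_ids[role] for role in expected_order]


# Hypothetical tensor IDs, e.g. as read from a model viewer such as Netron.
tensor_ids = {"class probabilities": 3, "anchors": 11, "box regressors": 5}

fragment = {"PostProcessorInputs": post_processor_inputs(DETECTION_ORDER, tensor_ids)}
print(json.dumps(fragment))  # {"PostProcessorInputs": [11, 5, 3]}
```

The same helper applies to any other postprocessor: only the list of roles changes.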

DetectionYolo

0 - probabilities/box regressors tensor (main model output)
1 - xy_mul_concat tensor
2 - grid_concat tensor
3 - anchor_grid_concat tensor

DetectionYoloPlates

This post processor uses the same scheme as DetectionYolo.

DetectionYoloV8

0 - box regressors tensor [1, 6400, 64]
1 - box regressors tensor [1, 1600, 64]
2 - box regressors tensor [1, 400, 64]
3 - probabilities regressors tensor [1, 6400, number of classes]
4 - probabilities regressors tensor [1, 1600, number of classes]
5 - probabilities regressors tensor [1, 400, number of classes]
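The shapes listed above are consistent with a 640 x 640 input, detection strides of 8, 16, and 32 (80 x 80, 40 x 40, and 20 x 20 grids), and 16 distribution bins for each of the 4 box sides. The source does not state these values, so they are assumptions in the sketch below, which computes the expected tensor shapes for a given input size. The function name `yolov8_output_shapes` is hypothetical.

```python
def yolov8_output_shapes(input_h, input_w, num_classes,
                         strides=(8, 16, 32), reg_max=16):
    """Expected DetectionYoloV8 tensor shapes, in PostProcessorInputs order.

    Assumed, not documented: strides 8/16/32 and reg_max = 16 bins per
    box side, which gives 4 * 16 = 64 box regressor channels.
    """
    cells = [(input_h // s) * (input_w // s) for s in strides]
    box_shapes = [[1, n, 4 * reg_max] for n in cells]
    prob_shapes = [[1, n, num_classes] for n in cells]
    return box_shapes + prob_shapes


for index, shape in enumerate(yolov8_output_shapes(640, 640, num_classes=80)):
    print(index, shape)
```

Comparing these shapes with the output shapes shown in a model viewer is one way to decide which tensor goes into which position of "PostProcessorInputs".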

FaceDetection

0 - box regressors
1 - face probabilities

HandDetection

0 - Identity - frame coordinates
1 - Identity_1 - score for whole hand
2 - Identity_2 - handedness score [0..1] -> [left..right]
3 - Identity_3 - world metric coordinates

PoseDetection

0 - heatmaps
1 - shorts
2 - mids