Understanding Results
This page explains the structure and content of inference result objects returned by the AIServerModel and CloudServerModel classes.
The `AIServerModel` and `CloudServerModel` classes return a result object that contains the inference results from the `predict` and `predict_batch` functions.
For example, the result can be structured like this:
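As a hedged sketch (the detection entries, the `frame_number` field, and all values here are illustrative assumptions; only `result` and `imageFrame` are documented fields), a result object returned by `predict` might look like this:

```javascript
// Illustrative shape of a result object (values are made up).
const someResult = {
  result: [
    // result[0]: the main inference results (here, two detections)
    [
      { bbox: [12, 34, 120, 200], category_id: 0, label: "person", score: 0.94 },
      { bbox: [150, 60, 220, 180], category_id: 16, label: "dog", score: 0.81 },
    ],
    // result[1]: frame information / frame number
    { frame_number: 42 },
  ],
  // imageFrame: the original input image (e.g. a canvas, ImageBitmap, or blob)
  imageFrame: null,
};

// Access patterns described below:
const detections = someResult.result[0];
const frameInfo = someResult.result[1];
console.log(detections.length, frameInfo.frame_number);
```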
Accessing the Result Data
- Inference Results: Access the main results with `someResult.result[0]`.
- Frame Info / Number: Get the frame information or frame number using `someResult.result[1]`.
- Original Input Image: The original input image is available as `someResult.imageFrame`.
In general, all results are wrapped in the same response envelope: the inference output in `result[0]`, frame information in `result[1]`, and the original image in `imageFrame`.
Inference Result Types
The inference results can be one of the following types:
- Detection
- Detection with Instance Segmentation
- Pose Detection
- Classification
- Multi-Label Classification
- Segmentation (Semantic)
Detection Result
Detection results include bounding boxes (`bbox`) along with a `category_id`, `label`, and confidence `score` for each detected object:
- `bbox`: Array of four coordinates `[x0, y0, x1, y1]` for each detected object's bounding box.
- `category_id`: Numerical ID of the detected category (used to look up a color in `colorLookupTable`).
- `label`: Textual name of the detected category.
- `score`: Confidence score (0.0–1.0).
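As a hedged sketch (the labels and values are illustrative), a detection result array can be traversed like this to filter by confidence and derive box sizes:

```javascript
// Illustrative detection results (values are made up).
const detections = [
  { bbox: [12, 34, 120, 200], category_id: 0, label: "person", score: 0.94 },
  { bbox: [150, 60, 220, 180], category_id: 16, label: "dog", score: 0.81 },
  { bbox: [5, 5, 30, 30], category_id: 3, label: "car", score: 0.12 },
];

// Keep only confident detections and compute box sizes from [x0, y0, x1, y1].
const confident = detections
  .filter((d) => d.score >= 0.5)
  .map((d) => {
    const [x0, y0, x1, y1] = d.bbox;
    return { label: d.label, width: x1 - x0, height: y1 - y0 };
  });

console.log(confident);
```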
Detection with Instance Segmentation
When the model returns a mask for each detected object, each detection object will include:
- A `mask` field containing RLE-encoded (or plain) pixel data
- Optionally `x_min`, `y_min` if it's a partial mask
- `width`/`height` if it's a full-image mask

The `mask` fields in detail:

- `mask.width`, `mask.height`: Dimensions of the stored mask buffer (full-image resolution for "full" masks; cropped resolution for "partial" masks).
- `mask.x_min`, `mask.y_min` (optional): Top-left corner in original model-input pixel coordinates; only present when the mask is a cropped patch.
- `mask.data`: A Base64-encoded RLE (run-length-encoded) or raw bit-string array.
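As a hedged sketch of working with a partial mask: the RLE scheme below (`[value, runLength, ...]` pairs) and the plain-array `data` (rather than Base64) are simplifying assumptions for illustration; the real encoding used by the server may differ.

```javascript
// Assumed RLE scheme for illustration: [value, runLength, value, runLength, ...].
function decodeRle(pairs, expectedLength) {
  const out = new Uint8Array(expectedLength);
  let i = 0;
  for (let p = 0; p < pairs.length; p += 2) {
    out.fill(pairs[p], i, i + pairs[p + 1]);
    i += pairs[p + 1];
  }
  return out;
}

// Illustrative partial (cropped) mask for one detection.
const mask = {
  width: 4,
  height: 2,
  x_min: 10, // present only for partial masks
  y_min: 20,
  data: [0, 3, 1, 5], // three 0s followed by five 1s
};

const pixels = decodeRle(mask.data, mask.width * mask.height);
// pixels[row * mask.width + col] corresponds to original model-input pixel
// (mask.x_min + col, mask.y_min + row).
const isPartial = mask.x_min !== undefined && mask.y_min !== undefined;
```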
Pose Detection Result
Pose-detection results include a `landmarks` array. Each landmark has:
- `category_id`: Joint index (e.g. 0 = Nose, 1 = LeftEye, etc.)
- `label`: Text name of that joint
- `landmark`: `[x, y]` coordinates
- `score`: Confidence of that landmark
- Optionally `connect`: An array of other `category_id` values to connect this joint to (for drawing skeleton edges)
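As a hedged sketch (the joints and coordinates are illustrative), skeleton edges for drawing can be built from the `connect` arrays like this:

```javascript
// Illustrative landmarks (joint names, indices, and coordinates are made up).
const landmarks = [
  { category_id: 0, label: "Nose", landmark: [100, 50], score: 0.9, connect: [1] },
  { category_id: 1, label: "LeftEye", landmark: [110, 40], score: 0.85, connect: [0] },
];

// Build skeleton edges as pairs of [x, y] points; keep each edge once by
// only following connections toward higher joint indices.
const byId = new Map(landmarks.map((l) => [l.category_id, l]));
const edges = [];
for (const l of landmarks) {
  for (const other of l.connect ?? []) {
    if (other > l.category_id && byId.has(other)) {
      edges.push([l.landmark, byId.get(other).landmark]);
    }
  }
}
console.log(edges);
```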
Classification Result
A classification result is just an array of `{ category_id, label, score }` objects, with no bounding boxes or masks. The model classifies the image into categories with associated confidence scores.
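As a hedged sketch (the categories and scores are illustrative), picking the top prediction from such an array is straightforward:

```javascript
// Illustrative classification result (labels and scores are made up).
const classifications = [
  { category_id: 281, label: "tabby cat", score: 0.72 },
  { category_id: 285, label: "Egyptian cat", score: 0.18 },
  { category_id: 207, label: "golden retriever", score: 0.02 },
];

// Pick the top-1 prediction by confidence.
const top = classifications.reduce((a, b) => (b.score > a.score ? b : a));
console.log(top.label); // "tabby cat"
```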
Multi-Label Classification Result
For multi-label classification (i.e. multiple classifiers or multiple sub-predictions), each top-level entry has a `classifier` name and a subarray of `results` with `{ label, score, category_id }`. Each object in the top-level array has:

- `classifier`: Name of that classifier
- `results`: An array of `{ category_id, label, score }`
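As a hedged sketch (the classifier names and entries are illustrative), the best label per classifier can be collected like this:

```javascript
// Illustrative multi-label result with two classifiers (names are made up).
const multiLabel = [
  {
    classifier: "color",
    results: [
      { category_id: 0, label: "red", score: 0.9 },
      { category_id: 1, label: "blue", score: 0.1 },
    ],
  },
  {
    classifier: "vehicle_type",
    results: [{ category_id: 2, label: "sedan", score: 0.8 }],
  },
];

// Map each classifier name to its highest-scoring label.
const best = Object.fromEntries(
  multiLabel.map((e) => [
    e.classifier,
    e.results.reduce((a, b) => (b.score > a.score ? b : a)).label,
  ])
);
console.log(best); // { color: "red", vehicle_type: "sedan" }
```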
Segmentation (Semantic) Result
For semantic segmentation, the result object has:
- A `shape` array: the `[1, H, W]` shape of the segmentation mask.
- A `data` field: a flat `Uint8Array` of length `H × W`, where each element is the class ID at that pixel.
For example, with `shape = [1, 513, 513]`: 1 channel, height 513 pixels, width 513 pixels. `data` is a flat array of length `H × W`, read row-major: the first 6 pixels of the first row might be `[0, 0, 1, 1, 2, 2, ...]`, the first 6 of the second row `[0, 1, 1, 2, 2, 2, ...]`, and so on.
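The row-major layout can be sketched with a small illustrative mask (a 2×6 mask here instead of 513×513, with made-up class IDs):

```javascript
// Illustrative semantic-segmentation result with a small 2x6 mask
// (a real shape would be e.g. [1, 513, 513]).
const segResult = {
  shape: [1, 2, 6], // [channels, H, W]
  data: new Uint8Array([0, 0, 1, 1, 2, 2, 0, 1, 1, 2, 2, 2]),
};

const [, H, W] = segResult.shape;

// Row-major lookup: class ID at pixel (x, y).
function classAt(x, y) {
  return segResult.data[y * W + x];
}

console.log(classAt(2, 0)); // 1 (third pixel of the first row)
console.log(classAt(1, 1)); // 1 (second pixel of the second row)
```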