Understanding Results
This page explains the structure and content of inference result objects returned by the AIServerModel and CloudServerModel classes.
The AIServerModel and CloudServerModel classes return a result object that contains the inference results from the predict and predict_batch functions.
Example:
let someResult = await someModel.predict(image);
console.log(someResult);
The returned result can be structured like this:
{
"result": [
[
{ "category_id": 1, "label": "foo", "score": 0.2 },
{ "category_id": 0, "label": "bar", "score": 0.1 }
],
"frame123"
],
"imageFrame": imageBitmap
}
Accessing the Result Data
Inference Results: Access the main results with someResult.result[0].
Frame Info / Number: Get the frame information or frame number using someResult.result[1].
Original Input Image: The original input image is someResult.imageFrame.
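For example, here is a minimal sketch of unpacking a single prediction, assuming a model instance named someModel and an input image as in the example above:
const someResult = await someModel.predict(image);
const [items, frameInfo] = someResult.result; // items = inference results, frameInfo = frame ID / timestamp
const originalImage = someResult.imageFrame;  // the original input image
console.log(frameInfo, items.length, originalImage);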
In general, all results are wrapped in this response envelope:
{
"result": [
[ /* … array of one or more "items" like those above … */ ],
"someFrameIdOrTimestamp"
],
"imageFrame": { /* original input image (ImageBitmap / HTMLCanvasElement, etc.) */ }
}
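If you want editor hints for this envelope, one way to describe it is a JSDoc typedef. This is only a sketch; the exact shape of each result item depends on the model type, as described below:
/**
 * @typedef {Object} InferenceResult
 * @property {Array} result - two-element array: [array of result items, frame ID or timestamp]
 * @property {ImageBitmap|HTMLCanvasElement} imageFrame - the original input image
 */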
Inference Result Types
The inference results can be one of the following types:
Detection
Detection with Instance Segmentation
Pose Detection
Classification
Multi-Label Classification
Segmentation (Semantic)
Detection Result
Detection results include a bounding box (bbox) along with a category_id, label, and confidence score for each detected object:
{
"result": [
[
{
"bbox": [101.98, 77.67, 175.04, 232.99],
"category_id": 0,
"label": "face",
"score": 0.856
},
{
"bbox": [314.91, 52.55, 397.32, 228.70],
"category_id": 0,
"label": "face",
"score": 0.844
}
],
"frame_15897"
],
"imageFrame": {}
}
bbox: Array of four coordinates [x0, y0, x1, y1] for each detected object's bounding box.
category_id: Numerical ID of the detected category (used to look up a color in colorLookupTable).
label: Textual name of the detected category.
score: Confidence score (0.0–1.0).
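As an illustration only, here is a sketch that draws each detection onto a canvas. It assumes a 2D canvas context named ctx and a colorLookupTable object mapping category_id to a CSS color; replace these with your own rendering setup:
const [detections] = someResult.result;
for (const det of detections) {
  const [x0, y0, x1, y1] = det.bbox;
  ctx.strokeStyle = colorLookupTable[det.category_id] || "red"; // fall back to a fixed color
  ctx.strokeRect(x0, y0, x1 - x0, y1 - y0);                     // bbox is [x0, y0, x1, y1]
  ctx.fillStyle = ctx.strokeStyle;
  ctx.fillText(`${det.label} ${(det.score * 100).toFixed(1)}%`, x0, y0 - 4);
}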
Detection with Instance Segmentation
When the model returns a mask for each detected object, each detection object will include:
A mask field containing RLE-encoded (or plain) pixel data
Optional x_min, y_min if it's a partial (cropped) mask
width / height giving the mask dimensions (full-image resolution for a full mask, cropped resolution for a partial mask)
{
"result": [
[
{
"bbox": [50.0, 30.0, 200.0, 180.0],
"category_id": 2,
"label": "cat",
"score": 0.92,
"mask": {
"width": 256,
"height": 256,
"data": "h4sIAAAAAAAC/2NgYGBgBGJEYmBgYIQAhi8GJmYmCGZkYmBiYDEwMDIwMDAwBAhq_dwYAAAA"
}
},
{
"bbox": [120.0, 60.0, 300.0, 240.0],
"category_id": 3,
"label": "dog",
"score": 0.88,
"mask": {
// Example of a *partial* (cropped) mask within the bounding box:
"x_min": 120,
"y_min": 60,
"width": 100,
"height": 80,
"data": "eJztwTEBAAAAgiD/r25IQAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
}
}
],
"frame_20012"
],
"imageFrame": {}
}
mask.width, mask.height: Dimensions of the stored mask buffer (full-image resolution for "full" masks; cropped resolution for "partial" masks).
mask.x_min, mask.y_min (optional): Top-left corner in original model-input pixel coordinates; only present when the mask is a cropped patch.
mask.data: A Base64-encoded RLE (run-length-encoded) or raw-bitstring array.
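A rough sketch of telling full and partial masks apart before decoding is shown below. Note that the Base64 decode only recovers raw bytes; the RLE expansion itself depends on the encoding your server uses, so treat this as a starting point rather than a complete decoder:
for (const det of someResult.result[0]) {
  if (!det.mask) continue;
  const isPartial = "x_min" in det.mask;                // cropped patch vs. full-image mask
  const { x_min = 0, y_min = 0, width, height, data } = det.mask;
  const bytes = Uint8Array.from(atob(data), (c) => c.charCodeAt(0)); // Base64 -> raw bytes
  console.log(det.label, isPartial ? "partial" : "full", width, height, x_min, y_min, bytes.length);
}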
Pose Detection Result
Pose-detection results include a landmarks array. Each landmark has:
category_id: Joint index (e.g. 0 = Nose, 1 = LeftEye, etc.)
label: Text name of that joint
landmark: [x, y] coordinates
score: Confidence of that landmark
connect (optional): An array of other category_id values to connect this joint to (for drawing skeleton edges)
{
"result": [
[
{
"landmarks": [
{
"category_id": 0,
"label": "Nose",
"landmark": [93.99, 115.81],
"score": 0.9986,
"connect": [1, 2]
},
{
"category_id": 1,
"label": "LeftEye",
"landmark": [110.31, 98.96],
"score": 0.9988,
"connect": [0, 3]
},
{
"category_id": 2,
"label": "RightEye",
"landmark": [80.27, 98.46],
"score": 0.9979,
"connect": [0, 4]
},
{
"category_id": 3,
"label": "LeftEar",
"landmark": [122.50, 94.20],
"score": 0.9954,
"connect": [1]
},
{
"category_id": 4,
"label": "RightEar",
"landmark": [60.14, 95.30],
"score": 0.9948,
"connect": [2]
}
// … more joints …
],
"score": 0.9663
}
],
"frame_18730"
],
"imageFrame": {}
}
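One way to consume this (again a sketch, assuming a 2D canvas context named ctx) is to draw every landmark and use its connect list to draw the skeleton edges:
const [poses] = someResult.result;
for (const pose of poses) {
  const byId = new Map(pose.landmarks.map((lm) => [lm.category_id, lm]));
  for (const lm of pose.landmarks) {
    const [x, y] = lm.landmark;
    ctx.fillRect(x - 2, y - 2, 4, 4);               // draw the joint
    for (const otherId of lm.connect || []) {        // draw edges to connected joints
      const other = byId.get(otherId);
      if (!other) continue;
      ctx.beginPath();
      ctx.moveTo(x, y);
      ctx.lineTo(other.landmark[0], other.landmark[1]);
      ctx.stroke();
    }
  }
}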
Classification Result
A classification result is simply an array of { category_id, label, score } entries, with no bounding boxes or masks:
{
"result": [
[
{ "category_id": 401, "label": "academic gown, academic robe, judge's robe", "score": 0.8438 },
{ "category_id": 618, "label": "lab coat, laboratory coat", "score": 0.0352 }
],
"frame_19744"
],
"imageFrame": {}
}
The model classifies the image into categories with associated confidence scores.
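For example, a minimal sketch that picks the highest-scoring class:
const [classes] = someResult.result;
const top = [...classes].sort((a, b) => b.score - a.score)[0];
console.log(`Top prediction: ${top.label} (${(top.score * 100).toFixed(1)}%)`);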
Multi-Label Classification Result
For multi-label classification (i.e. multiple classifiers or multiple sub-predictions), each top-level entry has a classifier name and a subarray of results with { label, score, category_id }:
{
"result": [
[
{
"classifier": "SceneClassifier",
"results": [
{ "category_id": 0, "label": "indoor", "score": 0.95 },
{ "category_id": 1, "label": "building", "score": 0.75 },
{ "category_id": 2, "label": "kitchen", "score": 0.60 }
]
},
{
"classifier": "ObjectPresence",
"results": [
{ "category_id": 5, "label": "chair", "score": 0.88 },
{ "category_id": 6, "label": "table", "score": 0.55 }
]
}
],
"frame_20251"
],
"imageFrame": {}
}
Each object in the top‐level array has:
classifier: Name of that classifier
results: An array of { category_id, label, score }
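A minimal sketch of walking through every classifier and its predictions:
for (const entry of someResult.result[0]) {
  console.log(`Classifier: ${entry.classifier}`);
  for (const { label, score } of entry.results) {
    console.log(`  ${label}: ${(score * 100).toFixed(1)}%`);
  }
}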
Segmentation (Semantic) Result
For semantic segmentation, the result object has:
A shape array: the [1, H, W] shape of the segmentation mask.
A data field: a flat Uint8Array of length H × W, where each element is the class ID at that pixel.
{
"result": [
[
{
"shape": [1,513,513],
"data": [
0, 0, 1, 1, 2, 2,...
0, 1, 1, 2, 2, 2,...
0, 0, 1, 1, 1, 2,...
0, 1, 2, 2, 0, 0,...
]
}
],
"frame_12345"
],
"imageFrame": {}
}
shape = [1, 513, 513]: 1 channel, height 513 pixels, width 513 pixels.
data is a flat array of length H × W, read row-major: the first row of 6 pixels shown is [0, 0, 1, 1, 2, 2, ...], the second row of 6 is [0, 1, 1, 2, 2, 2, ...], and so on.
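As an illustration, here is a sketch that converts the flat class-ID array into an RGBA overlay. It assumes colorLookupTable maps class IDs to [r, g, b] triples and ctx is a 2D canvas context; both are placeholders for your own setup:
const seg = someResult.result[0][0];            // { shape: [1, H, W], data: [...] }
const [, height, width] = seg.shape;
const overlay = new ImageData(width, height);
for (let i = 0; i < width * height; i++) {
  const classId = seg.data[i];
  const [r, g, b] = colorLookupTable[classId] || [0, 0, 0];
  overlay.data[i * 4 + 0] = r;
  overlay.data[i * 4 + 1] = g;
  overlay.data[i * 4 + 2] = b;
  overlay.data[i * 4 + 3] = classId === 0 ? 0 : 128; // keep background transparent
}
ctx.putImageData(overlay, 0, 0);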