Understanding Results

This page explains the structure and content of inference result objects returned by the AIServerModel and CloudServerModel classes.

Both classes return a result object containing the inference output of the predict and predict_batch functions.

Example:

let someResult = await someModel.predict(image);
console.log(someResult);

A typical result object is structured like this:

{
  "result": [
    [
      { "category_id": 1, "label": "foo", "score": 0.2 },
      { "category_id": 0, "label": "bar", "score": 0.1 }
    ],
    "frame123"
  ],
  "imageFrame": imageBitmap
}

Accessing the Result Data

  • Inference Results: Access the main results with someResult.result[0].

  • Frame Info / Number: Get the frame information or frame number using someResult.result[1].

  • Original Input Image: Retrieve the original input image from someResult.imageFrame.
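
For example, a minimal sketch of unpacking these fields (the variable names here are illustrative, not part of the API):

// Run inference, then unpack the response envelope.
let someResult = await someModel.predict(image);

const items = someResult.result[0];       // array of inference result items
const frameInfo = someResult.result[1];   // frame ID or timestamp string
const inputImage = someResult.imageFrame; // original input (e.g. an ImageBitmap)

console.log(`Frame ${frameInfo}: ${items.length} result item(s)`);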

In general, all results are wrapped in this response envelope:

{
  "result": [
    [ /* … array of one or more "items" like those above … */ ],
    "someFrameIdOrTimestamp"
  ],
  "imageFrame": { /* original input image (ImageBitmap / HTMLCanvasElement, etc.) */ }
}

Inference Result Types

An inference result can be one of the following types:

  • Detection

  • Detection with Instance Segmentation

  • Pose Detection

  • Classification

  • Multi-Label Classification

  • Segmentation (Semantic)

Detection Result

Detection results include bounding boxes (bbox) along with category_id, label, and confidence score for each detected object:

{
  "result": [
    [
      {
        "bbox": [101.98, 77.67, 175.04, 232.99],
        "category_id": 0,
        "label": "face",
        "score": 0.856
      },
      {
        "bbox": [314.91, 52.55, 397.32, 228.70],
        "category_id": 0,
        "label": "face",
        "score": 0.844
      }
    ],
    "frame_15897"
  ],
  "imageFrame": {}
}
  • bbox: Array of four values [x0, y0, x1, y1], the top-left and bottom-right corners of the detected object's bounding box.

  • category_id: Numerical ID of the detected category (used to look up a color in colorLookupTable).

  • label: Textual name of the detected category.

  • score: Confidence score (0.0–1.0).
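
As a sketch, detections can be drawn onto a canvas like this (the score threshold and styling are illustrative choices, not part of the API):

// Draw detection boxes and labels onto a 2D canvas context.
function drawDetections(ctx, detections, minScore = 0.5) {
  for (const det of detections) {
    if (det.score < minScore) continue; // skip low-confidence detections
    const [x0, y0, x1, y1] = det.bbox;
    ctx.strokeStyle = 'lime';
    ctx.strokeRect(x0, y0, x1 - x0, y1 - y0);
    ctx.fillStyle = 'lime';
    ctx.fillText(`${det.label} ${(det.score * 100).toFixed(1)}%`, x0, y0 - 4);
  }
}

// Usage: drawDetections(canvas.getContext('2d'), someResult.result[0]);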

Detection with Instance Segmentation

When the model returns a mask for each detected object, each detection object will also include:

  • A mask field containing RLE-encoded (or plain) pixel data

  • mask.width and mask.height giving the dimensions of the stored mask buffer

  • Optionally mask.x_min and mask.y_min when the mask is a cropped (partial) patch rather than a full-image mask

{
  "result": [
    [
      {
        "bbox": [50.0, 30.0, 200.0, 180.0],
        "category_id": 2,
        "label": "cat",
        "score": 0.92,
        "mask": {
          "width": 256,
          "height": 256,
          "data": "h4sIAAAAAAAC/2NgYGBgBGJEYmBgYIQAhi8GJmYmCGZkYmBiYDEwMDIwMDAwBAhq_dwYAAAA"
        }
      },
      {
        "bbox": [120.0, 60.0, 300.0, 240.0],
        "category_id": 3,
        "label": "dog",
        "score": 0.88,
        "mask": {
          // Example of a *partial* (cropped) mask within the bounding box:
          "x_min": 120,
          "y_min": 60,
          "width": 100,
          "height": 80,
          "data": "eJztwTEBAAAAgiD/r25IQAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        }
      }
    ],
    "frame_20012"
  ],
  "imageFrame": {}
}
  • mask.width, mask.height: Dimensions of the stored mask buffer (full‐image resolution for "full" masks; cropped resolution for "partial" masks).

  • mask.x_min, mask.y_min (optional): Top‐left corner in original model‐input pixel coordinates—only present when the mask is a cropped patch.

  • mask.data: A Base64‐encoded RLE (run‐length‐encoded) or raw‐bitstring array.
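
A minimal sketch of working out where a mask belongs on the input image; decoding mask.data itself (Base64 plus RLE) is model-specific and omitted here:

// Determine a mask's placement on the original image.
// A partial mask carries x_min/y_min; a full-image mask starts at (0, 0).
function maskPlacement(mask) {
  const isPartial = mask.x_min !== undefined && mask.y_min !== undefined;
  return {
    x: isPartial ? mask.x_min : 0, // top-left corner in model-input pixels
    y: isPartial ? mask.y_min : 0,
    width: mask.width,             // dimensions of the stored mask buffer
    height: mask.height,
    partial: isPartial,
  };
}

// Usage: someResult.result[0].forEach(det => det.mask && console.log(maskPlacement(det.mask)));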

Pose Detection Result

Pose‐detection results include a landmarks array. Each landmark has:

  • category_id: Joint index (e.g. 0 = Nose, 1 = LeftEye, etc.)

  • label: Text name of that joint

  • landmark: [x, y] coordinates

  • score: Confidence of that landmark

  • Optionally connect: An array of other category_id values to connect this joint to (for drawing skeleton edges)

{
  "result": [
    [
      {
        "landmarks": [
          {
            "category_id": 0,
            "label": "Nose",
            "landmark": [93.99, 115.81],
            "score": 0.9986,
            "connect": [1, 2]
          },
          {
            "category_id": 1,
            "label": "LeftEye",
            "landmark": [110.31, 98.96],
            "score": 0.9988,
            "connect": [0, 3]
          },
          {
            "category_id": 2,
            "label": "RightEye",
            "landmark": [80.27, 98.46],
            "score": 0.9979,
            "connect": [0, 4]
          },
          {
            "category_id": 3,
            "label": "LeftEar",
            "landmark": [122.50, 94.20],
            "score": 0.9954,
            "connect": [1]
          },
          {
            "category_id": 4,
            "label": "RightEar",
            "landmark": [60.14, 95.30],
            "score": 0.9948,
            "connect": [2]
          }
          // … more joints …
        ],
        "score": 0.9663
      }
    ],
    "frame_18730"
  ],
  "imageFrame": {}
}
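
Below is a sketch of rendering these landmarks on a canvas, using each joint's connect list to draw skeleton edges (colors and sizes are illustrative):

// Draw pose keypoints and skeleton edges onto a 2D canvas context.
function drawPose(ctx, poses) {
  for (const pose of poses) {
    // Index landmarks by category_id so connect entries can be resolved.
    const byId = new Map(pose.landmarks.map((lm) => [lm.category_id, lm]));
    for (const lm of pose.landmarks) {
      const [x, y] = lm.landmark;
      ctx.beginPath();
      ctx.arc(x, y, 3, 0, 2 * Math.PI); // keypoint dot
      ctx.fill();
      for (const id of lm.connect ?? []) {
        const other = byId.get(id);
        if (!other) continue;
        ctx.beginPath();
        ctx.moveTo(x, y);
        ctx.lineTo(other.landmark[0], other.landmark[1]); // skeleton edge
        ctx.stroke();
      }
    }
  }
}

// Usage: drawPose(canvas.getContext('2d'), someResult.result[0]);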

Classification Result

A classification result is just an array of { category_id, label, score } with no bounding boxes or masks:

{
  "result": [
    [
      { "category_id": 401, "label": "academic gown, academic robe, judge's robe", "score": 0.8438 },
      { "category_id": 618, "label": "lab coat, laboratory coat", "score": 0.0352 }
    ],
    "frame_19744"
  ],
  "imageFrame": {}
}
  • The model classifies the image into categories with associated confidence scores.
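
Reading off the top prediction is then just indexing into the array (this assumes the entries arrive sorted by score, as in the example above):

// Take the highest-scoring class from a classification result.
const classifications = someResult.result[0];
const top = classifications[0]; // the example output above is sorted by score
console.log(`Top class: ${top.label} (id ${top.category_id}, ${(top.score * 100).toFixed(1)}%)`);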

Multi-Label Classification Result

For multi-label classification (i.e. multiple classifiers or multiple sub-predictions), each top-level entry has a classifier name and a subarray of results with { label, score, category_id }:

{
  "result": [
    [
      {
        "classifier": "SceneClassifier",
        "results": [
          { "category_id": 0, "label": "indoor", "score": 0.95 },
          { "category_id": 1, "label": "building", "score": 0.75 },
          { "category_id": 2, "label": "kitchen", "score": 0.60 }
        ]
      },
      {
        "classifier": "ObjectPresence",
        "results": [
          { "category_id": 5, "label": "chair", "score": 0.88 },
          { "category_id": 6, "label": "table", "score": 0.55 }
        ]
      }
    ],
    "frame_20251"
  ],
  "imageFrame": {}
}
  • Each object in the top‐level array has:

    • classifier: Name of that classifier

    • results: An array of { category_id, label, score }
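
A sketch of walking this nested structure:

// Log every classifier's predictions from a multi-label result.
for (const entry of someResult.result[0]) {
  console.log(`Classifier: ${entry.classifier}`);
  for (const { category_id, label, score } of entry.results) {
    console.log(`  [${category_id}] ${label}: ${(score * 100).toFixed(1)}%`);
  }
}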

Segmentation (Semantic) Result

For semantic segmentation, the result object has:

  • A shape array: the [1, H, W] dimensions of the segmentation mask.

  • A data field: a flat Uint8Array of length H × W, where each element is the class ID at that pixel.

{
  "result": [
    [
      {
        "shape": [1,513,513],
        "data": [
          0, 0, 1, 1, 2, 2,...
          0, 1, 1, 2, 2, 2,...
          0, 0, 1, 1, 1, 2,...
          0, 1, 2, 2, 0, 0,...
        ]
      }
    ],
    "frame_12345"
  ],
  "imageFrame": {}
}
  • shape = [1, 513, 513]: 1 channel, height 513 pixels, width 513 pixels.

  • data is a flat array of length H × W, read row-major:

    • First row of 6 pixels: [0, 0, 1, 1, 2, 2, ...]

    • Second row of 6: [0, 1, 1, 2, 2, 2, ...], etc.
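
As a sketch, the class ID for any pixel can be read straight out of the flat buffer (the helper name is illustrative):

// Look up the class ID at pixel (x, y) in a row-major segmentation mask.
function classAt(seg, x, y) {
  const [, height, width] = seg.shape; // shape is [1, H, W]
  if (x < 0 || y < 0 || x >= width || y >= height) return undefined;
  return seg.data[y * width + x];
}

// Usage:
// const seg = someResult.result[0][0];
// console.log(classAt(seg, 10, 20)); // class ID at column 10, row 20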
