Understanding Results

This page explains the structure and content of the inference result objects returned by the AIServerModel and CloudServerModel classes.

Both classes return a result object containing the inference results of the predict and predict_batch functions.

Example:

let someResult = await someModel.predict(image);
console.log(someResult);

The logged result can be structured like this:

{
  "result": [
    [
      { "category_id": 1, "label": "foo", "score": 0.2 },
      { "category_id": 0, "label": "bar", "score": 0.1 }
    ],
    "frame123"
  ],
  "imageFrame": imageBitmap
}

Accessing the Result Data

  • Inference Results: Access the main results with someResult.result[0].

  • Frame Info / Number: Get the frame information or frame number using someResult.result[1].

  • Original Input Image: The original input image is someResult.imageFrame.
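
Continuing the example above, a minimal sketch of unpacking the response envelope:

const [detections, frameInfo] = someResult.result; // inference items and frame info
const originalImage = someResult.imageFrame;       // the input image you passed in

console.log(`Frame ${frameInfo}: ${detections.length} item(s)`);
for (const item of detections) {
  console.log(item.label, item.score);
}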

In general, all results are wrapped in this response envelope:

{
  "result": [
    [ /* … array of one or more "items" like those above … */ ],
    "someFrameIdOrTimestamp"
  ],
  "imageFrame": { /* original input image (ImageBitmap / HTMLCanvasElement, etc.) */ }
}

Inference Result Types

The inference results can be one of the following types:

  • Detection

  • Detection with Instance Segmentation

  • Pose Detection

  • Classification

  • Multi-Label Classification

  • Segmentation (Semantic)

Detection Result

Detection results include bounding boxes (bbox) along with category_id, label, and confidence score for each detected object:

{
  "result": [
    [
      {
        "bbox": [101.98, 77.67, 175.04, 232.99],
        "category_id": 0,
        "label": "face",
        "score": 0.856
      },
      {
        "bbox": [314.91, 52.55, 397.32, 228.70],
        "category_id": 0,
        "label": "face",
        "score": 0.844
      }
    ],
    "frame_15897"
  ],
  "imageFrame": {}
}
  • bbox: Array of four coordinates [x0, y0, x1, y1] for each detected object's bounding box.

  • category_id: Numerical ID of the detected category (used to look up a color in colorLookupTable).

  • label: Textual name of the detected category.

  • score: Confidence score (0.0–1.0).
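
As an illustration, here is a minimal sketch that overlays detection results on a 2D canvas. The canvas element and its sizing are assumptions; scale the coordinates if the displayed image differs from the model input size.

function drawDetections(result, canvas) {
  const ctx = canvas.getContext("2d");
  const [detections] = result.result; // first element holds the detection array
  for (const det of detections) {
    const [x0, y0, x1, y1] = det.bbox;
    ctx.strokeStyle = "lime";
    ctx.strokeRect(x0, y0, x1 - x0, y1 - y0);
    ctx.fillStyle = "lime";
    ctx.fillText(`${det.label} ${(det.score * 100).toFixed(1)}%`, x0, y0 - 4);
  }
}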

Detection with Instance Segmentation

When the model returns a mask for each detected object, each detection object also includes:

  • A mask field containing RLE‐encoded (or plain) pixel data

  • mask.width and mask.height, the dimensions of the stored mask buffer

  • Optionally, mask.x_min and mask.y_min, present only when the mask is a cropped patch rather than a full‐image mask

{
  "result": [
    [
      {
        "bbox": [50.0, 30.0, 200.0, 180.0],
        "category_id": 2,
        "label": "cat",
        "score": 0.92,
        "mask": {
          "width": 256,
          "height": 256,
          "data": "h4sIAAAAAAAC/2NgYGBgBGJEYmBgYIQAhi8GJmYmCGZkYmBiYDEwMDIwMDAwBAhq_dwYAAAA"
        }
      },
      {
        "bbox": [120.0, 60.0, 300.0, 240.0],
        "category_id": 3,
        "label": "dog",
        "score": 0.88,
        "mask": {
          // Example of a *partial* (cropped) mask within the bounding box:
          "x_min": 120,
          "y_min": 60,
          "width": 100,
          "height": 80,
          "data": "eJztwTEBAAAAgiD/r25IQAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        }
      }
    ],
    "frame_20012"
  ],
  "imageFrame": {}
}
  • mask.width, mask.height: Dimensions of the stored mask buffer (full‐image resolution for "full" masks; cropped resolution for "partial" masks).

  • mask.x_min, mask.y_min (optional): Top‐left corner in original model‐input pixel coordinates—only present when the mask is a cropped patch.

  • mask.data: A Base64‐encoded RLE (run‐length‐encoded) or raw‐bitstring array.
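
Decoding mask.data depends on your postprocessing pipeline, but placing the decoded pixels works the same way for both mask kinds. A hedged sketch, where decodeMask is a hypothetical helper standing in for your actual RLE/bitstring decoder:

function maskPlacement(det, decodeMask) {
  const m = det.mask;
  // Partial masks carry an offset; full-image masks implicitly start at (0, 0).
  const x = m.x_min ?? 0;
  const y = m.y_min ?? 0;
  // decodeMask is hypothetical: it should return mask.width * mask.height bytes.
  const pixels = decodeMask(m.data);
  return { x, y, width: m.width, height: m.height, pixels };
}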

Pose Detection Result

Pose‐detection results include a landmarks array. Each landmark has:

  • category_id: Joint index (e.g., 0 = Nose, 1 = LeftEye)

  • label: Text name of that joint

  • landmark: [x, y] coordinates

  • score: Confidence of that landmark

  • Optionally connect: An array of other category_id values to connect this joint to (for drawing skeleton edges)

{
  "result": [
    [
      {
        "landmarks": [
          {
            "category_id": 0,
            "label": "Nose",
            "landmark": [93.99, 115.81],
            "score": 0.9986,
            "connect": [1, 2]
          },
          {
            "category_id": 1,
            "label": "LeftEye",
            "landmark": [110.31, 98.96],
            "score": 0.9988,
            "connect": [0, 3]
          },
          {
            "category_id": 2,
            "label": "RightEye",
            "landmark": [80.27, 98.46],
            "score": 0.9979,
            "connect": [0, 4]
          },
          {
            "category_id": 3,
            "label": "LeftEar",
            "landmark": [122.50, 94.20],
            "score": 0.9954,
            "connect": [1]
          },
          {
            "category_id": 4,
            "label": "RightEar",
            "landmark": [60.14, 95.30],
            "score": 0.9948,
            "connect": [2]
          }
          // … more joints …
        ],
        "score": 0.9663
      }
    ],
    "frame_18730"
  ],
  "imageFrame": {}
}
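
A minimal sketch of drawing landmarks and skeleton edges using the connect field (canvas setup is assumed):

function drawPose(result, canvas) {
  const ctx = canvas.getContext("2d");
  const [poses] = result.result;
  for (const pose of poses) {
    // Index landmarks by category_id so connect entries can be resolved.
    const byId = new Map(pose.landmarks.map((lm) => [lm.category_id, lm]));
    for (const lm of pose.landmarks) {
      const [x, y] = lm.landmark;
      ctx.fillRect(x - 2, y - 2, 4, 4); // joint marker
      for (const otherId of lm.connect ?? []) {
        const other = byId.get(otherId);
        if (!other) continue;
        ctx.beginPath();
        ctx.moveTo(x, y);
        ctx.lineTo(other.landmark[0], other.landmark[1]);
        ctx.stroke();
      }
    }
  }
}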

Classification Result

A classification result is just an array of { category_id, label, score } with no bounding boxes or masks:

{
  "result": [
    [
      { "category_id": 401, "label": "academic gown, academic robe, judge's robe", "score": 0.8438 },
      { "category_id": 618, "label": "lab coat, laboratory coat", "score": 0.0352 }
    ],
    "frame_19744"
  ],
  "imageFrame": {}
}
  • The model classifies the image into categories with associated confidence scores.
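
For instance, a sketch that reports the top prediction, reusing the someResult variable from the earlier example (the entries are not assumed to arrive sorted):

const [classes] = someResult.result;
const top = classes.reduce((a, b) => (b.score > a.score ? b : a));
console.log(`Top-1: ${top.label} (${(top.score * 100).toFixed(1)}%)`);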

Multi-Label Classification Result

For multi-label classification (i.e. multiple classifiers or multiple sub-predictions), each top-level entry has a classifier name and a subarray of results with { label, score, category_id }:

{
  "result": [
    [
      {
        "classifier": "SceneClassifier",
        "results": [
          { "category_id": 0, "label": "indoor", "score": 0.95 },
          { "category_id": 1, "label": "building", "score": 0.75 },
          { "category_id": 2, "label": "kitchen", "score": 0.60 }
        ]
      },
      {
        "classifier": "ObjectPresence",
        "results": [
          { "category_id": 5, "label": "chair", "score": 0.88 },
          { "category_id": 6, "label": "table", "score": 0.55 }
        ]
      }
    ],
    "frame_20251"
  ],
  "imageFrame": {}
}
  • Each object in the top‐level array has:

    • classifier: Name of that classifier

    • results: An array of { category_id, label, score }
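
A short sketch of walking every classifier and its sub-predictions, again reusing someResult from the earlier example:

const [classifiers] = someResult.result;
for (const { classifier, results } of classifiers) {
  console.log(`${classifier}:`);
  for (const r of results) {
    console.log(`  ${r.label}: ${(r.score * 100).toFixed(0)}%`);
  }
}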

Segmentation (Semantic) Result

For semantic segmentation, the result object has:

  • A shape array: [1, H, W], the dimensions of the segmentation mask.

  • A data field: a flat Uint8Array of length H × W, where each element is the class ID at that pixel.

{
  "result": [
    [
      {
        "shape": [1,513,513],
        "data": [
          0, 0, 1, 1, 2, 2,...
          0, 1, 1, 2, 2, 2,...
          0, 0, 1, 1, 1, 2,...
          0, 1, 2, 2, 0, 0,...
        ]
      }
    ],
    "frame_12345"
  ],
  "imageFrame": {}
}
  • shape = [1, 513, 513]: one channel, 513 pixels high, 513 pixels wide.

  • data is a flat array of length H × W, read row-major:

    • First row of six pixels: [0, 0, 1, 1, 2, 2, ...]

    • Second row of six: [0, 1, 1, 2, 2, 2, ...], and so on.
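
A sketch of colorizing the class-ID mask into an ImageData overlay. Here palette, mapping class IDs to RGB triples, is an illustrative assumption:

function maskToImageData(seg, palette) {
  const [, height, width] = seg.shape;
  const image = new ImageData(width, height);
  for (let i = 0; i < width * height; i++) {
    const [r, g, b] = palette[seg.data[i]] ?? [0, 0, 0];
    image.data[4 * i + 0] = r;
    image.data[4 * i + 1] = g;
    image.data[4 * i + 2] = b;
    image.data[4 * i + 3] = 128; // semi-transparent overlay
  }
  return image;
}

You can then draw the result onto a canvas with ctx.putImageData(image, 0, 0).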
