Working with Input and Output Data

This guide covers the input data formats supported by DeGirumJS for inference and provides a detailed breakdown of the output result object structures for different model types.

Input Data Formats

The DeGirumJS predict() and predict_batch() methods are designed to be flexible, accepting a wide range of input image formats. Internally, the SDK uses the ImageBitmap API for efficient image processing and converts each supported input type into a standardized ImageBitmap before sending it to the model for inference.

The following input types are supported:

  • HTML Elements: DOM elements that carry image data, such as an HTMLVideoElement supplying video frames.

  • Image Data Objects: in-memory image representations such as ImageBitmap, the format the SDK uses internally.

  • String Formats:

    • Image URL: A standard URL pointing to an image resource. Example: https://example.com/path/to/image.jpg

    • Data URL: A string representing a Base64-encoded image, prefixed with data:. Example: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==

    • Base64 String: A raw Base64-encoded string of image data (without the data: prefix).

  • Batch Processing: The predict_batch() method can accept an Async Generator that yields pairs of input data and frame identifiers. The input data must be in one of the above formats. This allows efficient processing of multiple frames for real-time applications such as video streams or multi-frame inference. See Advanced Inference: Batch Processing & Callbacks.

  • Web Codecs API: The predict_batch() method can also accept ReadableStream objects. This enables efficient video processing using the WebCodecs API to handle frames, letting you build video-processing pipelines in fewer lines of code.
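The ReadableStream path above can be sketched as follows. This is an illustration, not SDK code: it assumes predict_batch() consumes the stream directly and yields results asynchronously, and it uses MediaStreamTrackProcessor (an Insertable Streams API available in Chromium-based browsers) to turn a camera track into a ReadableStream of WebCodecs VideoFrame objects.

```javascript
// Sketch only: `model` is assumed to be a loaded DeGirumJS model whose
// predict_batch() accepts a ReadableStream, as described above.
async function runStreamInference(model) {
  // Capture the camera and grab its video track
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  // MediaStreamTrackProcessor exposes the track as a ReadableStream
  // of VideoFrame objects (Chromium-based browsers)
  const frameStream = new MediaStreamTrackProcessor({ track }).readable;

  // Assumed behavior: predict_batch() yields one result per frame
  for await (const result of model.predict_batch(frameStream)) {
    console.log(result);
  }
}
```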

Usage Examples for Input

You can pass any of the supported input types directly to the predict() or predict_batch() methods:
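A minimal sketch of these calls, assuming a `model` object has already been obtained from the DeGirumJS SDK (the connection and model-loading steps are omitted, and the element ID is hypothetical):

```javascript
// Sketch only: `model` is assumed to expose predict() and predict_batch()
// as described in this guide.

async function runSingleInference(model) {
  // HTML element input: an <img> already on the page (hypothetical id)
  const imgEl = document.getElementById('myImage');
  const r1 = await model.predict(imgEl);

  // Image URL input
  const r2 = await model.predict('https://example.com/path/to/image.jpg');

  // Data URL input; stripping the 'data:image/png;base64,' prefix would
  // give a raw Base64 string, which is also accepted
  const r3 = await model.predict(
    'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=='
  );

  return [r1, r2, r3];
}

// predict_batch() with an Async Generator yielding [inputData, frameId]
// pairs; results are assumed to be yielded as each frame completes.
async function runBatchInference(model, frames) {
  async function* frameSource() {
    let frameId = 0;
    for (const frame of frames) {
      yield [frame, frameId++]; // frame may be any supported input format
    }
  }
  for await (const result of model.predict_batch(frameSource())) {
    console.log('frame result:', result);
  }
}
```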

Output Data Structure

The predict() and predict_batch() methods of AIServerModel and CloudServerModel return a comprehensive result object. This object encapsulates the inference output from the model, along with contextual information about the processed frame.

The general structure of the returned object is as follows:
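As a hedged illustration, the fields described in the following subsections can be sketched as a plain object. The field names (result, imageFrame, modelImage) come from this guide; the concrete values are invented, and in the browser imageFrame is an ImageBitmap and modelImage a Blob (both shown as null here):

```javascript
const exampleResult = {
  result: [
    // result[0]: array of inference result objects (a detection here)
    [{ bbox: [10, 20, 110, 220], category_id: 0, label: 'person', score: 0.97 }],
    // result[1]: frame info / frame number, for correlating batch results
    0,
  ],
  imageFrame: null, // ImageBitmap of the original input (null for video inputs)
  modelImage: null, // Blob of the preprocessed image when saveModelImage is true
};

console.log(exampleResult.result[0][0].label); // → 'person'
```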

Accessing the Result Data

  • Inference Results: Access the main inference results using someResult.result[0]. This is an array of objects, where each object represents a detected item, classification, pose, or segmentation mask.

  • Frame Info / Number: Retrieve the unique identifier or frame information using someResult.result[1]. Use this to correlate results with specific input frames, especially in batch processing.

  • Original Input Image: Access the original input image as an ImageBitmap via someResult.imageFrame. Note that this will be null if the input was an HTMLVideoElement to avoid memory issues with continuous video streams.

  • Preprocessed Model Image: If the saveModelImage model parameter is set to true, the someResult.modelImage property will contain the preprocessed image, as sent to the model, in the form of a Blob. This is useful for debugging preprocessing steps.
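Putting the access patterns above together, here is a small helper (not part of the SDK) that pulls labels above a score threshold out of a result object shaped as described, demonstrated on a mock result:

```javascript
// Hypothetical helper: filters result[0] entries by confidence score
function highConfidenceLabels(combinedResult, threshold = 0.5) {
  const inferenceResults = combinedResult.result[0]; // array of result objects
  return inferenceResults
    .filter((obj) => typeof obj.score === 'number' && obj.score >= threshold)
    .map((obj) => obj.label);
}

// Mock result object following the structure described in this guide
const mockResult = {
  result: [
    [
      { label: 'cat', score: 0.91 },
      { label: 'dog', score: 0.32 },
    ],
    'frame-0',
  ],
};

console.log(highConfidenceLabels(mockResult)); // → ['cat']
```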

Inference Result Types

The structure of the objects within someResult.result[0] varies depending on the type of AI model and its output. The SDK supports the following common inference result types:

  • Detection Results

  • Classification Results

  • Pose Detection Results

  • Segmentation Results

  • Multi-Label Classification Results

For detailed examples and explanations of each result type, refer to Result Object Structure + Examples. This document provides comprehensive JSON examples and descriptions for bbox, landmarks, category_id, label, score, and mask fields.

Displaying Results on a Canvas

The displayResultToCanvas() method handles the drawing of bounding boxes, labels, keypoints, and segmentation masks based on the model's output.

Parameters:

  • combinedResult: The result object returned by predict() or predict_batch().

  • outputCanvasName: The ID of the HTML <canvas> element (as a string) or a direct reference to an HTMLCanvasElement or OffscreenCanvas object where the results will be drawn.

  • justResults (optional): A boolean flag. If true, only the inference overlay (e.g., bounding boxes, labels) will be drawn on the canvas, without drawing the original imageFrame. This is useful when you want to overlay results on existing canvas content, or when the input was a video stream. Defaults to false.

Example:
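A minimal sketch, assuming `model` is a loaded DeGirumJS model exposing displayResultToCanvas() with the parameters listed above (the element IDs are hypothetical):

```javascript
async function predictAndDisplay(model) {
  const imgEl = document.getElementById('myImage');
  const combinedResult = await model.predict(imgEl);

  // Draw the original imageFrame plus overlays onto <canvas id="outputCanvas">
  model.displayResultToCanvas(combinedResult, 'outputCanvas');

  // Or draw only the overlay (e.g., on top of content already rendered to
  // the canvas, or when the input was a video stream) via justResults = true
  model.displayResultToCanvas(combinedResult, 'outputCanvas', true);
}
```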
