Working with Input and Output Data
A guide to the input data formats DeGirumJS supports for inference, along with a detailed breakdown of the output result object structures for different model types.
Input Data Formats
DeGirumJS predict() and predict_batch() methods are designed to be flexible, accepting a wide range of input image formats. Internally, the SDK uses the ImageBitmap API for efficient image processing and handles the conversion of various input types into a standardized ImageBitmap format before sending them to the model for inference.
The following input types are supported:
HTML Elements:
- `HTMLImageElement` (`<img>`)
- `SVGImageElement` (`<image>` within SVG)
- `HTMLVideoElement` (`<video>`): the current frame will be used.
- `HTMLCanvasElement` (`<canvas>`)
Image Data Objects:
- `File` (specifically image files like `image/jpeg`, `image/png`, etc.)
- `VideoFrame` (if available in the environment)
String Formats:
- Image URL: A standard URL pointing to an image resource. Example: `https://example.com/path/to/image.jpg`
- Data URL: A string representing a Base64-encoded image, prefixed with `data:`. Example: `data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==`
- Base64 String: A raw Base64-encoded string of image data (without the `data:` prefix).
Array Buffer Types:
Batch Processing: The `predict_batch()` method can accept an Async Generator that yields pairs of input data and frame identifiers. The input data must be in one of the above formats. This allows for efficient processing of multiple frames in real-time applications such as video streams or multi-frame inference; a sketch follows below. See Advanced Inference: Batch Processing & Callbacks.
Web Codecs API: The `predict_batch()` method can also work with `ReadableStream` objects. This enables efficient video processing while using the Web Codecs API for handling frames. Use this to build video processing pipelines in fewer lines of code (see the sketch below).
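For example, a minimal sketch of both batch input styles might look like the following. Here `model` is assumed to be an already-loaded AIServerModel or CloudServerModel instance, `myFrames` and the `for await` consumption pattern are illustrative, and `MediaStreamTrackProcessor` is a browser API available in Chromium-based browsers; refer to Advanced Inference: Batch Processing & Callbacks for the authoritative patterns.

```javascript
// Sketch 1: async generator yielding [inputData, frameIdentifier] pairs.
async function* frameGenerator(frames) {
  for (let i = 0; i < frames.length; i++) {
    yield [frames[i], `frame-${i}`]; // any supported input type + a frame identifier
  }
}

for await (const result of model.predict_batch(frameGenerator(myFrames))) {
  console.log('Frame', result.result[1], result.result[0]);
}

// Sketch 2: a ReadableStream of VideoFrame objects, produced by the
// WebCodecs-adjacent MediaStreamTrackProcessor API, fed to predict_batch().
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const processor = new MediaStreamTrackProcessor({ track: stream.getVideoTracks()[0] });

for await (const result of model.predict_batch(processor.readable)) {
  // handle each frame's result here
}
```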
Usage Examples for Input
You can pass any of the supported input types directly to the predict() or predict_batch() methods:
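For instance, a minimal sketch (the `model` variable is assumed to be an already-loaded AIServerModel or CloudServerModel instance, and the element IDs and URL are placeholders):

```javascript
// HTML <img> element
const img = document.getElementById('myImage');
const resultFromElement = await model.predict(img);

// <canvas> element
const canvas = document.getElementById('myCanvas');
const resultFromCanvas = await model.predict(canvas);

// Image URL
const resultFromUrl = await model.predict('https://example.com/path/to/image.jpg');

// File object, e.g. from an <input type="file"> element
const file = document.getElementById('fileInput').files[0];
const resultFromFile = await model.predict(file);
```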
Output Data Structure
The predict() and predict_batch() methods of AIServerModel and CloudServerModel return a comprehensive result object. This object encapsulates the inference output from the model, along with contextual information about the processed frame.
The general structure of the returned object is as follows:
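The exact contents vary by model, but based on the properties described in the next section, the object has roughly the following shape (an illustrative sketch, not an exact schema):

```javascript
{
  result: [
    [ /* result[0]: array of inference result objects, one per detection, classification, pose, or mask */ ],
    'frame-0'   // result[1]: frame identifier / frame info
  ],
  imageFrame: /* ImageBitmap of the original input, or null for HTMLVideoElement inputs */ null,
  modelImage: /* Blob with the preprocessed model input, present when saveModelImage is true */ null
}
```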
Accessing the Result Data
- Inference Results: Access the main inference results using `someResult.result[0]`. This is an array of objects, where each object represents a detected item, classification, pose, or segmentation mask.
- Frame Info / Number: Retrieve the unique identifier or frame information using `someResult.result[1]`. Use this to correlate results with specific input frames, especially in batch processing.
- Original Input Image: Access the original input image as an `ImageBitmap` via `someResult.imageFrame`. Note that this will be `null` if the input was an `HTMLVideoElement`, to avoid memory issues with continuous video streams.
- Preprocessed Model Image: If the `saveModelImage` model parameter is set to `true`, the `someResult.modelImage` property will contain the preprocessed image, as a `Blob`, that was sent to the model. This can be useful for debugging preprocessing steps.
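Putting that together, a typical access pattern looks like this (a sketch; `model` and `img` are assumed to exist, and the `label`/`score` fields are just examples of the per-item fields described in the next section):

```javascript
const someResult = await model.predict(img);

// Main inference results: one object per detected item, classification, pose, or mask
for (const item of someResult.result[0]) {
  console.log(item.label, item.score);
}

// Frame identifier / info, useful for correlating results in batch processing
const frameInfo = someResult.result[1];

// Original input image as an ImageBitmap (null if the input was an HTMLVideoElement)
const inputBitmap = someResult.imageFrame;

// Preprocessed model input, present only when the saveModelImage parameter is true
const preprocessedBlob = someResult.modelImage;
```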
Inference Result Types
The structure of the objects within someResult.result[0] varies depending on the type of AI model and its output. The SDK supports the following common inference result types:
- Detection Results
- Classification Results
- Pose Detection Results
- Segmentation Results
- Multi-Label Classification Results
For detailed examples and explanations of each result type, refer to Result Object Structure + Examples. That document provides comprehensive JSON examples and descriptions for the `bbox`, `landmarks`, `category_id`, `label`, `score`, and `mask` fields.
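As a quick orientation, the sketch below shows the kind of entry you might find in `someResult.result[0]` for a detection model versus a classification model. The values and the bounding-box coordinate ordering are illustrative assumptions, so defer to the linked document for the precise schemas.

```javascript
// Illustrative detection entry (coordinate ordering here is an assumption)
const detectionEntry = { bbox: [57, 121, 184, 355], category_id: 0, label: 'person', score: 0.92 };

// Illustrative classification entry
const classificationEntry = { category_id: 3, label: 'cat', score: 0.87 };
```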
Displaying Results on a Canvas
The displayResultToCanvas() method draws bounding boxes, labels, keypoints, and segmentation masks onto a canvas based on the model's output.
Parameters:
- `combinedResult`: The result object returned by `predict()` or `predict_batch()`.
- `outputCanvasName`: The ID of the HTML `<canvas>` element (as a string), or a direct reference to an `HTMLCanvasElement` or `OffscreenCanvas` object where the results will be drawn.
- `justResults` (optional): A boolean flag. If `true`, only the inference overlay (e.g., bounding boxes, labels) will be drawn on the canvas, without drawing the original `imageFrame`. This is useful when you want to overlay results on existing canvas content, or when the input was a video stream, for example. Defaults to `false`.
Example:
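A minimal sketch (this assumes the method is called on the loaded model instance, that a `<canvas id="outputCanvas">` element exists in the page, and that `imageElement` is a placeholder for any supported input):

```javascript
const result = await model.predict(imageElement);

// Draw the original frame plus the inference overlay onto the canvas
model.displayResultToCanvas(result, 'outputCanvas');

// Or draw only the overlay, e.g. on top of content already on the canvas
model.displayResultToCanvas(result, 'outputCanvas', true);
```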