Working with Input and Output Data

This guide covers the input data formats supported by DeGirumJS for inference and provides a detailed breakdown of the output result object structures for different model types.

Input Data Formats

The DeGirumJS predict() and predict_batch() methods are designed to be flexible, accepting a wide range of input image formats. Internally, the SDK uses the ImageBitmap API for efficient image processing and converts each supported input type into a standardized ImageBitmap before sending it to the model for inference.

The following input types are supported:

  • HTML Elements: DOM elements that carry image data, such as an HTMLVideoElement supplying video frames.

  • Image Data Objects: in-memory image representations such as ImageBitmap, the format the SDK uses internally.

  • String Formats:

    • Image URL: A standard URL pointing to an image resource. Example: https://example.com/path/to/image.jpg

    • Data URL: A string representing a Base64-encoded image, prefixed with data:. Example: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==

    • Base64 String: A raw Base64-encoded string of image data (without the data: prefix).

  • Batch Processing: The predict_batch() method can accept an Async Generator that yields pairs of input data and frame identifiers. The input data must be in one of the above formats. This allows efficient processing of multiple frames for real-time applications such as video streams or multi-frame inference. See Advanced Inference: Batch Processing & Callbacks.

  • Web Codecs API: The predict_batch() method can also accept ReadableStream objects. This enables efficient video processing using the WebCodecs API to handle frames, letting you build video-processing pipelines in fewer lines of code.
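The ReadableStream path above can be sketched as follows. This is an illustration, not SDK code: it assumes predict_batch() consumes the stream directly and yields results asynchronously, and it uses MediaStreamTrackProcessor (an Insertable Streams API available in Chromium-based browsers) to turn a camera track into a ReadableStream of WebCodecs VideoFrame objects.

```javascript
// Sketch only: `model` is assumed to be a loaded DeGirumJS model whose
// predict_batch() accepts a ReadableStream, as described above.
async function runStreamInference(model) {
  // Capture the camera and grab its video track
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  // MediaStreamTrackProcessor exposes the track as a ReadableStream
  // of VideoFrame objects (Chromium-based browsers)
  const frameStream = new MediaStreamTrackProcessor({ track }).readable;

  // Assumed behavior: predict_batch() yields one result per frame
  for await (const result of model.predict_batch(frameStream)) {
    console.log(result);
  }
}
```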

Usage Examples for Input

You can pass any of the supported input types directly to the predict() or predict_batch() methods:
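A minimal sketch of these calls, assuming a `model` object has already been obtained from the DeGirumJS SDK (the connection and model-loading steps are omitted, and the element ID is hypothetical):

```javascript
// Sketch only: `model` is assumed to expose predict() and predict_batch()
// as described in this guide.

async function runSingleInference(model) {
  // HTML element input: an <img> already on the page (hypothetical id)
  const imgEl = document.getElementById('myImage');
  const r1 = await model.predict(imgEl);

  // Image URL input
  const r2 = await model.predict('https://example.com/path/to/image.jpg');

  // Data URL input; stripping the 'data:image/png;base64,' prefix would
  // give a raw Base64 string, which is also accepted
  const r3 = await model.predict(
    'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=='
  );

  return [r1, r2, r3];
}

// predict_batch() with an Async Generator yielding [inputData, frameId]
// pairs; results are assumed to be yielded as each frame completes.
async function runBatchInference(model, frames) {
  async function* frameSource() {
    let frameId = 0;
    for (const frame of frames) {
      yield [frame, frameId++]; // frame may be any supported input format
    }
  }
  for await (const result of model.predict_batch(frameSource())) {
    console.log('frame result:', result);
  }
}
```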

Output Data Structure

The predict() and predict_batch() methods of AIServerModel and CloudServerModel return a comprehensive result object. This object encapsulates the inference output from the model, along with contextual information about the processed frame.

The general structure of the returned object is as follows:
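As a hedged illustration, the fields described in the following subsections can be sketched as a plain object. The field names (result, imageFrame, modelImage) come from this guide; the concrete values are invented, and in the browser imageFrame is an ImageBitmap and modelImage a Blob (both shown as null here):

```javascript
const exampleResult = {
  result: [
    // result[0]: array of inference result objects (a detection here)
    [{ bbox: [10, 20, 110, 220], category_id: 0, label: 'person', score: 0.97 }],
    // result[1]: frame info / frame number, for correlating batch results
    0,
  ],
  imageFrame: null, // ImageBitmap of the original input (null for video inputs)
  modelImage: null, // Blob of the preprocessed image when saveModelImage is true
};

console.log(exampleResult.result[0][0].label); // → 'person'
```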

Accessing the Result Data

  • Inference Results: Access the main inference results using someResult.result[0]. This is an array of objects, where each object represents a detected item, classification, pose, or segmentation mask.

  • Frame Info / Number: Retrieve the unique identifier or frame information using someResult.result[1]. Use this to correlate results with specific input frames, especially in batch processing.

  • Original Input Image: Access the original input image as an ImageBitmap via someResult.imageFrame. Note that this will be null if the input was an HTMLVideoElement to avoid memory issues with continuous video streams.

  • Preprocessed Model Image: If the saveModelImage model parameter is set to true, the someResult.modelImage property will contain the preprocessed image, as sent to the model, in the form of a Blob. This is useful for debugging preprocessing steps.
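Putting the access patterns above together, here is a small helper (not part of the SDK) that pulls labels above a score threshold out of a result object shaped as described, demonstrated on a mock result:

```javascript
// Hypothetical helper: filters result[0] entries by confidence score
function highConfidenceLabels(combinedResult, threshold = 0.5) {
  const inferenceResults = combinedResult.result[0]; // array of result objects
  return inferenceResults
    .filter((obj) => typeof obj.score === 'number' && obj.score >= threshold)
    .map((obj) => obj.label);
}

// Mock result object following the structure described in this guide
const mockResult = {
  result: [
    [
      { label: 'cat', score: 0.91 },
      { label: 'dog', score: 0.32 },
    ],
    'frame-0',
  ],
};

console.log(highConfidenceLabels(mockResult)); // → ['cat']
```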

Inference Result Types

The structure of the objects within someResult.result[0] varies depending on the type of AI model and its output. The SDK supports the following common inference result types:

  • Detection Results

  • Classification Results

  • Pose Detection Results

  • Segmentation Results

  • Multi-Label Classification Results

For detailed examples and explanations of each result type, refer to Result Object Structure + Examples. This document provides comprehensive JSON examples and descriptions for bbox, landmarks, category_id, label, score, and mask fields.

Displaying Results on a Canvas

The displayResultToCanvas() method handles the drawing of bounding boxes, labels, keypoints, and segmentation masks based on the model's output.

Parameters:

  • combinedResult: The result object returned by predict() or predict_batch().

  • outputCanvasName: The ID of the HTML <canvas> element (as a string) or a direct reference to an HTMLCanvasElement or OffscreenCanvas object where the results will be drawn.

  • justResults (optional): A boolean flag. If true, only the inference overlay (e.g., bounding boxes, labels) will be drawn on the canvas, without drawing the original imageFrame. This is useful when you want to overlay results on existing canvas content, or when the input was a video stream. Defaults to false.

Example:
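A minimal sketch, assuming `model` is a loaded DeGirumJS model exposing displayResultToCanvas() with the parameters listed above (the element IDs are hypothetical):

```javascript
async function predictAndDisplay(model) {
  const imgEl = document.getElementById('myImage');
  const combinedResult = await model.predict(imgEl);

  // Draw the original imageFrame plus overlays onto <canvas id="outputCanvas">
  model.displayResultToCanvas(combinedResult, 'outputCanvas');

  // Or draw only the overlay (e.g., on top of content already rendered to
  // the canvas, or when the input was a video stream) via justResults = true
  model.displayResultToCanvas(combinedResult, 'outputCanvas', true);
}
```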
