Batch Processing and Callbacks

This guide covers model.predict_batch(), asynchronous callbacks, and how to manage the inference queue.

Batch Processing with predict_batch()

The model.predict_batch() method is an async generator that processes a sequence of images. This is ideal for scenarios like processing frames from a video or handling a large dataset of images efficiently.

You can provide data to predict_batch in two main ways:

  • Async Iterable: Any object that implements the async iteration protocol, such as an array of image sources or a custom generator function.

  • ReadableStream: A standard web API for handling streams of data, perfect for sources like the WebCodecs API.

The method processes images from the source, sends them for inference, and yields the results as they become available. You consume these results using a for await...of loop.

Example 1: Camera Inference Using an Async Generator Function

Here's how you can define a simple async generator to feed webcam frames to predict_batch.

// Create video element and give it access to the webcam
const video = document.createElement('video');
video.autoplay = true;
video.style.display = 'none';
document.body.appendChild(video);

const stream = await navigator.mediaDevices.getUserMedia({ video: true });
video.srcObject = stream;

// Wait for video to be ready
await new Promise(resolve => video.onloadedmetadata = resolve);

// Frame generator yielding camera frames + frameId
async function* frameGenerator() {
    let frameId = 0;
    while (true) {
        if (!video.videoWidth || !video.videoHeight) {
            // Yield to the event loop so this check doesn't busy-spin
            await new Promise(requestAnimationFrame);
            continue;
        }
        const bitmap = await createImageBitmap(video);
        yield [bitmap, `frame_${frameId++}`];
    }
}

// Run inference on the webcam frames
for await (const result of model.predict_batch(frameGenerator())) {
    model.displayResultToCanvas(result, 'outputCanvas');
}

Example 2: Using an Array as an Async Iterable

Here's how you can process a predefined list of image URLs. We create a simple async generator that yields the image URL and a unique frame identifier.
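A minimal sketch of that pattern is shown below. The URLs are placeholders; the generator yields the same [imageSource, frameId] pairs used in Example 1.

// Predefined list of image URLs (placeholders)
const imageUrls = [
    'https://example.com/cat.jpg',
    'https://example.com/dog.jpg',
    'https://example.com/bird.jpg'
];

// Async generator yielding [imageSource, frameId] pairs
async function* imageGenerator() {
    let frameId = 0;
    for (const url of imageUrls) {
        yield [url, `image_${frameId++}`];
    }
}

// Run inference on every image in the list
for await (const result of model.predict_batch(imageGenerator())) {
    console.log(result);
}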

Example 3: Using a ReadableStream

predict_batch can directly consume a ReadableStream. This is powerful for streaming video frames, for example from a file or a live camera feed using the WebCodecs API.
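As a rough sketch, a ReadableStream whose chunks are the same [imageSource, frameId] pairs used above can be passed straight to predict_batch. The chunk format here is an assumption; a real WebCodecs pipeline would enqueue decoded VideoFrame objects rather than URLs.

// Minimal ReadableStream that enqueues [imageSource, frameId] pairs
const frameStream = new ReadableStream({
    start(controller) {
        let frameId = 0;
        for (const url of imageUrls) {
            controller.enqueue([url, `frame_${frameId++}`]);
        }
        controller.close();
    }
});

// predict_batch consumes the stream like any other source
for await (const result of model.predict_batch(frameStream)) {
    console.log(result);
}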

Please view the WebCodecs Example for a complete demonstration of using WebCodecs and ReadableStream in DeGirumJS.

Asynchronous Flow with Callbacks

Instead of using a for await...of loop to pull results, you can adopt an event-driven approach by providing a callback function when you load the model. When a callback is provided, predict_batch will not yield results. Instead, your callback function will be invoked automatically for each result as it arrives from the server.

This decouples the sending of frames from the receiving of results, which is ideal for real-time applications where you don't want your main loop to be blocked waiting for inference to complete.

When to use which pattern:

  • for await...of (Default): Best for situations where you want to handle results sequentially in a straightforward, linear manner.

  • callback (Event-Driven): Better suited to continuous, real-time streams. It keeps your main loop from blocking on results, so your application stays responsive even when inference lags behind frame capture.

Example: Using a Callback
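The snippet below is a sketch of this pattern, reusing the frameGenerator from Example 1. The exact option name for registering the callback at model load time may differ in your version of DeGirumJS; here it is assumed to be callback, and zoo / modelName stand in for your own model zoo connection and model name.

// Assumption: the callback is registered as a loadModel option named 'callback'
const model = await zoo.loadModel(modelName, {
    callback: (result) => {
        // Invoked for each result as it arrives from the server
        model.displayResultToCanvas(result, 'outputCanvas');
    }
});

// With a callback registered, iterating predict_batch only drives frame
// submission; results go to the callback instead of being yielded here.
for await (const _ of model.predict_batch(frameGenerator())) {
    // Intentionally empty
}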

Controlling Back-pressure with max_q_len

When you send frames for inference, they are placed in a queue. max_q_len (maximum queue length) is an option you can set during model loading that defines the maximum number of frames that can be "in flight" at once.

  • max_q_len (default: 10 for AI Server, 80 for Cloud Server): The size of the internal queues (infoQ and resultQ) that buffer frames and their results.

This parameter is crucial for managing system resources and preventing your application from sending data faster than the inference server can handle it. If the queue is full, your predict() or predict_batch() call will pause (asynchronously) until a space becomes available. This is a form of back-pressure that keeps the pipeline stable.

A smaller max_q_len can reduce memory usage but may lower throughput if the network or server has high latency. A larger value can improve throughput by ensuring the server is never idle, but it will consume more memory and increase end-to-end latency for any single frame.
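For example, a latency-sensitive live preview might use a small queue, while a bulk offline job might use a larger one. A sketch, assuming max_q_len is passed as a model loading option (zoo and modelName are placeholders):

// Assumption: max_q_len is set as a loadModel option
const model = await zoo.loadModel(modelName, {
    max_q_len: 4   // allow at most 4 frames in flight at once
});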
