WebCodecs Example

Examples of using predict_batch() with the WebCodecs API.

If your browser supports the WebCodecs API, you can create efficient video processing pipelines with DeGirumJS.

The WebCodecs API provides low-level access to the individual frames of a video stream, enabling highly efficient and flexible video processing pipelines directly in the browser. Combined with DeGirumJS's predict_batch() method, this lets you perform real-time AI inference on a live webcam stream with minimal latency.

The core components of this pipeline are:

  1. MediaStreamTrackProcessor: Takes a MediaStreamTrack (like from a webcam) and exposes its frames as a ReadableStream of VideoFrame objects.

  2. predict_batch(): The DeGirumJS method that can directly consume a ReadableStream of VideoFrame objects and efficiently process them for inference.

  3. MediaStreamTrackGenerator: Takes a stream of processed VideoFrame objects and exposes them as a new MediaStreamTrack, which can be displayed in a <video> element.
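Before the full examples, here is a bare skeleton (model loading and per-frame handling omitted) showing how these three pieces connect:

const track = (await navigator.mediaDevices.getUserMedia({ video: true })).getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({ track });          // track -> ReadableStream of VideoFrames
const generator = new MediaStreamTrackGenerator({ kind: 'video' });  // writable sink -> new MediaStreamTrack
// processor.readable can be handed to model.predict_batch(), and processed frames
// can be written to generator.writable; generator itself is a MediaStreamTrack
// that a <video> element can display via srcObject.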

Here are some examples demonstrating how to build pipelines using these components:

Example 1: ReadableStream as Input

This example demonstrates the most direct way to perform inference on a video stream. We will take the ReadableStream provided by the MediaStreamTrackProcessor and feed it directly into model.predict_batch().

How it works:

  • Get a videoTrack from the webcam using navigator.mediaDevices.getUserMedia.

  • Create a MediaStreamTrackProcessor to get a ReadableStream of VideoFrame objects.

  • Pass this readableStream directly as the data source to model.predict_batch().

  • Display the results in a <canvas>.

<p>Inference results from a direct video stream:</p>
<canvas id="outputCanvas"></canvas>

<script src="https://assets.degirum.com/degirumjs/0.1.5/degirum-js.min.obf.js"></script>
<script type="module">
    // --- Model Setup ---
    const dg = new dg_sdk();
    const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
    localStorage.setItem('secretToken', secretToken);
    const MODEL_NAME = 'yolov8n_relu6_coco--640x640_quant_n2x_orca1_1';
    const ZOO_IP = 'https://cs.degirum.com/degirum/public';
    const zoo = await dg.connect('cloud', ZOO_IP, secretToken);
    const model = await zoo.loadModel(MODEL_NAME);

    // 1. Get video stream from webcam
    const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
    const videoTrack = mediaStream.getVideoTracks()[0];

    // 2. Create a processor to get a readable stream of frames
    const processor = new MediaStreamTrackProcessor({ track: videoTrack });
    const readableStream = processor.readable;

    // 3. Feed the stream to predict_batch and loop through results
    for await (const result of model.predict_batch(readableStream)) {
        // Display the result on the canvas
        await model.displayResultToCanvas(result, 'outputCanvas');

        // IMPORTANT: Close the frame to release memory.
        // The SDK does not close frames when you provide a raw stream.
        result.imageFrame.close();
    }
</script>

Example 2: Real-Time Inference with Display in a <video> Element

While the first example is simple, you may want to output the processed video (with results drawn) to a <video> element, for example for further processing or for use by other libraries in your code. This example uses WebCodecs to re-encode the processed frames back into a video track.

We use a TransformStream to orchestrate the work and a MediaStreamTrackGenerator to create the final output video track. This pattern is more robust and flexible for building complex applications. A code sketch follows the steps below.

How it works:

  • A MediaStreamTrackProcessor creates a ReadableStream from the webcam.

  • This stream is piped through a TransformStream. Inside the transform function, for each frame:

    1. We run inference on the frame using model.predict().

    2. We draw the original frame onto an OffscreenCanvas.

    3. We use model.displayResultToCanvas() to overlay the inference results on that same canvas.

    4. We enqueue a new VideoFrame created from the canvas to the stream's controller.

    5. We close the original frame to free up memory.

  • The output of the TransformStream is piped to the writable side of a MediaStreamTrackGenerator.

  • The MediaStreamTrackGenerator's track is then attached to a <video> element's srcObject.
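The sketch below shows one way to wire this pipeline, reusing the model setup from Example 1. It assumes that model.displayResultToCanvas() also accepts a canvas object (as the description above implies) rather than only an element id; the <video> element id and canvas size are illustrative.

<p>Processed video with results overlaid:</p>
<video id="outputVideo" autoplay muted playsinline></video>

<script src="https://assets.degirum.com/degirumjs/0.1.5/degirum-js.min.obf.js"></script>
<script type="module">
    // --- Model Setup (same as Example 1) ---
    const dg = new dg_sdk();
    const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
    localStorage.setItem('secretToken', secretToken);
    const zoo = await dg.connect('cloud', 'https://cs.degirum.com/degirum/public', secretToken);
    const model = await zoo.loadModel('yolov8n_relu6_coco--640x640_quant_n2x_orca1_1');

    // 1. Webcam -> ReadableStream of VideoFrame objects
    const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
    const videoTrack = mediaStream.getVideoTracks()[0];
    const processor = new MediaStreamTrackProcessor({ track: videoTrack });

    // 2. Generator whose track will carry the processed frames
    const generator = new MediaStreamTrackGenerator({ kind: 'video' });

    // 3. Offscreen canvas used to compose each frame plus its overlay
    const canvas = new OffscreenCanvas(640, 480);
    const ctx = canvas.getContext('2d');

    // 4. Transform: run inference, draw frame + results, emit a new VideoFrame
    const transformer = new TransformStream({
        async transform(frame, controller) {
            const result = await model.predict(frame);

            canvas.width = frame.displayWidth;
            canvas.height = frame.displayHeight;
            ctx.drawImage(frame, 0, 0);

            // Overlay results on the same canvas
            // (assumes displayResultToCanvas() accepts a canvas object as well as an element id)
            await model.displayResultToCanvas(result, canvas);

            controller.enqueue(new VideoFrame(canvas, { timestamp: frame.timestamp }));
            frame.close(); // release the original frame's memory
        }
    });

    // 5. Wire the pipeline and display the generated track in the <video> element
    processor.readable.pipeThrough(transformer).pipeTo(generator.writable);
    document.getElementById('outputVideo').srcObject = new MediaStream([generator]);
</script>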

Example 3: Parallel Inference on Four Video Streams

The WebCodecs API and DeGirumJS can handle multiple independent video pipelines at once. This example demonstrates four processed video streams displayed in a 2x2 grid.

This architecture is highly scalable. While we use a cloned track here, you could just as easily use four different video sources (e.g., multiple cameras or video files). A sketch of the full setup follows the steps below.

How it works:

  • Grab a single webcam track (mainVideoTrack).

  • Clone the track four times so each pipeline gets its own independent MediaStreamTrack.

  • For each pipeline:

    1. Load a separate model instance.

    2. Create a MediaStreamTrackProcessor for the cloned track to get a ReadableStream of VideoFrames.

    3. Pass the stream directly to model.predict_batch().

    4. For each inference result, render detections onto the assigned <canvas> element using model.displayResultToCanvas().

    5. Close the frame after processing to release memory.
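Here is a sketch of this setup, again reusing the zoo connection from Example 1. The canvas ids and the inline grid styling are illustrative placeholders for your own layout.

<div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 4px;">
    <canvas id="canvas0"></canvas><canvas id="canvas1"></canvas>
    <canvas id="canvas2"></canvas><canvas id="canvas3"></canvas>
</div>

<script src="https://assets.degirum.com/degirumjs/0.1.5/degirum-js.min.obf.js"></script>
<script type="module">
    // --- Zoo Setup (same as Example 1) ---
    const dg = new dg_sdk();
    const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
    localStorage.setItem('secretToken', secretToken);
    const zoo = await dg.connect('cloud', 'https://cs.degirum.com/degirum/public', secretToken);
    const MODEL_NAME = 'yolov8n_relu6_coco--640x640_quant_n2x_orca1_1';

    // 1. Grab a single webcam track
    const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
    const mainVideoTrack = mediaStream.getVideoTracks()[0];

    // 2. One independent pipeline per canvas, each with its own cloned track
    async function runPipeline(canvasId) {
        const model = await zoo.loadModel(MODEL_NAME);            // separate model instance
        const track = mainVideoTrack.clone();                     // independent copy of the track
        const processor = new MediaStreamTrackProcessor({ track });

        for await (const result of model.predict_batch(processor.readable)) {
            await model.displayResultToCanvas(result, canvasId);  // render detections
            result.imageFrame.close();                            // release frame memory
        }
    }

    // 3. Start all four pipelines in parallel
    await Promise.all(['canvas0', 'canvas1', 'canvas2', 'canvas3'].map(runPipeline));
</script>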
