WebCodecs Example
Examples for using predict_batch with the WebCodecs API.
If your browser supports the WebCodecs API, you can create efficient video processing pipelines with DeGirumJS.
The WebCodecs API provides low-level access to the individual frames of a video stream. This allows for highly efficient and flexible video processing pipelines directly in the browser. When combined with DeGirumJS's predict_batch() method, you can perform real-time AI inference on a live webcam stream with minimal latency.
The core components of this pipeline are:
- MediaStreamTrackProcessor: Takes a MediaStreamTrack (for example, from a webcam) and exposes its frames as a ReadableStream of VideoFrame objects.
- predict_batch(): The DeGirumJS method that can directly consume a ReadableStream of VideoFrame objects and efficiently process them for inference.
- MediaStreamTrackGenerator: Takes a stream of processed VideoFrame objects and exposes them as a new MediaStreamTrack, which can be displayed in a <video> element.
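As a point of reference, here is a minimal pass-through sketch (no inference yet) showing how these pieces connect. It assumes a browser that supports MediaStreamTrackProcessor and MediaStreamTrackGenerator, a page with a <video> element, and that the code runs inside a <script type="module"> block so top-level await is available.
// Webcam -> MediaStreamTrackProcessor -> (processing) -> MediaStreamTrackGenerator -> <video>
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const track = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({ track });          // frames out as a ReadableStream
const generator = new MediaStreamTrackGenerator({ kind: 'video' });  // frames in via a WritableStream
// Pass frames through unchanged; a real pipeline would transform them or run inference here.
const passthrough = new TransformStream({
  transform(frame, controller) {
    controller.enqueue(frame); // the generator closes frames it consumes
  },
});
processor.readable.pipeThrough(passthrough).pipeTo(generator.writable);
// The generator behaves like a regular video track and can be displayed directly.
document.querySelector('video').srcObject = new MediaStream([generator]);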
Here are some examples demonstrating how to build pipelines using these components:
Example 1: ReadableStream as Input
This example demonstrates the most direct way to perform inference on a video stream. We will take the ReadableStream provided by the MediaStreamTrackProcessor and feed it directly into model.predict_batch().
How it works:
1. Get a videoTrack from the webcam using navigator.mediaDevices.getUserMedia.
2. Create a MediaStreamTrackProcessor to get a ReadableStream of VideoFrame objects.
3. Pass this readableStream directly as the data source to model.predict_batch().
4. Display the results in a <canvas>.
<p>Inference results from a direct video stream:</p>
<canvas id="outputCanvas"></canvas>
<script src="https://assets.degirum.com/degirumjs/0.1.5/degirum-js.min.obf.js"></script>
<script type="module">
// --- Model Setup ---
const dg = new dg_sdk();
const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
localStorage.setItem('secretToken', secretToken);
const MODEL_NAME = 'yolov8n_relu6_coco--640x640_quant_n2x_orca1_1';
const ZOO_IP = 'https://cs.degirum.com/degirum/public';
const zoo = await dg.connect('cloud', ZOO_IP, secretToken);
const model = await zoo.loadModel(MODEL_NAME);
// 1. Get video stream from webcam
const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = mediaStream.getVideoTracks()[0];
// 2. Create a processor to get a readable stream of frames
const processor = new MediaStreamTrackProcessor({ track: videoTrack });
const readableStream = processor.readable;
// 3. Feed the stream to predict_batch and loop through results
for await (const result of model.predict_batch(readableStream)) {
// Display the result on the canvas
await model.displayResultToCanvas(result, 'outputCanvas');
// IMPORTANT: Close the frame to release memory.
// The SDK does not close frames when you provide a raw stream.
result.imageFrame.close();
}
</script>
Example 2: Real-Time Inference with Display in a <video> Element
While the first example is simple, you might want to output the processed video (with results drawn) into a <video> element, for example for further processing or for use by other libraries in your code. This example uses the WebCodecs API to re-encode the processed frames back into a video track.
We use a TransformStream to orchestrate the work and a MediaStreamTrackGenerator to create the final output video track. This pattern is more robust and flexible for building complex applications.
How it works:
1. A MediaStreamTrackProcessor creates a ReadableStream from the webcam.
2. This stream is piped through a TransformStream. Inside the transform function, for each frame:
   - We run inference on the frame using model.predict().
   - We draw the original frame onto an OffscreenCanvas.
   - We use model.displayResultToCanvas() to overlay the inference results on that same canvas.
   - We enqueue a new VideoFrame created from the canvas to the stream's controller.
   - We close the original frame to free up memory.
3. The output of the TransformStream is piped to the writable side of a MediaStreamTrackGenerator.
4. The MediaStreamTrackGenerator's track is then attached to a <video> element's srcObject (see the sketch below).
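The following sketch puts these steps together. It is illustrative rather than a drop-in implementation: it reuses the model setup from Example 1, assumes a <video id="outputVideo"> element in the page, and assumes that model.predict() accepts a single VideoFrame and that displayResultToCanvas() accepts a canvas object in addition to an element id; check the SDK reference if your version differs.
// Webcam -> inference -> annotated frames -> <video>
const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = mediaStream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({ track: videoTrack });
const generator = new MediaStreamTrackGenerator({ kind: 'video' });
// Draw each frame plus its results onto an OffscreenCanvas, then emit a new VideoFrame.
const { width, height } = videoTrack.getSettings();
const canvas = new OffscreenCanvas(width, height);
const ctx = canvas.getContext('2d');
const inferenceTransform = new TransformStream({
  async transform(frame, controller) {
    const result = await model.predict(frame);                 // run inference on the raw frame
    ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);   // draw the original frame
    await model.displayResultToCanvas(result, canvas);         // overlay detections (assumed canvas-object overload)
    controller.enqueue(new VideoFrame(canvas, { timestamp: frame.timestamp }));
    frame.close();                                             // free the original frame's memory
  },
});
processor.readable.pipeThrough(inferenceTransform).pipeTo(generator.writable);
// Show the generated track in a <video> element.
const video = document.getElementById('outputVideo');
video.srcObject = new MediaStream([generator]);
await video.play();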
Example 3: Parallel Inference on Four Video Streams
The WebCodecs API and DeGirumJS can handle multiple independent video pipelines at once. This example demonstrates four processed video streams displayed in a 2x2 grid.
This architecture is highly scalable. While we use a cloned track here, you could just as easily use four different video sources (e.g., multiple cameras or video files).
How it works:
1. Grab a single webcam track (mainVideoTrack).
2. Clone the track four times so each pipeline gets its own independent MediaStreamTrack.
3. For each pipeline (see the sketch below):
   - Load a separate model instance.
   - Create a MediaStreamTrackProcessor for the cloned track to get a ReadableStream of VideoFrames.
   - Pass the stream directly to model.predict_batch().
   - For each inference result, render detections onto the assigned <canvas> element using model.displayResultToCanvas().
   - Close the frame after processing to release memory.
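The sketch below outlines one way to wire this up. It reuses the zoo connection and MODEL_NAME from Example 1 and assumes four canvases with ids outputCanvas0 through outputCanvas3 exist in the page (hypothetical ids chosen for illustration).
// Four independent inference pipelines fed by clones of one webcam track.
const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
const mainVideoTrack = mediaStream.getVideoTracks()[0];
async function runPipeline(index) {
  const track = mainVideoTrack.clone();             // independent track for this pipeline
  const model = await zoo.loadModel(MODEL_NAME);    // separate model instance
  const processor = new MediaStreamTrackProcessor({ track });
  // predict_batch consumes the ReadableStream of VideoFrames directly.
  for await (const result of model.predict_batch(processor.readable)) {
    await model.displayResultToCanvas(result, `outputCanvas${index}`);
    result.imageFrame.close();                      // frames are not closed automatically for raw streams
  }
}
// Start all four pipelines; each runs until its track ends.
for (let i = 0; i < 4; i++) {
  runPipeline(i);
}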