# Streaming results

*Estimated read time: 3 minutes*

Streaming keeps inference responsive for dashboards, alerts, and downstream analytics. This page shows how to iterate over results from the streaming API, publish lightweight payloads, and monitor latency. Each section includes its own setup so you can copy and run examples independently.

## Display a live overlay loop

Use `predict_stream` for camera feeds, RTSP streams, or looping video files. Each iteration yields an `InferenceResults` object.

### Example

{% code overflow="wrap" %}

```python
from degirum_tools import ModelSpec, Display, predict_stream, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()

video_source = remote_assets.traffic  # swap in a webcam index or RTSP URL
max_frames = 120  # stop after this many frames for demos

with Display("AI Camera — Live stream") as output_display:
    for index, result in enumerate(predict_stream(model, video_source), start=1):
        output_display.show(result.image_overlay)
        print(f"Rendered frame {index}")
        if index >= max_frames:
            break
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```
Rendered frame 1
Rendered frame 2
Rendered frame 3
```

{% endcode %}
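
`remote_assets.traffic` is just one option; as the comment in the example notes, `video_source` can also point at a webcam or an RTSP camera. A hypothetical sketch (the URL and file path are placeholders):

{% code overflow="wrap" %}

```python
# Pick one source; the last uncommented assignment wins.
video_source = 0  # built-in webcam index
# video_source = "rtsp://user:pass@192.168.1.10/stream1"  # placeholder RTSP URL
# video_source = "path/to/clip.mp4"  # local video file
```

{% endcode %}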

{% hint style="info" %}
Remove the `max_frames` guard for continuous playback; see the sketch after this hint.
{% endhint %}
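
If you drop the guard, one way to stop cleanly is catching `KeyboardInterrupt` so `Ctrl+C` ends the stream. A minimal sketch, reusing the imports, `model`, and `video_source` from the example above:

{% code overflow="wrap" %}

```python
# Continuous playback: run until the source ends or the user presses Ctrl+C.
with Display("AI Camera — Live stream") as output_display:
    try:
        for result in predict_stream(model, video_source):
            output_display.show(result.image_overlay)
    except KeyboardInterrupt:
        print("Stream stopped by user")
```

{% endcode %}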

## Publish lightweight payloads

Structured results serialize well to JSON once you convert NumPy values to native Python types. Package only the fields you need before sending them across the network.

### Example

{% code overflow="wrap" %}

```python
import json
import time
from degirum_tools import ModelSpec, predict_stream, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()

video_source = remote_assets.traffic

for result in predict_stream(model, video_source):
    payload = {
        "timestamp": time.time(),
        "detections": [
            {
                "label": det.get("label"),
                "score": float(det.get("score", 0)),
                "bbox": [float(x) for x in det.get("bbox", [])],
            }
            for det in result.results
        ],
    }
    json_payload = json.dumps(payload)
    # send json_payload to your WebSocket client, MQTT broker, etc. (see the sketch below)
    print(json_payload)
    break  # remove break to stream continuously
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```json
{"timestamp": 1700000000.123, "detections": [{"label": "car", "score": 0.94, "bbox": [0.11, 0.33, 0.28, 0.77]}, {"label": "truck", "score": 0.61, "bbox": [0.52, 0.29, 0.83, 0.88]}]}
```

{% endcode %}
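
The commented publish line is where a network client goes. As one hedged option, here is a sketch using the third-party `paho-mqtt` package; the broker address and topic are placeholders, and `default=str` is a blunt fallback for any NumPy values the field filtering missed:

{% code overflow="wrap" %}

```python
import json
import time

import paho.mqtt.client as mqtt  # third-party: pip install paho-mqtt
from degirum_tools import ModelSpec, predict_stream, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()

BROKER_HOST = "localhost"  # placeholder: your MQTT broker address
TOPIC = "demo/detections"  # placeholder topic name

client = mqtt.Client()
client.connect(BROKER_HOST, 1883)
client.loop_start()  # handle network I/O on a background thread

for result in predict_stream(model, remote_assets.traffic):
    payload = {"timestamp": time.time(), "detections": result.results}
    # default=str covers any NumPy scalars/arrays json cannot encode natively
    client.publish(TOPIC, json.dumps(payload, default=str))
```

{% endcode %}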

## Monitor latency and throughput

Use `result.timing` to inspect per-frame preprocessing, inference, and postprocessing times. If the model does not report timing by default, enable it through the `postprocess.timing` entry in `model_properties`, as shown below.

### Example

{% code overflow="wrap" %}

```python
from degirum_tools import ModelSpec, predict_stream, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={
        "device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"],
        "postprocess": {"timing": {"enable": True}},
    },
)
model = model_spec.load_model()

for index, result in enumerate(predict_stream(model, remote_assets.traffic), start=1):
    if result.timing:
        print(f"Frame {index} timing: {result.timing}")
    else:
        print("Timing disabled; set model_properties['postprocess']['timing']['enable'] = True")
    if index >= 3:
        break
```

{% endcode %}

Example output:

{% code overflow="wrap" %}

```
Frame 1 timing: {'preprocess': 2.91, 'inference': 34.77, 'postprocess': 4.12}
Frame 2 timing: {'preprocess': 2.85, 'inference': 34.63, 'postprocess': 4.05}
Frame 3 timing: {'preprocess': 2.88, 'inference': 34.70, 'postprocess': 4.08}
```

{% endcode %}

{% hint style="info" %}
Consider sampling every few frames to avoid flooding logs. For end-to-end metrics (capture to publish), timestamp frames before and after inference and compute deltas, as in the sketch below.
{% endhint %}
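
A minimal sketch of that end-to-end measurement, reusing the `model` from the example above; the wall-clock gap between loop iterations stands in for capture-to-publish latency, and logging every 30th frame is an arbitrary sampling choice:

{% code overflow="wrap" %}

```python
import time

last = time.perf_counter()
for index, result in enumerate(predict_stream(model, remote_assets.traffic), start=1):
    now = time.perf_counter()
    frame_ms = (now - last) * 1000  # capture + inference + handling for this frame
    last = now
    if index % 30 == 0:  # sample to keep logs quiet
        print(f"Frame {index}: {frame_ms:.1f} ms/frame ({1000.0 / frame_ms:.1f} FPS)")
    if index >= 120:
        break  # demo guard; remove for continuous monitoring
```

{% endcode %}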
