Measuring performance
Measure latency and throughput for DeGirum models, capture per-stage timings, and apply repeatable test loops backed by consistent validation.
Performance work starts with trusted measurements. This guide shows how to enable timing collection, gather per-frame metrics, and compute throughput for repeatable regression tracking. Each code sample includes inline setup so you can copy and run sections independently.
Baseline checklist
Use this repeatable sequence whenever you capture new numbers:
1. Reset state. Call model.reset_time_stats() before the first measured run.
2. Collect baselines. Record system info (degirum sys-info), the model version, and any custom parameters.
3. Execute a fixed workload. Run a deterministic loop (same inputs, same batch size) using predict or predict_batch.
4. Capture outputs. Log result.timing, model.time_stats(), and manual throughput numbers.
5. Publish context. Store the script, command, environment details, and raw logs so others can reproduce the run (see the sketch after this list).
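For steps 2 and 5, it can help to dump environment details next to your raw logs before each run. The following is a minimal sketch only: the file name, field names, and notes string are illustrative, and degirum.__version__ is assumed to be available in your PySDK install (saving the output of degirum sys-info works just as well).

import json
import platform
import sys

import degirum as dg  # PySDK; installed alongside degirum_tools

# Record the environment alongside your measurements so runs stay reproducible.
# "run_context.json" is a placeholder path used only for this sketch.
context = {
    "python": sys.version,
    "platform": platform.platform(),
    "pysdk_version": getattr(dg, "__version__", "unknown"),
    "model_name": "yolov8n_coco--640x640_quant_hailort_multidevice_1",
    "notes": "fixed 20-frame loop, default model parameters",
}
with open("run_context.json", "w") as f:
    json.dump(context, f, indent=2)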
Inspect per-frame timing
InferenceResults.timing exposes millisecond breakdowns for preprocessing, inference, and postprocessing when model.measure_time is True.
Example
from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True

# Optional warm-up inference
_ = model(remote_assets.three_persons)

result = model(remote_assets.three_persons)
for stage, value in result.timing.items():
    print(f"{stage}: {value:.2f} ms")

Example output:
preprocess: 2.87 ms
inference: 34.68 ms
postprocess: 4.09 ms

Aggregate with time stats
model.time_stats() accumulates min, average, max, and count since the last reset. Use it after running a workload loop.
Example
from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True
model.reset_time_stats()

for _ in range(20):
    _ = model(remote_assets.three_persons)

for stage, stats in model.time_stats().items():
    print(
        f"{stage:>30}: min={stats.min:.2f} ms avg={stats.avg:.2f} ms "
        f"max={stats.max:.2f} ms count={stats.cnt}"
    )

Example output:
preprocess: min=2.81 ms avg=2.89 ms max=2.95 ms count=20
inference: min=34.51 ms avg=34.73 ms max=35.12 ms count=20
postprocess: min=3.98 ms avg=4.07 ms max=4.22 ms count=20

Keep inputs consistent. Mixing URLs, file paths, or arrays with different sizes can skew timings.
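One way to follow that advice is to decode a single local image once and feed the same in-memory array to every measured call. The sketch below is illustrative: test_image.jpg is a placeholder, and it assumes Pillow and NumPy are installed next to PySDK, which also accepts file paths and URLs directly.

import numpy as np
from PIL import Image

from degirum_tools import ModelSpec

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True
model.reset_time_stats()

# Decode the frame once so every measured call sees byte-identical input.
# "test_image.jpg" is a placeholder for your own fixed test image.
frame = np.array(Image.open("test_image.jpg"))

for _ in range(20):
    _ = model(frame)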
Compute throughput
For end-to-end throughput, measure a block of inferences with time.perf_counter.
Example
import time

from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()

samples = 50
model.reset_time_stats()

start = time.perf_counter()
for _ in range(samples):
    _ = model(remote_assets.three_persons)
wall_clock = time.perf_counter() - start

print(f"Throughput: {samples / wall_clock:.2f} FPS")
print(f"Average latency: {(wall_clock / samples) * 1000:.2f} ms")

Example output:
Throughput: 26.41 FPS
Average latency: 37.88 ms

Share both wall-clock and per-stage stats in your report so teammates can spot preprocessing or transfer bottlenecks.
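One way to package both is to fold the wall-clock numbers and model.time_stats() into a single report file. The sketch below is an example under stated assumptions: the report layout and the perf_report.json path are illustrative, not a fixed DeGirum format.

import json
import time

from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True
model.reset_time_stats()

samples = 50
start = time.perf_counter()
for _ in range(samples):
    _ = model(remote_assets.three_persons)
wall_clock = time.perf_counter() - start

# Combine wall-clock throughput with per-stage statistics in one record.
report = {
    "samples": samples,
    "throughput_fps": samples / wall_clock,
    "avg_latency_ms": wall_clock / samples * 1000,
    "stages": {
        stage: {"min_ms": s.min, "avg_ms": s.avg, "max_ms": s.max, "count": s.cnt}
        for stage, s in model.time_stats().items()
    },
}

# "perf_report.json" is a placeholder path; keep it with your raw logs.
with open("perf_report.json", "w") as f:
    json.dump(report, f, indent=2)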
Track regressions over time
Store raw timing logs in version control or in an observability system.
Re-run the exact script when you change firmware, runtimes, or postprocessing parameters.
Compare against previous baselines before deploying to production. If a regression appears, bisect by toggling one change at a time.
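As an example, a small comparison script can flag stages whose average latency grew beyond a tolerance. This sketch assumes the report layout from the previous example; the file names and the 10% threshold are illustrative choices.

import json

# Compare a fresh run against a stored baseline and flag large slowdowns.
TOLERANCE = 1.10  # flag stages that are more than 10% slower than baseline

with open("baseline_report.json") as f:
    baseline = json.load(f)
with open("perf_report.json") as f:
    current = json.load(f)

for stage, stats in current["stages"].items():
    old = baseline["stages"].get(stage)
    if old and stats["avg_ms"] > old["avg_ms"] * TOLERANCE:
        print(
            f"REGRESSION in {stage}: "
            f"{old['avg_ms']:.2f} ms -> {stats['avg_ms']:.2f} ms"
        )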