Measuring performance
Measure latency and throughput for DeGirum models, capture per-stage timings, and apply repeatable test loops backed by consistent validation.
Performance work starts with trusted measurements. This guide shows how to enable timing collection, gather per-frame metrics, and compute throughput for repeatable regression tracking. Each code sample includes inline setup so you can copy and run sections independently.
Baseline checklist
Use this repeatable sequence whenever you capture new numbers:
1. Reset state. Call model.reset_time_stats() before the first measured run.
2. Collect baselines. Record system info (degirum sys-info), the model version, and any custom parameters.
3. Execute a fixed workload. Run a deterministic loop (same inputs, same batch size) using predict or predict_batch.
4. Capture outputs. Log result.timing, model.time_stats(), and manual throughput numbers.
5. Publish context. Store the script, command, environment details, and raw logs so others can reproduce the run (see the sketch after this list).
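For steps 2 and 5, it can help to dump environment details next to your raw logs before each run. The following is a minimal sketch only: the file name, field names, and notes string are illustrative, and degirum.__version__ is assumed to be available in your PySDK install (saving the output of degirum sys-info works just as well).

import json
import platform
import sys

import degirum as dg  # PySDK; installed alongside degirum_tools

# Record the environment alongside your measurements so runs stay reproducible.
# "run_context.json" is a placeholder path used only for this sketch.
context = {
    "python": sys.version,
    "platform": platform.platform(),
    "pysdk_version": getattr(dg, "__version__", "unknown"),
    "model_name": "yolov8n_coco--640x640_quant_hailort_multidevice_1",
    "notes": "fixed 20-frame loop, default model parameters",
}
with open("run_context.json", "w") as f:
    json.dump(context, f, indent=2)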
Inspect per-frame timing
InferenceResults.timing exposes millisecond breakdowns for preprocessing, inference, and postprocessing when model.measure_time is True.
Example
from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True

# Optional warm-up inference
_ = model(remote_assets.three_persons)

result = model(remote_assets.three_persons)
for stage, value in result.timing.items():
    print(f"{stage}: {value:.2f} ms")

Example output:
preprocess: 2.87 ms
inference: 34.68 ms
postprocess: 4.09 ms

Aggregate with time stats
model.time_stats() accumulates min, average, max, and count since the last reset. Use it after running a workload loop.
Example
from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True
model.reset_time_stats()

for _ in range(20):
    _ = model(remote_assets.three_persons)

for stage, stats in model.time_stats().items():
    print(
        f"{stage:>30}: min={stats.min:.2f} ms avg={stats.avg:.2f} ms "
        f"max={stats.max:.2f} ms count={stats.cnt}"
    )

Example output:
preprocess: min=2.81 ms avg=2.89 ms max=2.95 ms count=20
inference: min=34.51 ms avg=34.73 ms max=35.12 ms count=20
postprocess: min=3.98 ms avg=4.07 ms max=4.22 ms count=20

Keep inputs consistent. Mixing URLs, file paths, or arrays with different sizes can skew timings.
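One way to follow that advice is to decode a single local image once and feed the same in-memory array to every measured call. The sketch below is illustrative: test_image.jpg is a placeholder, and it assumes Pillow and NumPy are installed next to PySDK, which also accepts file paths and URLs directly.

import numpy as np
from PIL import Image

from degirum_tools import ModelSpec

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True
model.reset_time_stats()

# Decode the frame once so every measured call sees byte-identical input.
# "test_image.jpg" is a placeholder for your own fixed test image.
frame = np.array(Image.open("test_image.jpg"))

for _ in range(20):
    _ = model(frame)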
Compute throughput
For end-to-end throughput, measure a block of inferences with time.perf_counter.
Example
import time

from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()

samples = 50
model.reset_time_stats()

start = time.perf_counter()
for _ in range(samples):
    _ = model(remote_assets.three_persons)
wall_clock = time.perf_counter() - start

print(f"Throughput: {samples / wall_clock:.2f} FPS")
print(f"Average latency: {(wall_clock / samples) * 1000:.2f} ms")

Example output:
Throughput: 26.41 FPS
Average latency: 37.88 ms

Share both wall-clock and per-stage stats in your report so teammates can spot preprocessing or transfer bottlenecks.
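One way to package both is to fold the wall-clock numbers and model.time_stats() into a single report file. The sketch below is an example under stated assumptions: the report layout and the perf_report.json path are illustrative, not a fixed DeGirum format.

import json
import time

from degirum_tools import ModelSpec, remote_assets

model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_hailort_multidevice_1",
    zoo_url="degirum/hailo",
    inference_host_address="@local",
    model_properties={"device_type": ["HAILORT/HAILO8L", "HAILORT/HAILO8"]},
)
model = model_spec.load_model()
model.measure_time = True
model.reset_time_stats()

samples = 50
start = time.perf_counter()
for _ in range(samples):
    _ = model(remote_assets.three_persons)
wall_clock = time.perf_counter() - start

# Combine wall-clock throughput with per-stage statistics in one record.
report = {
    "samples": samples,
    "throughput_fps": samples / wall_clock,
    "avg_latency_ms": wall_clock / samples * 1000,
    "stages": {
        stage: {"min_ms": s.min, "avg_ms": s.avg, "max_ms": s.max, "count": s.cnt}
        for stage, s in model.time_stats().items()
    },
}

# "perf_report.json" is a placeholder path; keep it with your raw logs.
with open("perf_report.json", "w") as f:
    json.dump(report, f, indent=2)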
Track regressions over time
Store raw timing logs in version control or in an observability system.
Re-run the exact script when you change firmware, runtimes, or postprocessing parameters.
Compare against previous baselines before deploying to production. If a regression appears, bisect by toggling one change at a time.
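As an example, a small comparison script can flag stages whose average latency grew beyond a tolerance. This sketch assumes the report layout from the previous example; the file names and the 10% threshold are illustrative choices.

import json

# Compare a fresh run against a stored baseline and flag large slowdowns.
TOLERANCE = 1.10  # flag stages that are more than 10% slower than baseline

with open("baseline_report.json") as f:
    baseline = json.load(f)
with open("perf_report.json") as f:
    current = json.load(f)

for stage, stats in current["stages"].items():
    old = baseline["stages"].get(stage)
    if old and stats["avg_ms"] > old["avg_ms"] * TOLERANCE:
        print(
            f"REGRESSION in {stage}: "
            f"{old['avg_ms']:.2f} ms -> {stats['avg_ms']:.2f} ms"
        )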