Performance and Timing Statistics

Interpret performance and latency metrics collected during inference.

Once you enable measureTime on a Model instance, every predict and predict_batch call contributes timing statistics, which the Model instance accumulates in a timeStats object.

Available methods

  1. getTimeStats(): Returns a formatted string of all the statistics collected so far.

  2. resetTimeStats(): Discards all accumulated statistics and creates a fresh timeStats object for collecting new ones.

  3. modelName.timeStats.stats["statName"]: Accesses the timeStats object directly, where statName is one of the tracked operations (see the tables below).

  4. printLatencyInfo(): Logs a brief, human-readable summary of average timings to the console (shown in the sketch after the example below).

Example usage

let model = await zoo.loadModel('your_model_name', { measureTime: true });
let result = await model.predict(image);
console.log(model.getTimeStats()); // Pretty print time stats

// Access client-side and server-side timing stats
let preprocessDuration = model.timeStats.stats["ImagePreprocessDuration_ms"]; // Get image preprocess duration (min, avg, max, count)
let preprocessMin = model.timeStats.stats["ImagePreprocessDuration_ms"].min; // Get min image preprocess duration

let inferenceDuration = model.timeStats.stats["CoreInferenceDuration_ms"]; // Get core inference duration (min, avg, max, count)
let inferenceMax = model.timeStats.stats["CoreInferenceDuration_ms"].max; // Get max core inference duration

let frameTotalDuration = model.timeStats.stats["FrameTotalDuration_ms"]; // Get total time taken for the entire frame processing

let deviceTemp = model.timeStats.stats["DeviceTemperature_C"]; // Get device temperature if available

model.resetTimeStats(); // Reset time stats
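
The example above covers the most common calls. As a complement, the sketch below shows printLatencyInfo() and iterates over every statistic collected so far (run it before resetting, or after further predictions). It is a hypothetical sketch: it assumes timeStats.stats is a plain object keyed by stat name and that each entry exposes the min, avg, max, and count fields referenced in the comments above.

model.printLatencyInfo(); // Log a brief summary of average timings to the console

// Hypothetical iteration sketch: walk every stat collected so far
for (const [name, stat] of Object.entries(model.timeStats.stats)) {
  console.log(`${name}: min=${stat.min} avg=${stat.avg} max=${stat.max} count=${stat.count}`);
}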

Client-Side Timings

These metrics are measured within the JavaScript SDK running in the user's browser.

FrameTotalDuration_ms: (End-to-End) The total wall-clock time from the moment predict or predict_batch is called until the final processed result is ready for the user. This is the most comprehensive client-side metric.

MutexWait_ms: The time spent waiting to acquire a lock before starting to process a new frame. Relevant only for synchronous predict() calls. A high value indicates contention: you are calling predict() faster than the model can process frames (see the helper sketch after this table).

InputFrameConvert_ms: The time taken to validate and convert the user's input (e.g., a URL, base64 string, or HTMLImageElement) into a standardized format ready for preprocessing.

ImagePreprocessDuration_ms: The time spent on client-side image manipulation, primarily resizing the image to the model's required input dimensions and applying padding or cropping.

EncodeEmit_ms: The time taken to encode the image data and send it over the network. For AIServerModel, this is just socket.send(blob); for CloudServerModel, it involves encoding the data with msgpack and then calling socket.emit().

ResultProcessing_ms: The time spent processing a result after it has been received from the server. This includes matching it with the original frame info, applying label filters, and pushing it into the result queue.

ResultQueueWaitingTime_ms: The time a processed result sits in the output queue (resultQ) before being returned to the user's code. This measures back-pressure when user code consumes results more slowly than the model produces them.

SocketConnectWait_ms: A one-time (or per-reconnect) cost of establishing the network connection to the server. This key does not appear for every frame.
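
As referenced in the MutexWait_ms entry above, a simple way to act on these metrics is to compare the average lock wait against the average frame time. The helper below is a hypothetical sketch, not part of the SDK; it assumes each stat entry exposes an avg field as shown in the example earlier, and the 50% threshold is an arbitrary choice.

// Hypothetical helper: flag contention when lock waits dominate frame time
function checkContention(model) {
  const wait = model.timeStats.stats["MutexWait_ms"];
  const total = model.timeStats.stats["FrameTotalDuration_ms"];
  if (wait && total && wait.avg > 0.5 * total.avg) { // threshold is an assumption
    console.warn("High MutexWait_ms: predict() is called faster than frames are processed; consider predict_batch() or pacing calls");
  }
}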

Server-Side Timings

These metrics are measured on the AI Server or Cloud Server and are included in the result payload sent back to the client. The SDK simply extracts and records them.

PythonPreprocessDuration_ms: Duration of the Python pre-processing step, including data loading time and data conversion time.

CorePreprocessDuration_ms: Duration of the server-side pre-processing step.

CoreInferenceDuration_ms: Duration of the server-side AI inference step.

CorePostprocessDuration_ms: Duration of the server-side post-processing step.

CoreInputFrameSize_bytes: Size of the received input frame, in bytes.

DeviceInferenceDuration_ms: (DeGirum ORCA models only) Duration of AI inference computations on the AI accelerator IC, excluding data transfers.

DeviceTemperature_C: (DeGirum ORCA models only) Internal temperature of the AI accelerator IC, in °C.

DeviceFrequency_MHz: (DeGirum ORCA models only) Operating frequency of the AI accelerator IC, in MHz.
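
Client-side and server-side keys live in the same stats object, so they can be combined, for example to estimate what share of the end-to-end frame time is spent in core inference. This is a hypothetical sketch: it assumes both keys have been recorded and that entries expose an avg field as in the example earlier.

// Hypothetical breakdown: share of average frame time spent in core inference
const s = model.timeStats.stats;
if (s["CoreInferenceDuration_ms"] && s["FrameTotalDuration_ms"]) {
  const share = (100 * s["CoreInferenceDuration_ms"].avg / s["FrameTotalDuration_ms"].avg).toFixed(1);
  console.log(`Core inference: ~${share}% of average frame time`);
}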
