Model Zoo Manager
There are three main concepts in PySDK: the AI inference engine, the AI model zoo, and the AI model. An AI inference engine performs inferences of AI models, while an AI model zoo is the place where these models are stored.
PySDK supports the following AI inference types:
- Local inference: when the client application uses PySDK to directly communicate with the AI hardware accelerator installed on the same computer where this application runs.
- AI Server inference: when the AI hardware accelerator is controlled by the DeGirum AI Server software stack, and the client application communicates with that AI Server to perform AI inferences. The client application and the AI Server can run on two different computers connected to the same local network.
- Cloud inference: when the client application communicates with the DeGirum Cloud Platform software over the Internet to perform AI inferences on the DeGirum Cloud Farm devices.
PySDK supports the following AI model zoo types:
- Local model zoo: when the set of AI models is located in some local directory on the computer with the AI hardware accelerator. In the case of local inference, the local model zoo is located on the computer where you run your application. In the case of AI Server inference, the local model zoo is located on the computer where the AI Server software is installed.
- Cloud model zoo: when the set of AI models is located on the DeGirum Cloud Platform. You create and maintain cloud model zoos using the DeGirum Cloud Platform web GUI. There are two types of cloud model zoos: public and private. A public model zoo is visible to all registered cloud users, while a private model zoo is visible only to the members of your organization.
Almost all combinations of AI inference type and model zoo type are supported by the PySDK:
- A cloud inference using a cloud model zoo
- An AI Server inference using a cloud model zoo
- An AI Server inference using a local model zoo
- A local inference using a cloud model zoo
- A local inference using a local model zoo or a particular model from a local model zoo
Note: the combination of a cloud inference with a local model zoo is not supported.
The PySDK starting point is the degirum.connect function, which creates and returns a degirum.zoo_manager.ZooManager model zoo manager object. This function has the following parameters, which specify the inference type and the model zoo to use:
inference_host_address
: inference engine designator; it defines which inference engine to use.
For AI Server-based inference it can be either the hostname or IP address of the AI Server host, optionally followed by the port number in the form host:port.
For DeGirum Cloud Platform-based inference it is the string "@cloud" or the degirum.CLOUD constant.
For local inference it is the string "@local" or the degirum.LOCAL constant.
zoo_url
: model zoo URL string which defines the model zoo to operate with.
For a cloud model zoo, it is specified in the following format: <cloud server prefix>[/<zoo suffix>]. The <cloud server prefix> part is the cloud platform root URL, typically https://cs.degirum.com. The optional <zoo suffix> part is the cloud zoo URL suffix in the form <organization>/<model zoo name>. You can confirm the zoo URL suffix by visiting your cloud user account and opening the model zoo management page. If <zoo suffix> is not specified, the DeGirum public model zoo degirum/public is used.
For AI Server-based inferences, you may omit both the zoo_url and token parameters. In this case the locally-deployed model zoo of the AI Server will be used.
For local AI hardware inferences you specify the zoo_url parameter as either a path to a local model zoo directory or a path to a model's .json configuration file. The token parameter is not needed in this case.
token
: cloud API access token used to access the cloud zoo. To obtain this token you need to open a user account on the DeGirum Cloud Platform. Please log in to your account and go to the token generation page to generate an API access token.
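For example, a cloud inference connection to the DeGirum public cloud model zoo might look like the following sketch; the token string is a placeholder for your own cloud API access token:

```python
import degirum as dg

# Connect for cloud inference using the DeGirum public cloud model zoo.
# Replace "<your token>" with the API access token generated in your cloud account.
zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com/degirum/public", "<your token>")
```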
The function returns the model zoo manager object, which connects to the model zoo of your choice and provides the following functionality:
- list and search models available in the connected model zoo;
- create appropriate AI model handling objects to perform AI inferences;
- request various AI model parameters.
Model Zoo URL Cheat Sheet
Inference Type | Model Zoo Type | connect() parameters |
---|---|---|
Cloud inference | Cloud zoo | zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com[/<zoo URL>]", "<token>") |
AI server inference | Cloud zoo | zoo = dg.connect("<hostname>", "https://cs.degirum.com[/<zoo URL>]", "<token>") |
AI server inference | Local zoo | zoo = dg.connect("<hostname>") |
Local inference | Cloud zoo | zoo = dg.connect(dg.LOCAL, "https://cs.degirum.com[/<zoo URL>]", "<token>") |
Local inference | Local zoo | zoo = dg.connect(dg.LOCAL, "/path/to/local/zoo/dir") |
Local inference | Local file | zoo = dg.connect(dg.LOCAL, "/path/to/model.json") |
Cloud Model Caching
Each time you request a model for inference from a cloud model zoo, either on the local system or on the AI Server host, it gets downloaded to the local filesystem of that inference host, into an internal cache directory.
A separate cache directory is maintained for each cloud zoo URL.
If the model already exists in the cache directory, its checksum is verified against the checksum of the model in the cloud zoo. If the checksums do not match, the model is downloaded from the cloud zoo again and replaces the model in the cache. This mechanism guarantees that each time you request a cloud model for AI inference, you always get the most up-to-date model. It greatly simplifies model deployment on a large number of distributed nodes: each node automatically downloads the updated model from the cloud on the first model inference request.
The cache directories' root location is operating system specific:
- For Windows it is %APPDATA%/DeGirum
- For Linux it is ~/.local/share/DeGirum
- For macOS it is Library/Application Support/DeGirum
The cache size is limited (currently to 1 GB) to avoid uncontrolled growth of the model cache directory. Once this limit is exceeded, the least recently used models are evicted from the cache.
Listing and Searching AI Models
An AI model is represented in a model zoo by a set of files stored in the model subdirectory, which is unique for each model. Each model subdirectory contains the following model files:
Model File | Description |
---|---|
<model name>.json | JSON file containing all model parameters. The name of this file is the name of the model. This file is mandatory. |
<model name>.n2x | DeGirum Orca binary file containing the model. This file is mandatory for DeGirum Orca models. |
<model name>.tflite | TensorFlow Lite binary file containing the model. This file is mandatory for TFLite models. |
<model name>.onnx | ONNX binary file containing the model. This file is mandatory for ONNX or Tensor RT models. |
<model name>.blob | OpenVINO binary file containing the model. This file is mandatory for OpenVINO models. |
<class dictionary>.json | JSON file containing class labels for classification, detection, or segmentation models. This file is optional. |
To obtain the list of available AI models, you may use the degirum.zoo_manager.ZooManager.list_models method. This method accepts arguments that specify model filtering criteria. All the arguments are optional. If a certain argument is omitted, the corresponding filtering criterion is not applied. The following filters are available:
Method Parameter | Description | Possible Values |
---|---|---|
model_family | Model family name filter. Used as a search substring in the model name | Any valid substring like "yolo", "mobilenet" |
device | Inference device filter: a string or a list of strings of device names | "orca1": DeGirum Orca device; "cpu": host CPU; "gpu": host GPU; "edgetpu": Google EdgeTPU device; "dla": Nvidia DLA device; "dla_fallback": Nvidia DLA device with host GPU fallback; "myriad": Myriad device |
precision | Model calculation precision filter: a string or a list of strings of model precision labels | "quant": quantized model; "float": floating point model |
pruned | Model density filter: a string or a list of strings of model density labels | "dense": dense model; "pruned": sparse/pruned model |
runtime | Runtime agent type filter: a string or a list of strings of runtime agent types | "n2x": DeGirum N2X runtime; "tflite": Google TFLite runtime; "tensorrt": Nvidia Tensor RT runtime; "openvino": OpenVINO runtime |
The method returns a list of model name strings. These model name strings are to be used later when you load AI models for inference.
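For instance, assuming zoo is a zoo manager object returned by degirum.connect, a filtered search might look like the following sketch; the particular filter values are examples only:

```python
# List quantized models of the "yolo" family that can run on a DeGirum Orca device.
model_names = zoo.list_models(model_family="yolo", device="orca1", precision="quant")
print(model_names)
```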
The degirum.zoo_manager.ZooManager.list_models method returns the list of models that was obtained at the time you connected to the model zoo by calling degirum.connect. This list is stored inside the zoo manager object, so subsequent calls to the degirum.zoo_manager.ZooManager.list_models method return the model list quickly, without querying the remote model zoo again. If you suspect that the remote model zoo contents have changed, create another instance of the zoo manager object by calling degirum.connect to refresh the model list.
Loading AI Models
Once you have obtained the AI model name string, you may load this model for inference by calling the degirum.zoo_manager.ZooManager.load_model method and supplying the model name string as its argument. For example:
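model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

The model name above is just an illustrative example; use one of the model names returned by the degirum.zoo_manager.ZooManager.list_models method.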
If a model with the supplied name string is found, the degirum.zoo_manager.ZooManager.load_model method returns a model handling object of the degirum.model.Model class. Otherwise, it throws an exception.
You may pass arbitrary model properties (properties of degirum.model.Model class) as keyword arguments to the degirum.zoo_manager.ZooManager.load_model method. In this case these properties will be assigned to the model object. For example:
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1", output_confidence_threshold=0.5, input_pad_method="letterbox")
Models from Cloud Zoo
If you load a model from a cloud model zoo, this model will be downloaded first and stored in the cache directory associated with that cloud model zoo. If the model already exists in the cache, it will be loaded from the cache, but only if the cached model checksum matches the checksum of the model in the cloud zoo. If the checksums do not match, the model will be downloaded from the cloud zoo again and will replace the model in the cache.
Note: your inference host needs to have Internet access to work with cloud zoo models.
Note: the Model Zoo manager does not provide any explicit method to download a model from the cloud zoo: the model is downloaded automatically when needed, i.e. when the model is not in the cache or the cached model checksum does not match the checksum of the model in the cloud zoo. However, PySDK provides the degirum download-zoo command to explicitly download the whole cloud model zoo of your choice, or a part of it, to a local directory.
Models from AI Server Local Zoo
If you load the model from the AI server local model zoo, the command to load the model will be sent to the AI server: the connected AI server will handle all model loading actions remotely.
Note: the AI Server lists the models that it serves, and you can only load those models: managing the remote AI Server model zoo is not handled by the Model Zoo manager class and should be done in a different way. Please refer to the Configuring and Launching AI Server section for details.
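As a quick sketch, assuming an AI Server is reachable on your network (the hostname below is a placeholder), you can connect to it without zoo_url and token and see which models it currently serves:

```python
import degirum as dg

# "my-ai-server" is a placeholder hostname of a machine running the DeGirum AI Server.
# With no zoo_url or token, the AI Server's locally deployed model zoo is used.
zoo = dg.connect("my-ai-server")
print(zoo.list_models())
```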
Models from Local Drive
In order to work with a locally-deployed model, you need to download that model from some model zoo in advance, for example by using the PySDK degirum download-zoo command. This is the only option that allows you to perform AI inferences without any network connection.
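For instance, once a model zoo has been downloaded to a local directory, a fully offline inference setup might look like the following sketch; the directory path and model name are placeholders:

```python
import degirum as dg

# "/path/to/local/zoo/dir" is a placeholder for the directory with a previously
# downloaded model zoo; "<model name>" is one of the models in that directory.
zoo = dg.connect(dg.LOCAL, "/path/to/local/zoo/dir")
model = zoo.load_model("<model name>")
```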