# Inference with cloud models

*Estimated read time: 2 minutes*

This setup runs inference on hardware managed by an AI Server, either on the same machine (`inference_host_address="localhost"`) or on another host (`inference_host_address="<host_ip>:<port>"`), while fetching models from the cloud zoo (`zoo_url="degirum/axelera"`). It's useful when you have your own compute but want to pull models directly from DeGirum's public AI Hub.

## Client and AI Server on the same host

In this case, the AI Server and client application run on the same machine.

To start the server, run:

{% code overflow="wrap" %}

```bash
degirum server
```

{% endcode %}

You should see output similar to:

{% code overflow="wrap" %}

```bash
DeGirum asio server is started at TCP port 8778
Local model zoo is served from '.' directory.  
Press Enter to stop the server
```

{% endcode %}

The server runs until you press `ENTER` in the terminal. By default, it listens on TCP port 8778. To specify a different port, use the `--port` argument:

{% code overflow="wrap" %}

```bash
degirum server --port <your_port>
```

{% endcode %}

### Example ModelSpec

This `ModelSpec` runs inference on a local AI Server using a model fetched from DeGirum's public Axelera model zoo:

<pre class="language-python" data-overflow="wrap"><code class="lang-python">from degirum_tools import ModelSpec

# Example ModelSpec
model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_axelera_metis_1",
    <a data-footnote-ref href="#user-content-fn-1">zoo_url="degirum/axelera"</a>,
    <a data-footnote-ref href="#user-content-fn-2">inference_host_address="localhost:8778"</a>,
    model_properties={"device_type": ["AXELERA/METIS"]}
)
</code></pre>

* `zoo_url="degirum/axelera"`: loads the model from [DeGirum's public Axelera model zoo](https://hub.degirum.com/public-models/degirum/axelera?utm_source=docs.degirum.com\&utm_medium=site\&utm_campaign=cookbooks-axelera-cookbook-ai-server-inference-with-cloud-models).
* `inference_host_address="localhost:8778"`: runs inference using Axelera devices managed by a local AI Server.
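
Once a `ModelSpec` is defined, it can be turned into a model object and used for inference. The sketch below is a minimal illustration, assuming that `ModelSpec` provides a `load_model()` method and that the returned model is callable on an input such as an image path. The helper name `run_inference` is hypothetical and not part of any DeGirum package.

```python
def run_inference(model_spec, frame):
    """Load the model described by `model_spec` and run it on one input.

    Assumes `model_spec.load_model()` returns a callable model object.
    """
    model = model_spec.load_model()  # assumed ModelSpec method
    return model(frame)              # assumed: models are callable on an input


# Hypothetical usage with the ModelSpec defined above:
#   result = run_inference(model_spec, "path/to/image.jpg")
#   print(result)
```

With this configuration, the model files come from the cloud zoo while the computation happens on the AI Server at `localhost:8778`.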

## Client and AI Server on different hosts

You can also run the AI Server on a remote host. On that machine, launch the server:

{% code overflow="wrap" %}

```bash
degirum server
```

{% endcode %}

Expected output:

{% code overflow="wrap" %}

```bash
DeGirum asio server is started at TCP port 8778
Local model zoo is served from '.' directory.  
Press Enter to stop the server
```

{% endcode %}

The server runs until you press `ENTER` in the terminal. By default, it listens on TCP port 8778. To specify a different port, use the `--port` argument:

{% code overflow="wrap" %}

```bash
degirum server --port <your_port>
```

{% endcode %}
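
Before pointing a client at a remote server, it can help to confirm that the port is reachable from the client machine. The following is a minimal sketch using only the Python standard library. It assumes the AI Server accepts plain TCP connections on its listening port, and the function name `server_is_reachable` is hypothetical. A successful connection only shows that something is listening on that port, not that it is a DeGirum AI Server.

```python
import socket


def server_is_reachable(host: str, port: int = 8778, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage; replace the placeholders with your server's values:
#   server_is_reachable("<host_ip>", 8778)
```

If the check fails, verify that the server is running, that the port matches the `--port` value, and that no firewall blocks the connection.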

### Example ModelSpec

This `ModelSpec` runs inference on a remote AI Server using a model fetched from DeGirum's public Axelera model zoo:

<pre class="language-python" data-overflow="wrap"><code class="lang-python">from degirum_tools import ModelSpec

# Example ModelSpec
model_spec = ModelSpec(
    model_name="yolov8n_coco--640x640_quant_axelera_metis_1",
    <a data-footnote-ref href="#user-content-fn-1">zoo_url="degirum/axelera"</a>,
    <a data-footnote-ref href="#user-content-fn-3">inference_host_address="&#x3C;host_ip>:&#x3C;port>"</a>,
    model_properties={"device_type": ["AXELERA/METIS"]}
)
</code></pre>

* `zoo_url="degirum/axelera"`: fetches the model from [DeGirum's public Axelera model zoo](https://hub.degirum.com/public-models/degirum/axelera?utm_source=docs.degirum.com\&utm_medium=site\&utm_campaign=cookbooks-axelera-cookbook-ai-server-inference-with-cloud-models).
* `inference_host_address="<host_ip>:<port>"`: runs inference using Axelera devices managed by the remote AI Server.
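
The `<host_ip>:<port>` value is a plain string, so it can be assembled from configuration. Here is a small sketch of one way to do that, assuming IPv4 addresses or hostnames and the default port 8778 shown in the server output. The helper `host_address` and the constant `DEFAULT_AI_SERVER_PORT` are hypothetical names, and the IP address in the usage comment is only an example.

```python
DEFAULT_AI_SERVER_PORT = 8778  # default port reported by `degirum server`


def host_address(host: str, port: int = DEFAULT_AI_SERVER_PORT) -> str:
    """Build an inference_host_address string of the form '<host>:<port>'."""
    if not host:
        raise ValueError("host must be a non-empty string")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"{host}:{port}"


# Hypothetical usage:
#   inference_host_address=host_address("192.168.1.50", 9000)
```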

[^1]: Indicates that the model will be loaded from [DeGirum's public Axelera model zoo](https://hub.degirum.com/public-models/degirum/axelera?utm_source=docs.degirum.com\&utm_medium=site\&utm_campaign=cookbooks-axelera-cookbook-ai-server-inference-with-cloud-models).

[^2]: Indicates that the model will be run using Axelera devices managed by the AI Server listening on `"localhost:8778"`.

[^3]: Indicates that the model will be run using Axelera devices managed by the AI Server available at `"<host_ip>:<port>"`.


