> For the complete documentation index, see [llms.txt](https://docs.ai.neevcloud.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.ai.neevcloud.com/api-reference/dedicated-models.md).

# Dedicated Model Deployment API

The Dedicated Model Deployment API provisions and manages **private, dedicated inference endpoints** for AI models on NeevAI GPU infrastructure. Each deployment runs a specific model (for example Llama, Mistral, or Phi) on reserved GPU hardware within your project and exposes an **OpenAI-compatible** REST endpoint, so you can point existing OpenAI client integrations at it without code changes.

Deployments are created from the **model catalog**, which lists the available models alongside their recommended GPU configurations and per-hour pricing. A deployment provisions asynchronously: once its `status` reaches `ready`, the `endpoint_url` field holds your private inference URL. You can scale a deployment's replica count up or down to match demand.

## What you can do

* **Deploy a model** — create a dedicated deployment by selecting a catalog model and GPU configuration, optionally with a tuned serving config.
* **Deploy from Hugging Face** — deploy directly from a Hugging Face model ID (for example, `mistralai/Mistral-7B-Instruct-v0.3`), letting NeevAI pick a GPU or specifying your own, with an optional access token for gated or private repositories.
* **Manage deployments** — list a project's deployments, fetch a single deployment's status and endpoint details, and delete a deployment.
* **Scale** — adjust the number of replicas for a running deployment.
* **Browse the catalog** — discover deployable models, their recommended GPU setups, and pricing. Use a catalog item's `id` as the `model_id` when creating a deployment.

Use the interactive reference below to inspect deployment payloads, status states (`provisioning` → `ready`), and response schemas.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.ai.neevcloud.com/api-reference/dedicated-models.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
