# AI Model

Create, list, inspect, scale, and delete **Dedicated Model Deployments**. A deployment runs a specific AI model on reserved GPU hardware within your project. Once `status` reaches `ready`, the `endpoint_url` field contains your private OpenAI-compatible inference endpoint.

## List model deployments

> Retrieve a paginated list of all Dedicated Model Deployments in a project.

```json
{"openapi":"3.0.3","info":{"title":"Dedicated Model Deployment API","version":"0.1.0"},"tags":[{"name":"AI Model","description":"Create, list, inspect, scale, and delete **Dedicated Model Deployments**. A deployment runs a specific AI model on reserved GPU hardware within your project. Once `status` reaches `ready`, the `endpoint_url` field contains your private OpenAI-compatible inference endpoint.\n"}],"servers":[{"url":"https://api.ai.neevcloud.com/aimodels","description":"Consolidated public API gateway"}],"security":[{"BearerAuth":[]}],"components":{"securitySchemes":{"BearerAuth":{"type":"http","scheme":"bearer","description":"Obtain an **`access_token`** from `POST /api/v1/auth/login` on the tenant API (same credentials as the console). In Authorize, paste **only that token** — do not prepend `Bearer`, and do not use inference keys (`sk-nc-*`).\n"}},"parameters":{"OrgID":{"name":"org_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your organization identifier (e.g. `org-abc123`). Found in the NeevAI console under Organization Settings.\n"},"ProjectID":{"name":"project_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your project identifier (e.g. `prj-abc123`). 
Found in the NeevAI console under Project Settings, or returned when you create a project via the Tenant API.\n"}},"schemas":{"AIModelListResponse":{"type":"object","properties":{"items":{"type":"array","items":{"$ref":"#/components/schemas/AIModelResponse"}},"pagination":{"$ref":"#/components/schemas/Pagination"}}},"AIModelResponse":{"type":"object","description":"Current state and configuration of a Dedicated Model Deployment.","properties":{"id":{"type":"string","description":"System-generated unique identifier for the deployment."},"name":{"type":"string","description":"The user-defined name of the deployment."},"project_id":{"type":"string","description":"The project this deployment belongs to."},"model_id":{"type":"string","description":"The model's slash HuggingFace path (`organization/model`) that this deployment serves under — matches the catalog item's `model_id`. Note: the deploy request takes the catalog's url-safe `id`, while this response echoes the resolved slash path.\n"},"model_name":{"type":"string","description":"Human-readable catalog display name for the model (the catalog `name`), so clients can render a friendly label without a per-row round-trip to the catalog API.\n"},"region":{"type":"string","description":"The region where the deployment is running."},"status":{"type":"string","enum":["initializing","ready","failed","terminating"],"description":"Current lifecycle state of the deployment.\n- `initializing` — GPU is being provisioned and the model is loading. - `ready` — The model is loaded and `endpoint_url` is accepting requests. - `failed` — Deployment failed. Check logs or contact support. - `terminating` — Deletion is in progress; resources are being released.\n"},"endpoint_url":{"type":"string","nullable":true,"description":"The private OpenAI-compatible inference endpoint URL. This is `null` until `status` is `ready`. 
Send inference requests to `{endpoint_url}/chat/completions` using the OpenAI SDK or any compatible HTTP client.\n"},"gpu_config_id":{"type":"string","description":"The GPU configuration ID used for this deployment."},"gpu_count":{"type":"integer","description":"Number of GPUs allocated per replica."},"replicas":{"type":"integer","description":"Current number of running replicas."},"created_at":{"type":"string","format":"date-time","description":"Timestamp when the deployment was created (RFC 3339)."},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the most recent state change (RFC 3339)."},"uptime_seconds":{"type":"integer","format":"int64","minimum":0,"description":"Seconds the deployment has been serving in its current `ready` session, i.e. elapsed time since it most recently became `ready`. Excludes time spent `initializing` or `failed`, so it reflects real serving uptime rather than `now - created_at`. `0` when the deployment is not currently `ready` (or has never become ready). A `ready -> failed -> ready` recovery restarts the count from the recovery.\n"}}},"Pagination":{"type":"object","properties":{"total_items":{"type":"integer"},"total_pages":{"type":"integer"},"current_page":{"type":"integer"},"items_per_page":{"type":"integer"}}},"ErrorResponse":{"type":"object","required":["code","message"],"properties":{"code":{"type":"string","description":"A machine-readable error code. 
Common values: `invalid_request`, `unauthorized`, `forbidden`, `not_found`, `internal_error`.\n"},"message":{"type":"string","description":"A human-readable description of what went wrong."}}}},"responses":{"Unauthorized":{"description":"The request is missing a valid Bearer token.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Forbidden":{"description":"The authenticated user does not have permission to perform this action.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"InternalServerError":{"description":"An unexpected error occurred on the server. Please retry or contact support.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/api/v1beta1/orgs/{org_id}/projects/{project_id}/aimodels":{"get":{"tags":["AI Model"],"summary":"List model deployments","description":"Retrieve a paginated list of all Dedicated Model Deployments in a project.","operationId":"listDeployedAIModels","parameters":[{"$ref":"#/components/parameters/OrgID"},{"$ref":"#/components/parameters/ProjectID"},{"name":"page","in":"query","schema":{"type":"integer","minimum":1,"default":1},"description":"Page number for pagination (starts from 1)."},{"name":"limit","in":"query","schema":{"type":"integer","minimum":1,"maximum":100,"default":20},"description":"Number of deployments to return per page (max 100)."}],"responses":{"200":{"description":"Paginated list of model deployments.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AIModelListResponse"}}}},"401":{"$ref":"#/components/responses/Unauthorized"},"403":{"$ref":"#/components/responses/Forbidden"},"500":{"$ref":"#/components/responses/InternalServerError"}}}}}}
```
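
The `page` and `limit` query parameters and the `pagination` object in the response can be combined into a simple page walker. The following is a minimal Python sketch, not an official client. `list_url` and `iter_deployments` are hypothetical helper names, and `fetch_page` stands in for whatever authenticated HTTP call you use. Only the base URL, path, query parameters, and response field names are taken from the spec above.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# Base URL taken from the spec's `servers` entry.
BASE_URL = "https://api.ai.neevcloud.com/aimodels"


def list_url(org_id: str, project_id: str, page: int = 1, limit: int = 20) -> str:
    """Build the list URL; `page` starts at 1 and `limit` is capped at 100."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    path = f"/api/v1beta1/orgs/{org_id}/projects/{project_id}/aimodels"
    query = urlencode({"page": page, "limit": limit})
    return f"{BASE_URL}{path}?{query}"


def iter_deployments(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every deployment by following `pagination.total_pages`.

    `fetch_page` is a hypothetical callable (an assumption of this sketch):
    it performs the authenticated GET for one page number and returns the
    decoded JSON body.
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield from body.get("items", [])
        total_pages = body.get("pagination", {}).get("total_pages", 1)
        if page >= total_pages:
            break
        page += 1
```

Keeping the HTTP call behind `fetch_page` lets the paging logic be exercised without network access; in real use it would wrap a GET on `list_url(...)` with an `Authorization: Bearer <access_token>` header.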

## Deploy a model

> Provisions a dedicated inference endpoint for the specified model on GPU hardware.
>
> **Steps before calling this endpoint:**
>
> 1. Get a `catalog_id` (the URL-safe `id` field) from `GET /api/v1beta1/aimodels-catalog` (e.g. `meta-llama-Llama-3.1-8B-Instruct`).
> 2. Get a `gpu_config_id` from `GET /api/v1beta1/inventory`: use the `config_id` field of the inventory item that matches your desired GPU type and region.
>
> The deployment starts asynchronously. Poll `GET .../aimodels/{model_id}` until `status` is `ready`. The `{model_id}` path parameter takes the **deployment** `id` returned by this call, not the catalog `id` and not the `model_id` field of the response.

```json
{"openapi":"3.0.3","info":{"title":"Dedicated Model Deployment API","version":"0.1.0"},"tags":[{"name":"AI Model","description":"Create, list, inspect, scale, and delete **Dedicated Model Deployments**. A deployment runs a specific AI model on reserved GPU hardware within your project. Once `status` reaches `ready`, the `endpoint_url` field contains your private OpenAI-compatible inference endpoint.\n"}],"servers":[{"url":"https://api.ai.neevcloud.com/aimodels","description":"Consolidated public API gateway"}],"security":[{"BearerAuth":[]}],"components":{"securitySchemes":{"BearerAuth":{"type":"http","scheme":"bearer","description":"Obtain an **`access_token`** from `POST /api/v1/auth/login` on the tenant API (same credentials as the console). In Authorize, paste **only that token** — do not prepend `Bearer`, and do not use inference keys (`sk-nc-*`).\n"}},"parameters":{"OrgID":{"name":"org_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your organization identifier (e.g. `org-abc123`). Found in the NeevAI console under Organization Settings.\n"},"ProjectID":{"name":"project_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your project identifier (e.g. `prj-abc123`). Found in the NeevAI console under Project Settings, or returned when you create a project via the Tenant API.\n"}},"schemas":{"CreateAIModelRequest":{"type":"object","description":"Request body for creating a Dedicated Model Deployment.","required":["name","catalog_id","region","gpu_config_id","gpu_count","plan_id"],"properties":{"name":{"type":"string","minLength":3,"maxLength":30,"pattern":"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$","description":"A unique name for this deployment within the project. Must be lowercase, may contain hyphens and digits, and must start and end with an alphanumeric character (DNS-1123 label format). 
Capped at 30 characters so it fits as a DNS label in the public inference hostname `<name>.<project>.<org>.<region>.<env>.inference.ai.neevcloud.com`.\n"},"catalog_id":{"type":"string","pattern":"^[a-zA-Z0-9_\\-\\.]+$","description":"The catalog model's url-safe `id` (no slash), as returned in the `id` field by the catalog API (`GET /api/v1beta1/aimodels-catalog`). The model must exist in the catalog. Example: `meta-llama-Llama-3.1-8B-Instruct`. (The slash HuggingFace path is exposed separately as `model_id` on each catalog item and is echoed back as `model_id` in this deployment's response — that is the value the deployment serves under.)\n"},"region":{"type":"string","description":"The deployment region identifier. Must match a region where the requested GPU is available (see `GET /api/v1beta1/inventory` for live availability).\n"},"gpu_config_id":{"type":"string","description":"The GPU hardware configuration ID that determines which GPU type and memory size is used. Obtain this from the `config_id` field in `GET /api/v1beta1/inventory`, or from `recommended_gpu.gpu_config_id` in the catalog response.\n"},"gpu_count":{"type":"integer","minimum":1,"default":1,"description":"Number of GPUs to allocate per replica. Use more GPUs for larger models that require tensor parallelism (e.g. 4 GPUs for a 70B model). Must not exceed `max_gpu_count` in inventory.\n"},"plan_id":{"type":"string","description":"Billing price plan ID for this deployment. Must reference an active `ondemand` plan in the billing service whose underlying GPU SKU matches the chosen `gpu_config_id` and is available in the selected `region`. Per-minute billing is `unit_price × gpu_count × replicas / 60`.\n"},"config":{"type":"object","additionalProperties":true,"description":"Serving configuration for the inference server. 
Only the keys below are honored; any other keys are stored but have no effect.\nTunable serving parameters (integers) are validated against the model's advertised range from the catalog `configurations` field; values outside `[min, max]` are rejected with 400. Unset parameters fall back to the catalog `default`.\n- `max_model_len` (integer) — Maximum context length in tokens. Bounded by the catalog range. - `max_num_seqs` (integer) — Max concurrent sequences in the running batch. Bounded by the catalog range. - `extra_args` (array of strings) — Escape hatch for additional vLLM CLI flags, passed through\n\n\n\n  verbatim (e.g. `[\"--quantization\", \"awq\"]`). Tensor-parallel size is derived automatically\n  from `gpu_count`; override it here only if needed.\n"}}},"AIModelResponse":{"type":"object","description":"Current state and configuration of a Dedicated Model Deployment.","properties":{"id":{"type":"string","description":"System-generated unique identifier for the deployment."},"name":{"type":"string","description":"The user-defined name of the deployment."},"project_id":{"type":"string","description":"The project this deployment belongs to."},"model_id":{"type":"string","description":"The model's slash HuggingFace path (`organization/model`) that this deployment serves under — matches the catalog item's `model_id`. Note: the deploy request takes the catalog's url-safe `id`, while this response echoes the resolved slash path.\n"},"model_name":{"type":"string","description":"Human-readable catalog display name for the model (the catalog `name`), so clients can render a friendly label without a per-row round-trip to the catalog API.\n"},"region":{"type":"string","description":"The region where the deployment is running."},"status":{"type":"string","enum":["initializing","ready","failed","terminating"],"description":"Current lifecycle state of the deployment.\n- `initializing` — GPU is being provisioned and the model is loading. 
- `ready` — The model is loaded and `endpoint_url` is accepting requests. - `failed` — Deployment failed. Check logs or contact support. - `terminating` — Deletion is in progress; resources are being released.\n"},"endpoint_url":{"type":"string","nullable":true,"description":"The private OpenAI-compatible inference endpoint URL. This is `null` until `status` is `ready`. Send inference requests to `{endpoint_url}/chat/completions` using the OpenAI SDK or any compatible HTTP client.\n"},"gpu_config_id":{"type":"string","description":"The GPU configuration ID used for this deployment."},"gpu_count":{"type":"integer","description":"Number of GPUs allocated per replica."},"replicas":{"type":"integer","description":"Current number of running replicas."},"created_at":{"type":"string","format":"date-time","description":"Timestamp when the deployment was created (RFC 3339)."},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the most recent state change (RFC 3339)."},"uptime_seconds":{"type":"integer","format":"int64","minimum":0,"description":"Seconds the deployment has been serving in its current `ready` session, i.e. elapsed time since it most recently became `ready`. Excludes time spent `initializing` or `failed`, so it reflects real serving uptime rather than `now - created_at`. `0` when the deployment is not currently `ready` (or has never become ready). A `ready -> failed -> ready` recovery restarts the count from the recovery.\n"}}},"ErrorResponse":{"type":"object","required":["code","message"],"properties":{"code":{"type":"string","description":"A machine-readable error code. 
Common values: `invalid_request`, `unauthorized`, `forbidden`, `not_found`, `internal_error`.\n"},"message":{"type":"string","description":"A human-readable description of what went wrong."}}}},"responses":{"BadRequest":{"description":"The request payload or parameters are invalid.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Unauthorized":{"description":"The request is missing a valid Bearer token.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Forbidden":{"description":"The authenticated user does not have permission to perform this action.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"InternalServerError":{"description":"An unexpected error occurred on the server. Please retry or contact support.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/api/v1beta1/orgs/{org_id}/projects/{project_id}/aimodels":{"post":{"tags":["AI Model"],"summary":"Deploy a model","description":"Provisions a dedicated inference endpoint for the specified model on GPU hardware.\n\n**Steps before calling this endpoint:**\n\n1. Get a `catalog_id` (the url-safe `id` field) from `GET /api/v1beta1/aimodels-catalog` (e.g. `meta-llama-Llama-3.1-8B-Instruct`).\n2. Get a `gpu_config_id` from `GET /api/v1beta1/inventory` — use the `config_id` field of the inventory item that matches your desired GPU type and region.\n\nThe deployment starts asynchronously. 
Poll `GET .../aimodels/{model_id}` (the **deployment** id) until `status` is `ready`.\n","operationId":"deployAIModel","parameters":[{"$ref":"#/components/parameters/OrgID"},{"$ref":"#/components/parameters/ProjectID"}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/CreateAIModelRequest"}}}},"responses":{"201":{"description":"Deployment accepted and provisioning has started.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AIModelResponse"}}}},"400":{"$ref":"#/components/responses/BadRequest"},"401":{"$ref":"#/components/responses/Unauthorized"},"403":{"$ref":"#/components/responses/Forbidden"},"500":{"$ref":"#/components/responses/InternalServerError"}}}}}}
```
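
The constraints in `CreateAIModelRequest` (the `name` pattern and length, `gpu_count >= 1`) and the per-minute billing formula can be checked client-side before sending the request. The sketch below is illustrative only: `build_deploy_body` and `per_minute_cost` are hypothetical helpers, and the region, GPU config, and plan values in the usage note are placeholders, not real identifiers.

```python
import json
import re

# Pattern and length limits copied from the `name` property of CreateAIModelRequest.
NAME_PATTERN = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def build_deploy_body(name: str, catalog_id: str, region: str,
                      gpu_config_id: str, plan_id: str, gpu_count: int = 1) -> str:
    """Return the JSON request body after checking the documented constraints."""
    if not 3 <= len(name) <= 30 or not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must be a 3-30 character DNS-1123 label")
    if gpu_count < 1:
        raise ValueError("gpu_count must be >= 1")
    return json.dumps({
        "name": name,
        "catalog_id": catalog_id,      # URL-safe catalog `id`, not the slash path
        "region": region,
        "gpu_config_id": gpu_config_id,
        "gpu_count": gpu_count,
        "plan_id": plan_id,
    })


def per_minute_cost(unit_price: float, gpu_count: int, replicas: int) -> float:
    """Per-minute billing as stated in the spec: unit_price * gpu_count * replicas / 60."""
    return unit_price * gpu_count * replicas / 60
```

For example, `build_deploy_body("llama-8b", "meta-llama-Llama-3.1-8B-Instruct", "<region>", "<gpu_config_id>", "<plan_id>")` produces a body with all six required fields. The server remains the authority on validation; this only catches obvious mistakes early.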

## Get deployment details

> Retrieve the current state of a Dedicated Model Deployment. Use this to poll until `status` becomes `ready`, at which point `endpoint_url` is populated and ready to serve inference requests.

```json
{"openapi":"3.0.3","info":{"title":"Dedicated Model Deployment API","version":"0.1.0"},"tags":[{"name":"AI Model","description":"Create, list, inspect, scale, and delete **Dedicated Model Deployments**. A deployment runs a specific AI model on reserved GPU hardware within your project. Once `status` reaches `ready`, the `endpoint_url` field contains your private OpenAI-compatible inference endpoint.\n"}],"servers":[{"url":"https://api.ai.neevcloud.com/aimodels","description":"Consolidated public API gateway"}],"security":[{"BearerAuth":[]}],"components":{"securitySchemes":{"BearerAuth":{"type":"http","scheme":"bearer","description":"Obtain an **`access_token`** from `POST /api/v1/auth/login` on the tenant API (same credentials as the console). In Authorize, paste **only that token** — do not prepend `Bearer`, and do not use inference keys (`sk-nc-*`).\n"}},"parameters":{"OrgID":{"name":"org_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your organization identifier (e.g. `org-abc123`). Found in the NeevAI console under Organization Settings.\n"},"ProjectID":{"name":"project_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your project identifier (e.g. `prj-abc123`). Found in the NeevAI console under Project Settings, or returned when you create a project via the Tenant API.\n"},"ModelID":{"name":"model_id","in":"path","required":true,"schema":{"type":"string"},"description":"The deployment ID returned in the `id` field of a previous `POST /aimodels` or `GET /aimodels` response. Note this is the **deployment** ID (e.g. 
`dep-abc123`), not the catalog model ID.\n"}},"schemas":{"AIModelResponse":{"type":"object","description":"Current state and configuration of a Dedicated Model Deployment.","properties":{"id":{"type":"string","description":"System-generated unique identifier for the deployment."},"name":{"type":"string","description":"The user-defined name of the deployment."},"project_id":{"type":"string","description":"The project this deployment belongs to."},"model_id":{"type":"string","description":"The model's slash HuggingFace path (`organization/model`) that this deployment serves under — matches the catalog item's `model_id`. Note: the deploy request takes the catalog's url-safe `id`, while this response echoes the resolved slash path.\n"},"model_name":{"type":"string","description":"Human-readable catalog display name for the model (the catalog `name`), so clients can render a friendly label without a per-row round-trip to the catalog API.\n"},"region":{"type":"string","description":"The region where the deployment is running."},"status":{"type":"string","enum":["initializing","ready","failed","terminating"],"description":"Current lifecycle state of the deployment.\n- `initializing` — GPU is being provisioned and the model is loading. - `ready` — The model is loaded and `endpoint_url` is accepting requests. - `failed` — Deployment failed. Check logs or contact support. - `terminating` — Deletion is in progress; resources are being released.\n"},"endpoint_url":{"type":"string","nullable":true,"description":"The private OpenAI-compatible inference endpoint URL. This is `null` until `status` is `ready`. 
Send inference requests to `{endpoint_url}/chat/completions` using the OpenAI SDK or any compatible HTTP client.\n"},"gpu_config_id":{"type":"string","description":"The GPU configuration ID used for this deployment."},"gpu_count":{"type":"integer","description":"Number of GPUs allocated per replica."},"replicas":{"type":"integer","description":"Current number of running replicas."},"created_at":{"type":"string","format":"date-time","description":"Timestamp when the deployment was created (RFC 3339)."},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the most recent state change (RFC 3339)."},"uptime_seconds":{"type":"integer","format":"int64","minimum":0,"description":"Seconds the deployment has been serving in its current `ready` session, i.e. elapsed time since it most recently became `ready`. Excludes time spent `initializing` or `failed`, so it reflects real serving uptime rather than `now - created_at`. `0` when the deployment is not currently `ready` (or has never become ready). A `ready -> failed -> ready` recovery restarts the count from the recovery.\n"}}},"ErrorResponse":{"type":"object","required":["code","message"],"properties":{"code":{"type":"string","description":"A machine-readable error code. 
Common values: `invalid_request`, `unauthorized`, `forbidden`, `not_found`, `internal_error`.\n"},"message":{"type":"string","description":"A human-readable description of what went wrong."}}}},"responses":{"BadRequest":{"description":"The request payload or parameters are invalid.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Unauthorized":{"description":"The request is missing a valid Bearer token.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Forbidden":{"description":"The authenticated user does not have permission to perform this action.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFound":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"InternalServerError":{"description":"An unexpected error occurred on the server. Please retry or contact support.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/api/v1beta1/orgs/{org_id}/projects/{project_id}/aimodels/{model_id}":{"get":{"tags":["AI Model"],"summary":"Get deployment details","description":"Retrieve the current state of a Dedicated Model Deployment. 
Use this to poll until `status` becomes `ready`, at which point `endpoint_url` is populated and ready to serve inference requests.\n","operationId":"getDeployedAIModel","parameters":[{"$ref":"#/components/parameters/OrgID"},{"$ref":"#/components/parameters/ProjectID"},{"$ref":"#/components/parameters/ModelID"}],"responses":{"200":{"description":"Current state of the model deployment.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AIModelResponse"}}}},"400":{"$ref":"#/components/responses/BadRequest"},"401":{"$ref":"#/components/responses/Unauthorized"},"403":{"$ref":"#/components/responses/Forbidden"},"404":{"$ref":"#/components/responses/NotFound"},"500":{"$ref":"#/components/responses/InternalServerError"}}}}}}
```
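
Polling until `status` becomes `ready` can be sketched as a small loop. This is a minimal illustration under stated assumptions: `wait_until_ready` is a hypothetical helper, `get_deployment` stands in for an authenticated GET on this endpoint that returns the decoded `AIModelResponse`, and the 10-second interval and 30-minute timeout are arbitrary defaults, not values from the API.

```python
import time
from typing import Callable


def wait_until_ready(get_deployment: Callable[[], dict],
                     interval_s: float = 10.0,
                     timeout_s: float = 1800.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the deployment is `ready` and return its last response.

    Raises RuntimeError if the deployment reaches `failed` or `terminating`,
    and TimeoutError if it is still not ready after `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        deployment = get_deployment()
        status = deployment.get("status")
        if status == "ready":
            return deployment  # `endpoint_url` is populated at this point
        if status in ("failed", "terminating"):
            raise RuntimeError(f"deployment ended in status {status!r}")
        if clock() >= deadline:
            raise TimeoutError("deployment did not become ready in time")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; production code can rely on the defaults.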

## Delete a deployment

> Terminates the model deployment and releases the associated GPU resources. This action is irreversible. Any in-flight inference requests will be interrupted. Billing stops once the deployment is fully terminated.

```json
{"openapi":"3.0.3","info":{"title":"Dedicated Model Deployment API","version":"0.1.0"},"tags":[{"name":"AI Model","description":"Create, list, inspect, scale, and delete **Dedicated Model Deployments**. A deployment runs a specific AI model on reserved GPU hardware within your project. Once `status` reaches `ready`, the `endpoint_url` field contains your private OpenAI-compatible inference endpoint.\n"}],"servers":[{"url":"https://api.ai.neevcloud.com/aimodels","description":"Consolidated public API gateway"}],"security":[{"BearerAuth":[]}],"components":{"securitySchemes":{"BearerAuth":{"type":"http","scheme":"bearer","description":"Obtain an **`access_token`** from `POST /api/v1/auth/login` on the tenant API (same credentials as the console). In Authorize, paste **only that token** — do not prepend `Bearer`, and do not use inference keys (`sk-nc-*`).\n"}},"parameters":{"OrgID":{"name":"org_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your organization identifier (e.g. `org-abc123`). Found in the NeevAI console under Organization Settings.\n"},"ProjectID":{"name":"project_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your project identifier (e.g. `prj-abc123`). Found in the NeevAI console under Project Settings, or returned when you create a project via the Tenant API.\n"},"ModelID":{"name":"model_id","in":"path","required":true,"schema":{"type":"string"},"description":"The deployment ID returned in the `id` field of a previous `POST /aimodels` or `GET /aimodels` response. Note this is the **deployment** ID (e.g. 
`dep-abc123`), not the catalog model ID.\n"}},"responses":{"BadRequest":{"description":"The request payload or parameters are invalid.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Unauthorized":{"description":"The request is missing a valid Bearer token.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Forbidden":{"description":"The authenticated user does not have permission to perform this action.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFound":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"InternalServerError":{"description":"An unexpected error occurred on the server. Please retry or contact support.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}},"schemas":{"ErrorResponse":{"type":"object","required":["code","message"],"properties":{"code":{"type":"string","description":"A machine-readable error code. Common values: `invalid_request`, `unauthorized`, `forbidden`, `not_found`, `internal_error`.\n"},"message":{"type":"string","description":"A human-readable description of what went wrong."}}}}},"paths":{"/api/v1beta1/orgs/{org_id}/projects/{project_id}/aimodels/{model_id}":{"delete":{"tags":["AI Model"],"summary":"Delete a deployment","description":"Terminates the model deployment and releases the associated GPU resources. This action is irreversible. Any in-flight inference requests will be interrupted. Billing stops once the deployment is fully terminated.\n","operationId":"deleteDeployedAIModel","parameters":[{"$ref":"#/components/parameters/OrgID"},{"$ref":"#/components/parameters/ProjectID"},{"$ref":"#/components/parameters/ModelID"}],"responses":{"204":{"description":"Deployment deletion accepted. 
Resources will be released within a few minutes."},"400":{"$ref":"#/components/responses/BadRequest"},"401":{"$ref":"#/components/responses/Unauthorized"},"403":{"$ref":"#/components/responses/Forbidden"},"404":{"$ref":"#/components/responses/NotFound"},"500":{"$ref":"#/components/responses/InternalServerError"}}}}}}
```
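
As a rough illustration, the DELETE call can be prepared with Python's standard library. `build_delete_request` is a hypothetical helper, and the sketch only constructs the request without sending it. The `Authorization: Bearer <token>` header follows the HTTP bearer scheme declared in the spec; the token itself comes from the tenant login endpoint.

```python
import urllib.request
from urllib.parse import quote

# Base URL taken from the spec's `servers` entry.
BASE_URL = "https://api.ai.neevcloud.com/aimodels"


def build_delete_request(org_id: str, project_id: str, deployment_id: str,
                         access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the DELETE request for one deployment.

    `deployment_id` is the deployment `id` (e.g. `dep-abc123`), which the
    path template calls `{model_id}`.
    """
    path = (
        f"/api/v1beta1/orgs/{quote(org_id, safe='')}"
        f"/projects/{quote(project_id, safe='')}"
        f"/aimodels/{quote(deployment_id, safe='')}"
    )
    return urllib.request.Request(
        BASE_URL + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Passing the result to `urllib.request.urlopen` sends it; per the spec, a successful call returns `204` with no body. Because deletion is irreversible, it may be worth fetching the deployment first and confirming its `name` before sending.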

## Scale replicas

> Change the number of running replicas for a deployment. Each replica is an independent copy of the model loaded on its own set of GPUs, enabling higher throughput for concurrent requests.
>
> Scaling is applied asynchronously: the deployment will briefly enter a transitional state before returning to `ready`.
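
A scale request is a `PUT` to the deployment's `/scale` path with a JSON body containing `replicas`. The following is a sketch under the same assumptions as the rest of this page: `build_scale_request` is a hypothetical helper that only constructs the request, and the `replicas >= 1` check mirrors the `minimum` in `ScaleAIModelRequest`.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the spec's `servers` entry.
BASE_URL = "https://api.ai.neevcloud.com/aimodels"


def build_scale_request(org_id: str, project_id: str, deployment_id: str,
                        replicas: int, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that sets the replica count."""
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
        raise ValueError("replicas must be an integer >= 1")
    path = (
        f"/api/v1beta1/orgs/{quote(org_id, safe='')}"
        f"/projects/{quote(project_id, safe='')}"
        f"/aimodels/{quote(deployment_id, safe='')}/scale"
    )
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps({"replicas": replicas}).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Since billing scales linearly with the replica count, going from 1 to 3 replicas triples the per-minute charge for the deployment.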

```json
{"openapi":"3.0.3","info":{"title":"Dedicated Model Deployment API","version":"0.1.0"},"tags":[{"name":"AI Model","description":"Create, list, inspect, scale, and delete **Dedicated Model Deployments**. A deployment runs a specific AI model on reserved GPU hardware within your project. Once `status` reaches `ready`, the `endpoint_url` field contains your private OpenAI-compatible inference endpoint.\n"}],"servers":[{"url":"https://api.ai.neevcloud.com/aimodels","description":"Consolidated public API gateway"}],"security":[{"BearerAuth":[]}],"components":{"securitySchemes":{"BearerAuth":{"type":"http","scheme":"bearer","description":"Obtain an **`access_token`** from `POST /api/v1/auth/login` on the tenant API (same credentials as the console). In Authorize, paste **only that token** — do not prepend `Bearer`, and do not use inference keys (`sk-nc-*`).\n"}},"parameters":{"OrgID":{"name":"org_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your organization identifier (e.g. `org-abc123`). Found in the NeevAI console under Organization Settings.\n"},"ProjectID":{"name":"project_id","in":"path","required":true,"schema":{"type":"string"},"description":"Your project identifier (e.g. `prj-abc123`). Found in the NeevAI console under Project Settings, or returned when you create a project via the Tenant API.\n"},"ModelID":{"name":"model_id","in":"path","required":true,"schema":{"type":"string"},"description":"The deployment ID returned in the `id` field of a previous `POST /aimodels` or `GET /aimodels` response. Note this is the **deployment** ID (e.g. `dep-abc123`), not the catalog model ID.\n"}},"schemas":{"ScaleAIModelRequest":{"type":"object","description":"Request body for scaling the number of replicas in a deployment.","required":["replicas"],"properties":{"replicas":{"type":"integer","minimum":1,"description":"The desired number of running replicas. 
Each replica is an independent model instance with its own GPU allocation, enabling higher concurrency. Billing scales linearly with the number of replicas.\n"}}},"AIModelResponse":{"type":"object","description":"Current state and configuration of a Dedicated Model Deployment.","properties":{"id":{"type":"string","description":"System-generated unique identifier for the deployment."},"name":{"type":"string","description":"The user-defined name of the deployment."},"project_id":{"type":"string","description":"The project this deployment belongs to."},"model_id":{"type":"string","description":"The model's slash HuggingFace path (`organization/model`) that this deployment serves under — matches the catalog item's `model_id`. Note: the deploy request takes the catalog's url-safe `id`, while this response echoes the resolved slash path.\n"},"model_name":{"type":"string","description":"Human-readable catalog display name for the model (the catalog `name`), so clients can render a friendly label without a per-row round-trip to the catalog API.\n"},"region":{"type":"string","description":"The region where the deployment is running."},"status":{"type":"string","enum":["initializing","ready","failed","terminating"],"description":"Current lifecycle state of the deployment.\n- `initializing` — GPU is being provisioned and the model is loading. - `ready` — The model is loaded and `endpoint_url` is accepting requests. - `failed` — Deployment failed. Check logs or contact support. - `terminating` — Deletion is in progress; resources are being released.\n"},"endpoint_url":{"type":"string","nullable":true,"description":"The private OpenAI-compatible inference endpoint URL. This is `null` until `status` is `ready`. 
Send inference requests to `{endpoint_url}/chat/completions` using the OpenAI SDK or any compatible HTTP client.\n"},"gpu_config_id":{"type":"string","description":"The GPU configuration ID used for this deployment."},"gpu_count":{"type":"integer","description":"Number of GPUs allocated per replica."},"replicas":{"type":"integer","description":"Current number of running replicas."},"created_at":{"type":"string","format":"date-time","description":"Timestamp when the deployment was created (RFC 3339)."},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the most recent state change (RFC 3339)."},"uptime_seconds":{"type":"integer","format":"int64","minimum":0,"description":"Seconds the deployment has been serving in its current `ready` session, i.e. elapsed time since it most recently became `ready`. Excludes time spent `initializing` or `failed`, so it reflects real serving uptime rather than `now - created_at`. `0` when the deployment is not currently `ready` (or has never become ready). A `ready -> failed -> ready` recovery restarts the count from the recovery.\n"}}},"ErrorResponse":{"type":"object","required":["code","message"],"properties":{"code":{"type":"string","description":"A machine-readable error code. 
Common values: `invalid_request`, `unauthorized`, `forbidden`, `not_found`, `internal_error`.\n"},"message":{"type":"string","description":"A human-readable description of what went wrong."}}}},"responses":{"BadRequest":{"description":"The request payload or parameters are invalid.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Unauthorized":{"description":"The request is missing a valid Bearer token.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"Forbidden":{"description":"The authenticated user does not have permission to perform this action.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFound":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"InternalServerError":{"description":"An unexpected error occurred on the server. Please retry or contact support.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/api/v1beta1/orgs/{org_id}/projects/{project_id}/aimodels/{model_id}/scale":{"put":{"tags":["AI Model"],"summary":"Scale replicas","description":"Change the number of running replicas for a deployment. Each replica is an independent copy of the model loaded on its own set of GPUs, enabling higher throughput for concurrent requests.\n\nScaling is applied asynchronously — the deployment will briefly enter a transitional state before returning to `ready`.\n","operationId":"scaleDeployedAIModel","parameters":[{"$ref":"#/components/parameters/OrgID"},{"$ref":"#/components/parameters/ProjectID"},{"$ref":"#/components/parameters/ModelID"}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ScaleAIModelRequest"}}}},"responses":{"200":{"description":"Scale request accepted. 
The deployment will update to the desired replica count.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AIModelResponse"}}}},"400":{"$ref":"#/components/responses/BadRequest"},"401":{"$ref":"#/components/responses/Unauthorized"},"403":{"$ref":"#/components/responses/Forbidden"},"404":{"$ref":"#/components/responses/NotFound"},"500":{"$ref":"#/components/responses/InternalServerError"}}}}}}
```
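As a rough illustration of the scale operation and the response fields described in the spec above, the following Python sketch builds the `PUT .../scale` request without sending it and reads the fields of an `AIModelResponse` or `ErrorResponse`. The body field name `replicas` is an assumption, since the `ScaleAIModelRequest` property name is not shown on this page. The helper names (`build_scale_request`, `chat_completions_url`, `describe_error`) are hypothetical and not part of any NeevCloud SDK.

```python
import json
import urllib.parse
import urllib.request

# Server URL taken from the "servers" entry of the spec.
BASE_URL = "https://api.ai.neevcloud.com/aimodels"


def build_scale_request(org_id, project_id, model_id, replicas, access_token):
    """Build (but do not send) the PUT request for the scale operation."""
    # Percent-encode each path parameter so unusual characters stay in one segment.
    org, prj, mdl = (
        urllib.parse.quote(part, safe="") for part in (org_id, project_id, model_id)
    )
    url = f"{BASE_URL}/api/v1beta1/orgs/{org}/projects/{prj}/aimodels/{mdl}/scale"
    # Assumption: ScaleAIModelRequest carries the desired count in "replicas".
    body = json.dumps({"replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            # HTTP bearer scheme: the access_token goes after the "Bearer " prefix.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def chat_completions_url(deployment):
    """Return the chat completions URL, or None while the deployment is not ready."""
    endpoint = deployment.get("endpoint_url")
    # endpoint_url is null until status is "ready".
    if deployment.get("status") != "ready" or not endpoint:
        return None
    return endpoint.rstrip("/") + "/chat/completions"


def describe_error(payload):
    """Format an ErrorResponse body; "code" and "message" are required fields."""
    return f"{payload['code']}: {payload['message']}"
```

Passing the built request to `urllib.request.urlopen` would send it. On a `200` the body is an `AIModelResponse`, and because scaling is applied asynchronously, a client would typically re-read the deployment until `status` is `ready` again before calling `chat_completions_url`.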

