API Endpoints¶

Argo Proxy provides a universal API gateway that serves all major LLM API formats. In v3 universal mode (default), requests are automatically routed to the optimal upstream endpoint based on the model family.

Here we assume the service is running on localhost:44497. Replace with your actual service address.

Universal Endpoints (v3)¶

These endpoints are always available and support all models through automatic format translation via llm-rosetta.

`/v1/chat/completions` — OpenAI Chat¶

The primary endpoint for OpenAI Chat Completions API.

POST http://localhost:44497/v1/chat/completions

Supports all models (GPT, Claude, Gemini). Requests for Claude models are automatically translated to native Anthropic format upstream, while GPT and Gemini models use the native OpenAI-compatible upstream.

`/v1/responses` — OpenAI Responses¶

OpenAI's Responses API endpoint.

POST http://localhost:44497/v1/responses

Supports all models. Cross-format translation is handled automatically.

`/v1/messages` — Anthropic Messages¶

Native Anthropic Messages API endpoint. Use this with the Anthropic SDK, Claude Code, or any Anthropic-compatible client.

POST http://localhost:44497/v1/messages

Supports all models. Requests for non-Claude models are automatically translated to OpenAI Chat format upstream.

`/v1beta/models/{model}:generateContent` — Google GenAI¶

Google GenAI (Gemini) content generation endpoint.

POST http://localhost:44497/v1beta/models/gemini-2.5-flash:generateContent

`/v1beta/models/{model}:streamGenerateContent` — Google GenAI (Streaming)¶

Google GenAI streaming endpoint.

POST http://localhost:44497/v1beta/models/gemini-2.5-flash:streamGenerateContent

`/v1/embeddings` — Embeddings¶

OpenAI-compatible embedding API. Passed through to the native OpenAI endpoint.

POST http://localhost:44497/v1/embeddings

`/v1/models` — Model List¶

Lists available models in OpenAI-compatible format.

GET http://localhost:44497/v1/models

Response: Returns a list of available chat and embedding models with their aliases.

Routing Logic¶

In universal mode, argo-proxy routes requests to the optimal upstream based on the model family:

Model Family	Upstream	Reason
OpenAI (GPT)	OpenAI Chat endpoint	Natural fit
Google (Gemini)	OpenAI Chat endpoint	Only option on ARGO
Anthropic (Claude)	Anthropic native endpoint	Avoids tool call issues on OpenAI-compat
Unknown	OpenAI Chat endpoint	Best-effort default

When the client format matches the upstream format (e.g., OpenAI client + GPT model), requests pass through directly without conversion. When formats differ (e.g., Anthropic client + GPT model), llm-rosetta handles the translation.

Upstream Authentication¶

The ARGO backend identifies users through different fields depending on the upstream endpoint format:

Upstream Format	Primary Auth Field	Fallback	Location
OpenAI (Chat, Responses, Embeddings)	`user`	`Authorization: Bearer`	Body / Header
Anthropic (Messages)	`x-api-key`	`Authorization: Bearer`	Header
Legacy ARGO (Chat, StreamChat)	`user`	—	Body

Argo-proxy automatically populates these fields using the user value from your configuration. When --username-passthrough is enabled, the API key provided by the downstream client is used instead.

Legacy Endpoints¶

These endpoints are only available when legacy mode is enabled (--legacy-argo or use_legacy_argo: true).

`/v1/chat`¶

Proxies requests directly to the legacy ARGO Chat API without format conversion.

POST http://localhost:44497/v1/chat

`/v1/embed`¶

Proxies requests directly to the legacy ARGO Embedding API.

POST http://localhost:44497/v1/embed

`/v1/completions`¶

Legacy Completions API (text completion). Only available in legacy mode.

POST http://localhost:44497/v1/completions

Utility Endpoints¶

`/health`¶

Health check endpoint for monitoring and load balancing.

GET http://localhost:44497/health

Response: Returns 200 OK with {"status": "healthy"} if the server is running.

`/version`¶

Returns version information, update availability, and dependency status.

GET http://localhost:44497/version

Response:

{
    "version": "3.0.0",
    "latest_stable": "3.0.0",
    "latest_pre": null,
    "up_to_date": true,
    "message": "You're using the latest version",
    "update_commands": null,
    "dependencies": {
        "llm-rosetta": {
            "installed": "0.5.1",
            "latest_stable": "0.5.1",
            "latest_pre": null,
            "up_to_date": true,
            "update_command": "pip install --upgrade llm-rosetta"
        }
    },
    "pypi": "https://pypi.org/project/argo-proxy/",
    "changelog": "https://argo-proxy.readthedocs.io/en/latest/changelog/"
}

The dependencies field reports the update status of critical dependencies (currently llm-rosetta). It is null when no critical dependencies are installed. The update_commands field contains CLI and pip upgrade commands when an argo-proxy update is available.

`/refresh`¶

Reloads the model list from the upstream ARGO API without restarting.

POST http://localhost:44497/refresh

Response:

{
    "status": "ok",
    "message": "Model list refreshed successfully",
    "previous": {
        "unique_models": 20,
        "total_aliases": 45
    },
    "current": {
        "unique_models": 22,
        "total_aliases": 50,
        "chat_models": 19,
        "embed_models": 3
    }
}

API Endpoints¶

Universal Endpoints (v3)¶

/v1/chat/completions — OpenAI Chat¶

/v1/responses — OpenAI Responses¶

/v1/messages — Anthropic Messages¶

/v1beta/models/{model}:generateContent — Google GenAI¶

/v1beta/models/{model}:streamGenerateContent — Google GenAI (Streaming)¶

/v1/embeddings — Embeddings¶

/v1/models — Model List¶