AI Endpoints - Responses API
AI Endpoints is covered by the OVHcloud AI Endpoints Conditions and the OVHcloud Public Cloud Special Conditions.
Introduction
AI Endpoints is a serverless platform provided by OVHcloud that offers easy access to a selection of world-renowned, pre-trained AI models.
The Responses API (/v1/responses) is the most recent OpenAI-compatible route.
Like v1/chat/completions, it can be used for text generation, multi-turn conversations, tool/function calling, structured outputs, and vision inputs (on compatible models).
The key difference is that /v1/responses is intended as the foundation for newer capabilities and agentic behaviour, introducing advanced features such as statefulness and built-in tools.
The v1/responses route was added recently. Some parameters and behaviours may differ between models.
For up-to-date limitations, refer to Endpoint Limitations and check model capabilities in the Catalog.
Objective
This documentation provides an overview of the v1/responses route on AI Endpoints, including:
- Basic requests and common response fields
- Usage examples in Python, JavaScript, and cURL
- A detailed explanation of the most important parameters
- Known limitations on the platform
Requirements
The examples provided in this guide can be used in one of the following environments:
- A Python environment with the openai client.
- A standard terminal with cURL installed.
Authentication & Rate Limiting
Most examples in this guide are authenticated and expect the AI_ENDPOINT_API_KEY environment variable to be set in order to avoid rate limiting issues.
To authenticate with your own token, set your API key in the environment (export AI_ENDPOINT_API_KEY='your_api_key').
Follow the instructions in the AI Endpoints - Getting Started guide for more information on authentication.
Quickstart
On AI Endpoints, statefulness for v1/responses is currently not managed.
To avoid unexpected behaviour and to match the current platform implementation, always send store: false.
Basic request (text input)
The simplest request is a single text input.
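A minimal sketch using only the Python standard library (the base URL and model name below are examples; replace them with your endpoint URL and a model from the Catalog):

```python
import json
import os
import urllib.request

# Example values -- replace with your endpoint URL and a model from the Catalog.
BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
MODEL = "Meta-Llama-3_3-70B-Instruct"

def build_payload(user_text: str) -> dict:
    # store: false -- statefulness is not managed on AI Endpoints.
    return {"model": MODEL, "input": user_text, "store": False}

def post_responses(payload: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['AI_ENDPOINT_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_payload("What is the capital of France?")
if os.environ.get("AI_ENDPOINT_API_KEY"):
    result = post_responses(payload)
    # The generated text lives in the response's output items.
    print(result["output"][0]["content"][0]["text"])
```

The same request can be made with the openai client by pointing base_url at your endpoint and calling client.responses.create(...).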
Multi-turn conversations
To create a multi-turn conversation, keep the full conversation history on your side and send it as an input list at each request.
On AI Endpoints, statefulness for v1/responses is currently unavailable.
This means you must always send the full history as part of input.
Client-managed conversation (input list)
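A sketch of client-managed history (the model name is an example): keep every turn in a list, append each assistant answer and new user message, and resend the whole list as input.

```python
# Client-managed history: keep the full conversation and resend it each turn.
history = [
    {"role": "user", "content": "My name is Alice."},
]

def next_payload(history: list) -> dict:
    return {
        "model": "Meta-Llama-3_3-70B-Instruct",  # example model
        "input": history,   # the full history, on every request
        "store": False,     # statefulness is not managed on AI Endpoints
    }

def record_turn(history: list, assistant_text: str, user_text: str) -> list:
    # Append the previous assistant answer, then the new user message.
    return history + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": user_text},
    ]

history = record_turn(history, "Nice to meet you, Alice!", "What is my name?")
payload = next_payload(history)
```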
Providing a system prompt
You can provide system-level instructions in two ways:
- instructions (simple and compact)
- A role: "system" item inside an input list (useful when you already send a list for multi-turn)
Option 1: instructions
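A minimal payload sketch using the top-level instructions field (model name is an example):

```python
# System-level guidance via the top-level "instructions" field.
payload = {
    "model": "Meta-Llama-3_3-70B-Instruct",  # example model
    "instructions": "You are a concise assistant. Answer in one sentence.",
    "input": "Explain what a vector database is.",
    "store": False,  # statefulness is not managed on AI Endpoints
}
```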
Option 2: role: "system" in an input list
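The same guidance expressed as a role: "system" item in an input list (model name is an example):

```python
# Same effect, expressed as a role: "system" item inside the input list.
payload = {
    "model": "Meta-Llama-3_3-70B-Instruct",  # example model
    "input": [
        {"role": "system", "content": "You are a concise assistant. Answer in one sentence."},
        {"role": "user", "content": "Explain what a vector database is."},
    ],
    "store": False,  # statefulness is not managed on AI Endpoints
}
```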
Streaming (stream: true)
If stream is enabled, the API returns Server-Sent Events (SSE) with incremental output.
This is useful for chat UIs and CLIs.
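A streaming sketch with the standard library: SSE frames arrive as lines prefixed with "data: ", each carrying one JSON event. The base URL, model name, and event type below are examples; exact event names may vary by backend.

```python
import json
import os
import urllib.request

BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"  # example URL

payload = {
    "model": "Meta-Llama-3_3-70B-Instruct",  # example model
    "input": "Write a haiku about the ocean.",
    "stream": True,
    "store": False,
}

def iter_sse_data(lines):
    # SSE frames: lines starting with "data: " carry one JSON event each.
    for raw in lines:
        line = raw.decode("utf-8").strip() if isinstance(raw, bytes) else raw.strip()
        if line.startswith("data: ") and line != "data: [DONE]":
            yield json.loads(line[len("data: "):])

if os.environ.get("AI_ENDPOINT_API_KEY"):
    req = urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['AI_ENDPOINT_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for event in iter_sse_data(resp):
            # Incremental text arrives in delta events.
            if event.get("type") == "response.output_text.delta":
                print(event.get("delta", ""), end="", flush=True)
```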
Structured outputs (text.format)
Some models support enforcing a structured output format. This is useful when you need predictable, machine-readable responses.
The text.format object can be used in these modes (model permitting):
- {"type": "text"}: Default textual format.
- {"type": "json_schema", "name": "...", "schema": { ... }}: Schema-enforced mode: the model returns JSON that matches your JSON Schema.
Example: JSON schema extraction
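A payload sketch for schema-enforced extraction (model name is an example; the field names in the schema are illustrative):

```python
# Ask for JSON matching a schema via text.format (model permitting).
payload = {
    "model": "Meta-Llama-3_3-70B-Instruct",  # example model
    "input": "Extract the event: 'Team meeting on Friday at 10am in Room 2'.",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "event",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["title", "day", "time"],
                "additionalProperties": False,
            },
        }
    },
    "store": False,
}
```

When the model honours the schema, the output text is a JSON document you can parse directly with json.loads.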
Function calling (tools)
Function calling (tool calling) lets the model request that your application runs a function.
You declare the function signature in tools, the model may emit tool calls, then you execute them and provide the results back so the model can produce a final answer.
On OVHcloud AI Endpoints for v1/responses, built-in tools are not supported (e.g. web_search, file_search, computer_use, code_execution, ...).
Only custom function tools are supported.
End-to-end workflow (recommended)
The flow is similar to the v1/chat/completions function calling guide:
- Call the model with tools.
- If the model returns a tool call: execute the tool in your application.
- Send a follow-up request that includes the tool result in input, then read the final answer.
Below is a minimal end-to-end example.
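A sketch of the three steps, following the Responses API item shapes (function_call / function_call_output). The model name and tool are examples, and the tool call shown is a simulated model output rather than a live response:

```python
import json

# 1) Declare a custom function tool (Responses API tool shape).
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

first_request = {
    "model": "Meta-Llama-3_3-70B-Instruct",  # example model
    "input": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "store": False,
}

# 2) Your application-side implementation of the tool.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temperature_c": 18})  # stub result

# 3) On a function_call output item: execute the tool, then build the
#    follow-up input containing a function_call_output item.
def build_followup(history: list, call_item: dict) -> list:
    args = json.loads(call_item["arguments"])
    result = get_weather(**args)
    return history + [
        call_item,  # echo the model's tool call back
        {
            "type": "function_call_output",
            "call_id": call_item["call_id"],
            "output": result,
        },
    ]

# Simulated function_call item, as the model might return it.
call_item = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "get_weather",
    "arguments": '{"city": "Paris"}',
}
second_input = build_followup(first_request["input"], call_item)
second_request = {**first_request, "input": second_input}
```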
cURL is convenient to declare tools, but executing tools and sending tool results back requires application-side logic.
Vision language models (image inputs)
Some models accept image inputs.
When supported, you can pass an input array containing a mix of text and image parts.
OVHcloud AI Endpoints currently does not support fetching images from remote URLs for input_image.
Provide images as a base64-encoded data URL (for example: data:image/png;base64,...).
Image inputs are supported only by vision-capable models. Refer to the Catalog and model pages for supported content types.
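A payload sketch for a base64 data URL image input (the model name is an example of a vision-capable model, and the image bytes are a placeholder; read your real image file instead):

```python
import base64

# Placeholder bytes for illustration -- read your real image file instead.
fake_png = b"\x89PNG\r\n\x1a\n"
data_url = "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")

payload = {
    "model": "Qwen2.5-VL-72B-Instruct",  # example vision-capable model
    "input": [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image."},
            {"type": "input_image", "image_url": data_url},
        ],
    }],
    "store": False,
}
```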
Reasoning models (reasoning)
Some models expose reasoning-related controls.
When supported, a reasoning object can be used to tune the reasoning effort and/or retrieve reasoning metadata.
Reasoning parameters are model-specific. If you get validation errors, either remove reasoning or switch to a reasoning-capable model.
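A payload sketch with a reasoning object (the model name is an example, and the effort values shown are assumptions; check the model page for the controls it accepts):

```python
# Reasoning controls are model-specific; remove "reasoning" if the model rejects it.
payload = {
    "model": "DeepSeek-R1-Distill-Llama-70B",  # example reasoning-capable model
    "input": "A train leaves at 9:15 and arrives at 11:40. How long is the trip?",
    "reasoning": {"effort": "low"},  # e.g. "low", "medium", "high"
    "store": False,
}
```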
Endpoint limitations
The v1/responses endpoint is still under development, so not all features may be available.
If there are specific features you would like us to prioritise, don't hesitate to let us know on the OVHcloud Discord server.
Statefulness
Statefulness is currently not managed on AI Endpoints for the v1/responses route.
- Always send store: false to avoid unexpected behaviour (the OpenAI specification defaults to store: true).
- previous_response_id is currently not supported.
- To implement multi-turn, send the full history in the input list.
Built-in tools
OpenAI-compatible built-in tools are currently not supported on OVHcloud AI Endpoints for v1/responses (for example: web_search, file_search, computer_use, code_execution, remote tools with type: "mcp", etc.).
If you need tool calling, only custom function tools are supported: declare them explicitly in the tools array (see Function calling (tools)).
Known issues / unsupported parameters
The following parameters may be unsupported, ignored, or inconsistently implemented depending on the model/backend:
- Reasoning summaries and some reasoning metadata fields
- background
- include
- max_tool_calls
- prompt_cache_key
- truncation
- Reusable prompts (prompt parameter)
- safety_identifier
- service_tier
- stream_options
- user
- verbosity
Model-specific limitations you may encounter:
- Some models are not compatible with the v1/responses route
- JSON object / JSON schema support varies (structured outputs)
- Tool calling may be unsupported, or tool_choice values may be restricted (for example: not supporting non-auto modes)
- Some models do not support system prompts / instructions
- Multi-turn conversations may behave unexpectedly when combining structured outputs, system instructions, or reasoning parameters
- Structured outputs with streaming may be unsupported
- logprobs may not be supported on some models
- Parallel tool calls may be unsupported on some models
- Image inputs are supported only by vision-capable models
Conclusion
The Responses API provides a unified way to interact with LLMs on OVHcloud AI Endpoints, covering basic text generation as well as advanced use cases such as multi-turn conversations, streaming, structured outputs, function calling, and vision inputs (model permitting).
To maximise compatibility, always verify supported features for your chosen model in the AI Endpoints catalog, and consider falling back to v1/chat/completions when a feature is not available on v1/responses.
Go further
Browse the full AI Endpoints documentation to explore other guides and tutorials.
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts for a custom analysis of your project.
Feedback
Please send us your questions, feedback, and suggestions to improve the service:
- On the OVHcloud Discord server.