Azure OpenAI image and audio REST API reference (2024-10-21)

This article documents the image generation and audio (speech) data plane inference REST API operations for Azure OpenAI in the 2024-10-21 GA release. For chat completions, embeddings, completions, and all other operations, see the official Azure OpenAI REST API reference.

API specs

Managing and interacting with Azure OpenAI models and resources is divided across three primary API surfaces:

Control plane
Data plane - authoring
Data plane - inference

Each API surface/specification encapsulates a different set of Azure OpenAI capabilities. Each API has its own unique set of preview and stable/generally available (GA) API releases. Preview releases currently tend to follow a monthly cadence.

Important

There is now a new preview inference API. Learn more in our API lifecycle guide.

API	Latest preview release	Latest GA release	Specifications	Description
Control plane	`2025-07-01-preview`	`2025-06-01`	Spec files	The control plane API is used for operations like creating resources, model deployment, and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.
Data plane	`v1 preview`	`v1`	Spec files	The data plane API controls inference and authoring operations.

Authentication

Azure OpenAI provides two methods for authentication. You can use either API keys or Microsoft Entra ID.

API key authentication: For this type of authentication, all API requests must include the API key in the api-key HTTP header. The Quickstart provides guidance for how to make calls with this type of authentication.
Microsoft Entra ID authentication: You can authenticate an API call using a Microsoft Entra token. The token is sent in the Authorization header and must be prefixed with Bearer, for example Bearer YOUR_AUTH_TOKEN. For details, see our how-to guide on authenticating with Microsoft Entra ID.
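The two methods differ only in which header carries the credential. As a minimal Python sketch (the helper name `auth_headers` is hypothetical, and acquiring the Entra token is not shown), the choice might look like this:

```python
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None,
                 entra_token: Optional[str] = None) -> Dict[str, str]:
    """Return the authentication header for one of the two supported methods.

    Exactly one of api_key or entra_token must be given.
    """
    if (api_key is None) == (entra_token is None):
        raise ValueError("Provide exactly one of api_key or entra_token")
    if api_key is not None:
        # API key authentication: the key travels in the api-key header.
        return {"api-key": api_key}
    # Microsoft Entra ID authentication: the token is sent with the Bearer prefix.
    return {"Authorization": f"Bearer {entra_token}"}
```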

REST API versioning

The service APIs are versioned using the api-version query parameter. Dated versions use the YYYY-MM-DD format, with a -preview suffix for preview releases such as 2025-07-01-preview. For example:

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-06-01
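A request URL combines the resource endpoint, the deployment, the operation path, and the api-version query parameter. The following sketch assumes the `https://{resource}.openai.azure.com` host pattern shown above; `inference_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import quote, urlencode


def inference_url(resource: str, deployment: str, operation: str,
                  api_version: str) -> str:
    """Build a data plane inference URL; api-version is a query parameter."""
    base = (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{quote(deployment, safe='')}")
    return f"{base}/{operation}?{urlencode({'api-version': api_version})}"
```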

Data plane inference

The rest of this article covers the image and audio operations in the GA release of the Azure OpenAI data plane inference specification, 2024-10-21.

For the preview image and audio operations, see the preview image and audio REST API reference.

Transcriptions - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2024-10-21

Transcribes audio into the input language.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), for example `https://aoairesource.openai.azure.com`. Replace "aoairesource" with your Azure OpenAI resource name, giving https://{your-resource-name}.openai.azure.com.
deployment-id	path	Yes	string	Deployment ID of the speech-to-text model. For supported models, see Audio models (/azure/ai-foundry/openai/concepts/models#audio-models).
api-version	query	Yes	string	API version

Request Header

Name	Required	Type	Description
api-key	Yes	string	Your Azure OpenAI API key.

Request Body

Content-Type: multipart/form-data

Name	Type	Description	Required	Default
file	string	The audio file object to transcribe.	Yes
prompt	string	Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.	No
response_format	audioResponseFormat	Defines the format of the output.	No
temperature	number	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.	No	0
language	string	The language of the input audio. Supplying the input language in ISO-639-1 format (for example, en) improves accuracy and latency.	No
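Because the body is multipart/form-data, each field in the table travels as its own part, and the audio goes in the `file` part. A minimal standard-library sketch of that encoding follows; `build_multipart` is hypothetical, the random boundary is an illustrative choice, and in practice an HTTP client library usually does this encoding for you. Numeric fields such as temperature are sent as text.

```python
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str], file_name: str, file_bytes: bytes,
                    boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Encode form fields plus one audio file as multipart/form-data.

    Returns the Content-Type header value and the encoded request body.
    """
    boundary = boundary or uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        # Each text field (prompt, response_format, temperature, language) is its own part.
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f"{value}\r\n").encode("utf-8")
    # The audio goes in the required "file" part as raw bytes.
    body += (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
             "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    body += file_bytes + b"\r\n"
    # A closing delimiter ends the body.
    body += f"--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", bytes(body)
```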

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	audioResponse or audioVerboseResponse
text/plain	string	Transcribed text in the output format (when response_format is text, vtt, or srt).
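The response content type depends on response_format. A hedged sketch of how a client might branch on it (the helper `parse_audio_response` is hypothetical):

```python
import json
from typing import Any, Dict, Union

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def parse_audio_response(response_format: str, body: str) -> Union[Dict[str, Any], str]:
    """Interpret a 200 response from the transcriptions or translations operation."""
    if response_format in JSON_FORMATS:
        # application/json: an audioResponse or audioVerboseResponse object.
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        # text/plain: the transcript itself, already in the requested format.
        return body
    raise ValueError(f"Unsupported response_format: {response_format!r}")
```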

Examples

Example

Gets transcribed text and associated metadata from provided spoken audio data as a structured JSON object (response_format json or verbose_json).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2024-10-21

Responses: Status Code: 200

{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets transcribed text from provided spoken audio data as plain text (response_format text, srt, or vtt).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2024-10-21

---multipart-boundary
Content-Disposition: form-data; name="file"; filename="file.wav"
Content-Type: application/octet-stream

RIFF..audio.data.omitted
---multipart-boundary--

Responses: Status Code: 200

{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}

Translations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2024-10-21

Transcribes and translates input audio into English text.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), for example `https://aoairesource.openai.azure.com`. Replace "aoairesource" with your Azure OpenAI resource name, giving https://{your-resource-name}.openai.azure.com.
deployment-id	path	Yes	string	Deployment ID of the deployed Whisper model. For supported models, see Audio models (/azure/ai-foundry/openai/concepts/models#audio-models).
api-version	query	Yes	string	API version

Request Header

Name	Required	Type	Description
api-key	Yes	string	Your Azure OpenAI API key.

Request Body

Content-Type: multipart/form-data

Name	Type	Description	Required	Default
file	string	The audio file to translate.	Yes
prompt	string	Optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.	No
response_format	audioResponseFormat	Defines the format of the output.	No
temperature	number	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.	No	0
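The translation request has the same multipart shape as the transcription request, minus the language field. As a sketch, assuming `body` and `content_type` come from a multipart/form-data encoder of your choice, a standard-library client could prepare the POST like this (the helper name `translation_request` is hypothetical):

```python
import urllib.request


def translation_request(endpoint: str, deployment: str, api_key: str,
                        content_type: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the translations operation."""
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           "/audio/translations?api-version=2024-10-21")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"api-key": api_key, "Content-Type": content_type},
    )
```

Sending is a matter of passing the returned object to `urllib.request.urlopen`; the sketch deliberately stops before any network call.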

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	audioResponse or audioVerboseResponse
text/plain	string	Translated text in the output format (when response_format is text, vtt, or srt).

Examples

Example

Gets English language transcribed text and associated metadata from provided spoken audio data as a structured JSON object (response_format json or verbose_json).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2024-10-21

---multipart-boundary
Content-Disposition: form-data; name="file"; filename="file.wav"
Content-Type: application/octet-stream

RIFF..audio.data.omitted
---multipart-boundary--

Responses: Status Code: 200

{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets English language transcribed text from provided spoken audio data as plain text (response_format text, srt, or vtt).

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2024-10-21

---multipart-boundary
Content-Disposition: form-data; name="file"; filename="file.wav"
Content-Type: application/octet-stream

RIFF..audio.data.omitted
---multipart-boundary--

Responses: Status Code: 200

{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}

Image generation

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2024-10-21

Generates a batch of images from a text caption on a given DALL-E model deployment.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), for example `https://aoairesource.openai.azure.com`. Replace "aoairesource" with your Azure OpenAI resource name, giving https://{your-resource-name}.openai.azure.com.
deployment-id	path	Yes	string	Deployment ID of the deployed DALL-E model.
api-version	query	Yes	string	API version

Request Header

Name	Required	Type	Description
api-key	Yes	string	Your Azure OpenAI API key.

Request Body

Content-Type: application/json

Name	Type	Description	Required	Default
prompt	string	A text description of the desired image(s). The maximum length is 4,000 characters.	Yes
n	integer	The number of images to generate.	No	1
size	imageSize	The size of the generated images.	No	1024x1024
response_format	imagesResponseFormat	The format in which the generated images are returned.	No	url
user	string	A unique identifier representing your end-user, which can help to monitor and detect abuse.	No
quality	imageQuality	The quality of the image that will be generated.	No	standard
style	imageStyle	The style of the generated images.	No	vivid
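The optional parameters take values from the enums in the Components section, with defaults matching the Default column above. A minimal validation sketch, assuming you want to reject bad values client-side before calling the service (`image_request_body` is hypothetical; the service remains the authority on what it accepts):

```python
import json
from typing import Optional

# Allowed values copied from the imageSize, imageQuality, imageStyle,
# and imagesResponseFormat enums in the Components section.
ALLOWED = {
    "size": {"1792x1024", "1024x1792", "1024x1024"},
    "quality": {"standard", "hd"},
    "style": {"vivid", "natural"},
    "response_format": {"url", "b64_json"},
}


def image_request_body(prompt: str, n: int = 1, size: str = "1024x1024",
                       quality: str = "standard", style: str = "vivid",
                       response_format: str = "url",
                       user: Optional[str] = None) -> str:
    """Validate image generation parameters and return the JSON request body."""
    if not prompt or len(prompt) > 4000:
        raise ValueError("prompt must contain 1 to 4,000 characters")
    if n < 1:
        raise ValueError("n must be at least 1")
    body = {"prompt": prompt, "n": n, "size": size, "quality": quality,
            "style": style, "response_format": response_format}
    for field, allowed in ALLOWED.items():
        if body[field] not in allowed:
            raise ValueError(f"unsupported {field}: {body[field]!r}")
    if user is not None:
        body["user"] = user  # optional end-user identifier for abuse monitoring
    return json.dumps(body)
```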

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	generateImagesResponse

Status Code: default

Description: An error occurred.

Content-Type	Type	Description
application/json	dalleErrorResponse

Examples

Example

Creates images given a prompt.

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2024-10-21

{
 "prompt": "In the style of WordArt, Microsoft Clippy wearing a cowboy hat.",
 "n": 1,
 "style": "natural",
 "quality": "standard"
}

Responses: Status Code: 200

{
  "body": {
    "created": 1698342300,
    "data": [
      {
        "revised_prompt": "A vivid, natural representation of Microsoft Clippy wearing a cowboy hat.",
        "prompt_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          },
          "profanity": {
            "detected": false,
            "filtered": false
          }
        },
        "url": "https://dalletipusw2.blob.core.windows.net/private/images/e5451cc6-b1ad-4747-bd46-b89a3a3b8bc3/generated_00.png?se=2023-10-27T17%3A45%3A09Z&...",
        "content_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          }
        }
      }
    ]
  }
}
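The example wraps the payload in a body key, which appears to come from the specification's example format; the sketch below assumes it receives the generateImagesResponse object itself. It collects each image's URL, revised prompt, and any filtered content categories (`summarize_images` is hypothetical):

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_images(response: Dict[str, Any]) -> List[Tuple[Optional[str], Optional[str], List[str]]]:
    """For each generated image, return (url, revised_prompt, filtered categories)."""
    summary = []
    for item in response.get("data", []):
        results = item.get("content_filter_results", {})
        # A category counts as filtered when its "filtered" flag is true.
        filtered = sorted(name for name, result in results.items() if result.get("filtered"))
        # "url" may be absent, presumably when images are requested as b64_json.
        summary.append((item.get("url"), item.get("revised_prompt"), filtered))
    return summary
```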

Components

For the schema definitions used by chat, completions, embeddings, and other text operations, see the Azure OpenAI REST API reference. The following schemas support the image and audio operations on this page.

innerErrorCode

Error codes for the inner error object.

Description: Error codes for the inner error object.

Type: string

Default:

Enum Name: InnerErrorCode

Enum Values:

Value	Description
ResponsibleAIPolicyViolation	The prompt violated one or more content filter rules.

dalleErrorResponse

Name	Type	Description	Required	Default
error	dalleError		No

dalleError

Name	Type	Description	Required
param	string		No
type	string		No
inner_error	dalleInnerError	Inner error with additional details.	No

dalleInnerError

Inner error with additional details.

Name	Type	Description	Required
code	innerErrorCode	Error codes for the inner error object.	No
content_filter_results	dalleFilterResults	Information about the content filtering categories (hate, sexual, violence, self_harm): whether each was detected, its severity level (very_low, low, medium, or high, a scale that indicates the intensity and risk level of harmful content), and whether it was filtered. Also reports whether jailbreak content and profanity were detected and filtered, and whether a customer blocklist filtered the content, along with the blocklist ID.	No
revised_prompt	string	The prompt that was used to generate the image, if there was any revision to the prompt.	No

contentFilterSeverityResult

Name	Type	Description	Required	Default
filtered	boolean		Yes
severity	string		No

contentFilterDetectedResult

Name	Type	Description	Required	Default
filtered	boolean		Yes
detected	boolean		No

dalleFilterResults

Information about the content filtering categories (hate, sexual, violence, self_harm): whether each was detected, its severity level (very_low, low, medium, or high, a scale that indicates the intensity and risk level of harmful content), and whether it was filtered. Also reports whether jailbreak content and profanity were detected and filtered, and whether a customer blocklist filtered the content, along with the blocklist ID.

Name	Type	Required
sexual	contentFilterSeverityResult	No
violence	contentFilterSeverityResult	No
hate	contentFilterSeverityResult	No
self_harm	contentFilterSeverityResult	No
profanity	contentFilterDetectedResult	No
jailbreak	contentFilterDetectedResult	No
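When an error response carries the ResponsibleAIPolicyViolation inner error code, its dalleFilterResults object shows which categories triggered the block. A hedged sketch of turning a dalleErrorResponse into a readable message (`describe_image_error` is a hypothetical helper):

```python
from typing import Any, Dict


def describe_image_error(error_response: Dict[str, Any]) -> str:
    """Summarize a dalleErrorResponse, naming filtered categories for policy violations."""
    error = error_response.get("error") or {}
    inner = error.get("inner_error") or {}
    if inner.get("code") == "ResponsibleAIPolicyViolation":
        results = inner.get("content_filter_results") or {}
        # List the categories whose "filtered" flag is true.
        hits = sorted(name for name, result in results.items() if result.get("filtered"))
        return "Prompt blocked by content filter: " + (", ".join(hits) or "unspecified category")
    return f"Image generation failed ({error.get('type', 'unknown error type')})"
```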

audioResponse

Translation or transcription response when response_format is json.

Name	Type	Description	Required	Default
text	string	Translated or transcribed text.	Yes

audioVerboseResponse

Translation or transcription response when response_format is verbose_json.

Name	Type	Description	Required
text	string	Translated or transcribed text.	Yes
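task	string	Type of audio task performed (transcription or translation).	No
language	string	Language.	No
duration	number	Duration of the input audio.	No
segments	array	Segments of the transcribed or translated text.	No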

audioResponseFormat

Defines the format of the output.

Description: Defines the format of the output.

Type: string

Default:

Enum Values:

json
text
srt
verbose_json
vtt

imageQuality

The quality of the image that will be generated.

Description: The quality of the image that will be generated.

Type: string

Default: standard

Enum Name: Quality

Enum Values:

Value	Description
standard	Standard quality creates images with standard quality.
hd	HD quality creates images with finer details and greater consistency across the image.

imagesResponseFormat

The format in which the generated images are returned.

Description: The format in which the generated images are returned.

Type: string

Default: url

Enum Name: ImagesResponseFormat

Enum Values:

Value	Description
url	The URL that provides temporary access to download the generated images.
b64_json	The generated images are returned as base64-encoded strings.
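With b64_json, the image bytes arrive inline rather than behind a temporary URL. A minimal decoding sketch; it assumes each entry in the data array stores the string under a `b64_json` key, which this page does not spell out, and `decode_b64_image` is hypothetical:

```python
import base64
from typing import Any, Dict


def decode_b64_image(item: Dict[str, Any]) -> bytes:
    """Decode one data entry returned with response_format b64_json.

    Assumes the base64 string is stored under the "b64_json" key.
    """
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(item["b64_json"], validate=True)
```

To save the result, write the bytes to disk, for example `pathlib.Path("image.png").write_bytes(decode_b64_image(item))`.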

imageSize

The size of the generated images.

Description: The size of the generated images.

Type: string

Default: 1024x1024

Enum Name: Size

Enum Values:

Value	Description
1792x1024	The desired size of the generated image is 1792x1024 pixels.
1024x1792	The desired size of the generated image is 1024x1792 pixels.
1024x1024	The desired size of the generated image is 1024x1024 pixels.

imageStyle

The style of the generated images.

Description: The style of the generated images.

Type: string

Default: vivid

Enum Name: Style

Enum Values:

Value	Description
vivid	Vivid creates images that are hyper-realistic and dramatic.
natural	Natural creates images that are more natural and less hyper-realistic.

generateImagesResponse

Name	Type	Description	Required	Default
created	integer	The Unix timestamp of when the operation was created.	Yes
data	array	The result data of the operation, if successful.	Yes
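The created field is a Unix timestamp in seconds. A small sketch converting it to a timezone-aware UTC datetime (`created_at` is hypothetical):

```python
from datetime import datetime, timezone
from typing import Any, Dict


def created_at(response: Dict[str, Any]) -> datetime:
    """Convert the created Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(response["created"], tz=timezone.utc)
```

For the example response's created value of 1698342300, this returns 2023-10-26 17:45:00 UTC.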

Next steps

Learn about models and fine-tuning with the REST API. Learn more about the underlying models that power Azure OpenAI.

Last updated on 2026-06-24

Azure OpenAI image and audio REST API reference (2024-10-21)

API specs

Authentication

REST API versioning

Data plane inference

Transcriptions - Create

URI Parameters

Request Header

Request Body

Responses

Examples

Example

Example

Translations - Create

URI Parameters

Request Header

Request Body

Responses

Examples

Example

Example

Image generation

URI Parameters

Request Header

Request Body

Responses

Examples

Example

Components

innerErrorCode

dalleErrorResponse

dalleError

dalleInnerError

contentFilterSeverityResult

contentFilterDetectedResult

dalleFilterResults

audioResponse

audioVerboseResponse

audioResponseFormat

imageQuality

imagesResponseFormat

imageSize

imageStyle

generateImagesResponse

Next steps

Feedback

Additional resources