POST /v3/gemini-2.5-flash-tts
Gemini 2.5 Flash TTS Text-to-Speech
curl --request POST \
  --url https://api.myrouter.ai/v3/gemini-2.5-flash-tts \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "contents": {
    "role": "<string>",
    "parts": {
      "text": "<string>"
    }
  },
  "generation_config": {
    "temperature": 123,
    "speech_config": {
      "voice_config": {
        "prebuilt_voice_config": {
          "voice_name": "<string>"
        }
      },
      "language_code": "<string>",
      "multi_speaker_voice_config": {
        "speaker_voice_configs": [
          {
            "speaker": "<string>",
            "voice_config": {
              "prebuilt_voice_config": {
                "voice_name": "<string>"
              }
            }
          }
        ]
      }
    }
  }
}
'
{
  "audioContent": "<string>",
  "usageMetadata": {
    "totalTokenCount": 123,
    "promptTokenCount": 123,
    "candidatesTokenCount": 123
  }
}
Converts text to speech through an interface based on the Vertex AI generateContent API. The request body format matches the official Vertex AI API. Two modes are supported: synchronous (single request, single response) and streaming (single request, streamed response). Output is raw LINEAR16 PCM (24 kHz, mono, 16-bit signed little-endian) without a WAV header.

Request Headers

Content-Type
string
required
Enum: application/json
Authorization
string
required
Bearer token authentication, in the form: Bearer {{API Key}}.
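
As a rough illustration of the two required headers, the sketch below builds (but does not send) a POST request using only the Python standard library. The URL is copied from the curl example on this page; the `build_request` helper and the placeholder API key are hypothetical names introduced here, not part of the API.

```python
import json
import urllib.request

# Endpoint as it appears in the curl example on this page.
API_URL = "https://api.myrouter.ai/v3/gemini-2.5-flash-tts"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the two required headers.

    Hypothetical helper for illustration only; nothing is sent here.
    """
    headers = {
        "Content-Type": "application/json",    # the only documented value
        "Authorization": f"Bearer {api_key}",  # "Bearer " followed by the API key
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would perform the call; error handling and streaming are left out of this sketch.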

Request Body

contents
object
required
generation_config
object
required

Response

audioContent
string
Base64-encoded audio content. Format is LINEAR16 PCM (24 kHz, mono, 16-bit signed little-endian) without a WAV header. After Base64-decoding the value to a raw file, clients can convert it to WAV with ffmpeg: ffmpeg -f s16le -ar 24k -ac 1 -i input.raw output.wav
usageMetadata
object
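
As an alternative to ffmpeg, the audioContent value can be wrapped in a WAV container with Python's standard `wave` module. This is a minimal sketch, assuming the synchronous response shape shown above and the documented PCM parameters (24 kHz, mono, 16-bit); the function names are illustrative only.

```python
import base64
import io
import wave

# PCM parameters as documented for audioContent.
SAMPLE_RATE_HZ = 24_000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def audio_content_to_wav(audio_content_b64: str) -> bytes:
    """Decode Base64 PCM and return the same audio with a WAV header added."""
    pcm = base64.b64decode(audio_content_b64)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm_byte_count: int) -> float:
    """Duration implied by the byte count: 24000 * 1 * 2 = 48000 bytes per second."""
    return pcm_byte_count / (SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES)
```

Writing the returned bytes to a `.wav` file yields audio that ordinary players can open. For the streaming mode, chunks would need to be collected or written incrementally, which this sketch does not cover.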