Gemini 2.5 Flash TTS Text-to-Speech
Audio
Gemini 2.5 Flash TTS Text-to-Speech
POST
Gemini 2.5 Flash TTS Text-to-Speech
Converts text to speech based on the Vertex AI generateContent API. The request body format is fully consistent with the official Vertex AI API. Supports both synchronous (single request, single response) and streaming (single request, streamed response) modes. Output is in LINEAR16 PCM format (24kHz, mono, 16-bit signed little-endian) without a WAV header.
Request Headers
Enum:
application/jsonBearer authentication format: Bearer {{API Key}}.
Request Body
Response
Base64 encoded audio content. Format is LINEAR16 PCM (24kHz, mono, 16-bit signed little-endian) without a WAV header. Clients can convert using ffmpeg: ffmpeg -f s16le -ar 24k -ac 1 -i input.raw output.wav