MiniMax Speech-2.6-turbo Synchronous Text-to-Speech
POST
This API performs synchronous text-to-speech generation, with a maximum of 10,000 characters per request. It supports more than 100 system voices as well as cloned voices; adjustment of volume, pitch, speed, and output format; proportional voice mixing and fixed-interval pause control; and multiple audio specifications and formats, including mp3, pcm, flac, and wav, as well as streaming output.
After submitting a speech synthesis request, note that any returned URL is valid for 24 hours from the time it is returned, so download the audio within that window.
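As a rough sketch of how a client might respect this window, the helpers below record when a URL was returned and check it against the 24-hour validity period before downloading. The function names are hypothetical, and the check assumes both timestamps are timezone-aware.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Validity window for a returned audio URL, as stated in this document.
URL_VALIDITY = timedelta(hours=24)


def url_expires_at(returned_at: datetime) -> datetime:
    """Return the moment a URL returned at `returned_at` stops being valid."""
    return returned_at + URL_VALIDITY


def url_still_valid(returned_at: datetime, now: datetime | None = None) -> bool:
    """True if `now` (default: current UTC time) is inside the 24-hour window."""
    now = now or datetime.now(timezone.utc)
    return now < url_expires_at(returned_at)
```

Storing `returned_at` alongside the URL lets a download job skip or re-request expired links instead of failing on them.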
Request Headers
Content-Type (enum): application/json
Authorization: Bearer authentication, in the format Bearer {{API Key}}.
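A minimal sketch of the two headers as a Python dictionary, assuming the API key is passed in as a plain string. `build_headers` is a hypothetical helper name; only the header values follow the documented formats.

```python
from __future__ import annotations


def build_headers(api_key: str) -> dict[str, str]:
    """Build the request headers described above (hypothetical helper)."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Content-Type": "application/json",    # the documented enum value
        "Authorization": f"Bearer {api_key}",  # format: Bearer {{API Key}}
    }
```

Keeping the key out of source code, for example by reading it from an environment variable before calling the helper, is the usual practice.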
Request Body
The text to be synthesized; its length must be less than 10,000 characters. Use newline characters for paragraph breaks. To control pause duration in the speech, insert <#x#> between characters, where x is the pause length in seconds, from 0.01 to 99.99 with up to two decimal places. This allows custom pause durations between text segments. Note that a pause marker must be placed between two text segments that can be vocalized, and multiple consecutive markers are not allowed.
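The pause-marker rules above can be enforced client-side before sending a request. The sketch below joins text segments with <#x#> markers and checks the documented range and precision. `join_with_pauses` is a hypothetical helper, and treating "non-blank" as "can be vocalized" is a simplifying assumption: a punctuation-only segment, for instance, would pass this check but may still not be vocalized.

```python
from __future__ import annotations


def join_with_pauses(segments: list[str], pauses: list[float]) -> str:
    """Join text segments, inserting one <#x#> pause marker between each pair.

    pauses[i] is the pause in seconds between segments[i] and segments[i + 1].
    Placing at most one marker between segments avoids consecutive markers.
    """
    if len(pauses) != len(segments) - 1:
        raise ValueError("need exactly one pause between each pair of segments")
    parts = []
    for i, seg in enumerate(segments):
        if not seg.strip():
            # Assumption: a blank segment cannot be vocalized.
            raise ValueError(f"segment {i} is blank")
        parts.append(seg)
        if i < len(pauses):
            x = pauses[i]
            if not 0.01 <= x <= 99.99:
                raise ValueError(f"pause {x} is outside 0.01-99.99 seconds")
            if round(x, 2) != x:
                raise ValueError(f"pause {x} has more than two decimal places")
            parts.append(f"<#{x:g}#>")  # e.g. 0.5 -> <#0.5#>, 1.25 -> <#1.25#>
    return "".join(parts)
```

For example, `join_with_pauses(["Hello", "world"], [0.5])` produces `Hello<#0.5#>world`.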
Either this or voice_id is required.
Whether to enable streaming. Default: false (streaming disabled).
Enhances recognition of specified minority languages and dialects. When set, it can improve speech performance for the specified language or dialect. If the language is unclear, you can select “auto” and the model will determine the language automatically. Supported values:
'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'

Controls the output format. Possible values: url, hex. Default: hex. This parameter only takes effect in non-streaming scenarios; streaming output supports only hex. A returned URL is valid for 24 hours.

Voice effect settings. Supported audio formats for this parameter:
- Non-streaming: mp3, wav, flac
- Streaming: mp3
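The constraints above interact: the output format depends on whether streaming is enabled, and the voice effect settings restrict the audio format. The sketch below checks them while assembling a request body. The keys `stream`, `output_format`, and `audio_setting.format` are names this page mentions; the `text` key and the `voice_modify` key used for the voice effect settings are assumptions, as is the `build_body` helper itself.

```python
from __future__ import annotations

MAX_TEXT_CHARS = 10_000
# Audio formats the voice effect settings accept, keyed by streaming mode.
VOICE_EFFECT_FORMATS = {False: {"mp3", "wav", "flac"}, True: {"mp3"}}


def build_body(text: str, stream: bool = False, output_format: str = "hex",
               audio_format: str = "mp3", voice_effect: dict | None = None) -> dict:
    """Assemble a request body, rejecting combinations the docs disallow."""
    if len(text) >= MAX_TEXT_CHARS:
        raise ValueError("text must be shorter than 10,000 characters")
    if output_format not in ("url", "hex"):
        raise ValueError("output_format must be 'url' or 'hex'")
    if stream and output_format != "hex":
        raise ValueError("streaming output only supports hex")
    if voice_effect is not None and audio_format not in VOICE_EFFECT_FORMATS[stream]:
        raise ValueError(f"voice effects do not support {audio_format!r} here")
    body = {
        "text": text,  # assumed key name
        "stream": stream,
        "output_format": output_format,  # only takes effect when not streaming
        "audio_setting": {"format": audio_format},
    }
    if voice_effect is not None:
        body["voice_modify"] = voice_effect  # assumed key name
    return body
```

Validating locally gives a clearer error message than a rejected request, but the server remains the authority on which combinations it accepts.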
Response
The synthesized audio segment, generated in the format defined by the input audio_setting.format (mp3/pcm/flac). Whether it is returned as a hex-encoded string or as a URL is determined by the output_format setting. When stream is true, only hex format is supported.

Current audio stream status, returned only when stream is true. 1 indicates synthesis in progress; 2 indicates synthesis complete.
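When streaming, a client decodes each hex-encoded segment and stops once the status reaches 2. The sketch below assumes each streamed chunk has already been parsed into a dictionary shaped like {"data": {"audio": "<hex>", "status": 1 or 2}}; that nesting and the `audio`/`status` field names are assumptions, since this page does not show them. `iter_stream_audio` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Iterable, Iterator

STATUS_IN_PROGRESS = 1  # synthesis in progress
STATUS_COMPLETE = 2     # synthesis complete


def iter_stream_audio(chunks: Iterable[dict]) -> Iterator[tuple[int, bytes]]:
    """Yield (status, decoded audio bytes) per chunk, stopping after status 2."""
    for chunk in chunks:
        data = chunk.get("data") or {}  # assumed response nesting
        status = data.get("status")
        if status not in (STATUS_IN_PROGRESS, STATUS_COMPLETE):
            raise ValueError(f"unexpected status: {status!r}")
        # Hex-encoded audio -> raw bytes; a missing or empty field yields b"".
        yield status, bytes.fromhex(data.get("audio") or "")
        if status == STATUS_COMPLETE:
            return
```

A caller might join the status-1 segments into one file, for example `b"".join(a for s, a in iter_stream_audio(chunks) if s == 1)`. Whether the final status-2 chunk also carries audio, and whether that audio is new or a repeat of earlier segments, should be checked against real responses before choosing how to combine them.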