POST /v3/minimax-speech-2.5-hd-preview
MiniMax Speech-2.5-hd-preview Synchronous Text-to-Speech
curl --request POST \
  --url https://api.myrouter.ai/v3/minimax-speech-2.5-hd-preview \
  --header 'Authorization: Bearer {{API Key}}' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "latex_read": true,
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "timbre_weights": [
    {
      "voice_id": "<string>",
      "weight": 123
    }
  ],
  "stream": true,
  "stream_options": {
    "exclude_aggregated_audio": true
  },
  "language_boost": "<string>",
  "output_format": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
{
  "audio": "<string>",
  "status": 123
}
This API provides synchronous text-to-speech generation for text of fewer than 10,000 characters per request. It supports 100+ system voices as well as cloned voices, and lets you customize volume, pitch, speed, and output format. It also supports proportional voice mixing and fixed-duration pauses inside the text. Audio can be produced as mp3, pcm, flac, or wav, and streaming output is supported. When the audio is returned as a URL, the URL is valid for 24 hours from the time it is generated, so download the content promptly.
This endpoint suits low-latency scenarios such as short sentence generation, voice chat, and online social applications. For long text, use Async Text-to-Speech instead.

Request Headers

Content-Type
string
required
Enum: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.

Request Body

text
string
required
The text to synthesize, fewer than 10,000 characters. Use newline characters for paragraph breaks. To insert a pause in the speech, add <#x#> between characters, where x is the pause duration in seconds (0.01 to 99.99, with up to two decimal places). A pause marker must sit between two text segments that can be vocalized, and consecutive pause markers are not allowed.
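The pause rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names `pause`, `join_with_pauses`, and `validate_text` are hypothetical, and it assumes that whitespace-only text does not count as vocalizable and that pause markers count toward the character limit.

```python
import re

# Pause marker as described above: <#x#>, x in seconds, up to two decimals.
PAUSE_RE = re.compile(r"<#(\d{1,2}(?:\.\d{1,2})?)#>")


def pause(seconds: float) -> str:
    """Return a pause marker such as '<#1.5#>' for the given duration."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    # Format with two decimals, then drop trailing zeros and a dangling dot.
    return "<#" + f"{seconds:.2f}".rstrip("0").rstrip(".") + "#>"


def join_with_pauses(segments, seconds: float) -> str:
    """Join text segments with the same pause between each pair."""
    if any(not s.strip() for s in segments):
        raise ValueError("every segment must contain text")
    return pause(seconds).join(segments)


def validate_text(text: str) -> None:
    """Raise ValueError if the text breaks the documented rules."""
    # Assumption: the limit applies to the raw string, markers included.
    if len(text) >= 10_000:
        raise ValueError("text must be shorter than 10,000 characters")
    # With one capturing group, re.split alternates text and durations.
    parts = PAUSE_RE.split(text)
    texts, durations = parts[0::2], parts[1::2]
    for d in durations:
        if not 0.01 <= float(d) <= 99.99:
            raise ValueError(f"pause duration out of range: {d}")
    # An empty text slot means a marker at the start or end,
    # or two markers in a row.
    if durations and any(not t.strip() for t in texts):
        raise ValueError("each pause needs vocalizable text on both sides")
```

For example, `join_with_pauses(["Hello", "world"], 0.5)` produces `Hello<#0.5#>world`, which passes `validate_text`.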
voice_setting
object
required
audio_setting
object
pronunciation_dict
object
timbre_weights
object[]
Weights for proportional voice mixing. Either timbre_weights or voice_setting.voice_id is required.
stream
boolean
default:false
Whether to enable streaming output.
stream_options
object
language_boost
string
default:"null"
Improves handling of the specified language or dialect, which can improve speech quality for text in that language. If the language is unknown, set this to "auto" and the model determines it automatically. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
output_format
string
default:"hex"
Controls how the audio is returned. Possible values: url, hex. This parameter applies only to non-streaming requests; streaming requests always return hex. A returned URL is valid for 24 hours.
voice_modify
object
Voice effect settings. Supported audio formats for this parameter:
  • Non-streaming: mp3, wav, flac
  • Streaming: mp3
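
Several of the constraints in this section can be enforced before the request leaves the client. The sketch below is illustrative only: `build_payload` and `post_tts` are hypothetical helper names, the checks mirror only what the field descriptions above state, and the server remains the authority on what is valid.

```python
import json
import urllib.request

API_URL = "https://api.myrouter.ai/v3/minimax-speech-2.5-hd-preview"

# Formats allowed with voice_modify, keyed by the value of `stream`.
VOICE_MODIFY_FORMATS = {False: {"mp3", "wav", "flac"}, True: {"mp3"}}


def build_payload(text, voice_setting, *, audio_format, timbre_weights=None,
                  stream=False, output_format="hex", voice_modify=None):
    """Build a request body, rejecting combinations the docs rule out."""
    if len(text) >= 10_000:
        raise ValueError("text must be shorter than 10,000 characters")
    if "voice_id" not in voice_setting and not timbre_weights:
        raise ValueError("provide voice_setting.voice_id or timbre_weights")
    if output_format not in ("hex", "url"):
        raise ValueError("output_format must be 'hex' or 'url'")
    if stream and output_format != "hex":
        raise ValueError("streaming only supports hex output")
    if voice_modify is not None and audio_format not in VOICE_MODIFY_FORMATS[stream]:
        raise ValueError(f"voice_modify does not support {audio_format!r} here")

    payload = {
        "text": text,
        "voice_setting": dict(voice_setting),
        "audio_setting": {"format": audio_format},
        "stream": stream,
        "output_format": output_format,
    }
    if timbre_weights:
        payload["timbre_weights"] = list(timbre_weights)
    if voice_modify is not None:
        payload["voice_modify"] = dict(voice_modify)
    return payload


def post_tts(payload, api_key):
    """Send a non-streaming request and return the parsed JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `build_payload("Hello", {"voice_id": "<voice id>"}, audio_format="mp3")` returns a dictionary that can be passed to `post_tts` together with an API key.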

Response

audio
string
The synthesized audio, generated in the format set by audio_setting.format (mp3/pcm/flac). Whether it is returned as a hex-encoded string or as a URL is determined by output_format; when stream is true, only hex is supported.
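Hex-encoded audio has to be converted back to bytes before it can be played or saved. A minimal sketch for the hex case follows; `decode_audio` and `save_audio` are hypothetical helper names, and the URL case is not handled here.

```python
def decode_audio(response: dict) -> bytes:
    """Convert the hex string in response['audio'] to raw audio bytes."""
    # bytes.fromhex raises ValueError if the string is not valid hex,
    # which also catches a URL being passed in by mistake.
    return bytes.fromhex(response["audio"])


def save_audio(response: dict, path: str) -> int:
    """Write the decoded audio to `path` and return the byte count."""
    audio_bytes = decode_audio(response)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

The file extension passed to `save_audio` should match the format requested in audio_setting.format, since the bytes are written unchanged.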
status
number
Current audio stream status, only returned when stream is true. 1 indicates synthesis in progress, 2 indicates synthesis complete.
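When streaming, the status field tells the client when to stop collecting audio. The sketch below shows one way to assemble the chunks, under assumptions this page does not confirm: the chunks arrive as already-parsed objects with the audio and status fields shown above (the transport framing is not covered here), and the final status 2 chunk carries no new audio of its own, so its audio field is ignored. `assemble_stream` is a hypothetical helper name.

```python
def assemble_stream(chunks) -> bytes:
    """Concatenate the hex audio of in-progress chunks until status 2."""
    parts = []
    for chunk in chunks:
        status = chunk.get("status")
        if status == 1:
            # Synthesis in progress: decode and keep this segment.
            parts.append(bytes.fromhex(chunk.get("audio") or ""))
        elif status == 2:
            # Synthesis complete. Assumption: any audio on this chunk
            # repeats what was already streamed, so it is skipped.
            return b"".join(parts)
    raise RuntimeError("stream ended before a status 2 chunk arrived")
```

If the final chunk turns out to carry audio that was not streamed earlier, the status 2 branch needs to append it instead of discarding it.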