POST /v3/async/minimax-speech-2.8-turbo
MiniMax Speech 2.8 Turbo Async Text-to-Speech
curl --request POST \
  --url https://api.myrouter.ai/v3/async/minimax-speech-2.8-turbo \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "text_file_id": 123,
  "voice_modify": {
    "pitch": 123,
    "timbre": 123,
    "intensity": 123,
    "sound_effects": "<string>"
  },
  "audio_setting": {
    "format": "<string>",
    "bitrate": 123,
    "channel": 123,
    "audio_sample_rate": 123
  },
  "voice_setting": {
    "vol": 123,
    "pitch": 123,
    "speed": 123,
    "emotion": "<string>",
    "voice_id": "<string>",
    "english_normalization": true
  },
  "aigc_watermark": true,
  "language_boost": "<string>",
  "continuous_sound": true,
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  }
}
'
{
  "file_id": 123,
  "task_id": "<string>",
  "base_resp": {
    "status_msg": "<string>",
    "status_code": 123
  },
  "task_token": "<string>",
  "usage_characters": 123
}
Use this API to create an async text-to-speech task. Input can be supplied as text (up to 50,000 characters) or as a file (under 100,000 characters).
This API is asynchronous: the response identifies the task rather than returning the audio itself. Use the returned task_id to call the Get Async Task Result API to retrieve the generated result.
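As a rough illustration of the request described above, the following Python sketch builds the POST request with the standard library. The helper names (build_tts_request, create_task) and the voice ID used in the usage line are hypothetical, and sending only voice_id inside voice_setting is an assumption; the URL, headers, and field names are taken from this page.

```python
import json
import urllib.request

API_URL = "https://api.myrouter.ai/v3/async/minimax-speech-2.8-turbo"


def build_tts_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an async TTS task."""
    body = {
        "text": text,
        # voice_setting is required; passing only voice_id is an assumption.
        "voice_setting": {"voice_id": voice_id},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(request: urllib.request.Request) -> str:
    """Send the request and return the task_id from the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["task_id"]
```

A call such as create_task(build_tts_request(api_key, "Hello", "some-voice")) would then yield the task_id to pass to the Get Async Task Result API.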

Request Headers

Content-Type
string
required
Enum: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.

Request Body

text
string
The text to be synthesized into audio, limited to a maximum of 50,000 characters. Either this or text_file_id is required.

Interjection tags: the speech-2.8-hd and speech-2.8-turbo models support interjection tags inserted directly into the text. Supported tags and their meanings: (laughs) laughter, (chuckle) chuckle, (coughs) cough, (clear-throat) throat clearing, (groans) groan, (breath) normal breathing, (pant) panting, (inhale) inhale, (exhale) exhale, (gasps) gasp, (sniffs) sniff, (sighs) sigh, (snorts) snort, (burps) burp, (lip-smacking) lip smacking, (humming) humming, (hissing) hissing, (emm) umm, (whistles) whistle, (sneezes) sneeze, (crying) sobbing, (applause) applause.
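A client-side check before submission might look like the sketch below. The function names are hypothetical, and two things are assumptions: that the 50,000-character limit corresponds to Python's len(), and that any lowercase parenthesised word is meant as a tag (ordinary parenthetical words such as "(note)" would be flagged too, so treat the result as a hint rather than an error).

```python
import re

# Tags listed in the text field description above.
SUPPORTED_TAGS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
}

MAX_TEXT_CHARS = 50_000

# Matches a parenthesised lowercase word such as "(laughs)" or "(clear-throat)".
_TAG_PATTERN = re.compile(r"\(([a-z][a-z-]*)\)")


def unknown_tags(text: str) -> list[str]:
    """Return parenthesised lowercase words that are not supported tags."""
    return [t for t in _TAG_PATTERN.findall(text) if t not in SUPPORTED_TAGS]


def check_text(text: str) -> None:
    """Raise ValueError if the text exceeds the documented length limit."""
    # Assumption: the service counts characters the same way len() does.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text has {len(text)} characters; limit is {MAX_TEXT_CHARS}")
```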
text_file_id
integer
The ID of the text file to synthesize. A single file must be under 100,000 characters. Supported file formats: txt, zip. Either this or text is required; the file format is validated automatically on submission.
txt file: Must be under 100,000 characters. Supports custom pauses using <#x#> markers, where x is the pause duration in seconds, in the range [0.01, 99.99] with up to two decimal places. A pause marker must sit between two vocalizable text segments, and pause markers cannot be placed consecutively.
zip file:
The archive must contain only txt files or only json files; formats cannot be mixed.
json file format: Supports three fields, [title, content, extra], representing the title, body, and additional information respectively. If all three fields are present, 3 sets of results are produced, totaling 9 files stored in a single folder. If a field is missing or empty, no result is generated for that field.
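The pause-marker rules for txt files can be checked locally before upload. The sketch below is one possible reading of those rules, not the service's own validation: the function name is hypothetical, and "vocalizable text" is approximated as any non-whitespace characters, which is an assumption.

```python
import re

# Anything shaped like a pause marker, whether or not its duration is valid.
_MARKER = re.compile(r"<#(.*?)#>")
# A duration with at most two decimal places, e.g. "2", "1.5", "0.25".
_DURATION = re.compile(r"\d+(?:\.\d{1,2})?")


def pause_marker_problems(text: str) -> list[str]:
    """Return a list of human-readable problems with <#x#> pause markers."""
    problems = []
    last_end = 0
    for m in _MARKER.finditer(text):
        raw = m.group(1)
        if not _DURATION.fullmatch(raw):
            problems.append(f"malformed duration: {m.group(0)}")
        elif not 0.01 <= float(raw) <= 99.99:
            problems.append(f"duration out of range: {m.group(0)}")
        # Empty text before a marker means it starts the file or directly
        # follows another marker (consecutive markers are not allowed).
        # Assumption: any non-whitespace text counts as vocalizable.
        if not text[last_end:m.start()].strip():
            problems.append(f"no text before {m.group(0)}")
        last_end = m.end()
    if last_end and not text[last_end:].strip():
        problems.append("no text after the final pause marker")
    return problems
```

An empty list means no problems were found under these assumptions; the service may still apply stricter checks.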
voice_modify
object
audio_setting
object
voice_setting
object
required
aigc_watermark
boolean
default:false
Controls whether to add an audio rhythm identifier at the end of the synthesized audio. Default: False. This parameter only applies to non-streaming synthesis.
language_boost
string
Whether to enhance recognition of specified minority languages and dialects. Default: null. Set to auto to let the model determine the language automatically.
Possible values (note that Chinese,Yue is a single value): Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
continuous_sound
boolean
default:false
Enable this parameter to make clause transitions more natural. Only supported for speech-2.8-hd and speech-2.8-turbo models.
pronunciation_dict
object
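The request body rules above (voice_setting required, one of text or text_file_id required, a fixed set of language_boost values) can be enforced when assembling the payload. This is a minimal sketch with a hypothetical helper name. It assumes that "Chinese,Yue" is a single enum value and that supplying both text and text_file_id is not rejected client-side, since this page does not say how the service treats that case.

```python
LANGUAGE_BOOST_VALUES = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}


def build_payload(voice_setting, text=None, text_file_id=None, language_boost=None):
    """Assemble a request body, raising ValueError on documented violations."""
    if not voice_setting:
        raise ValueError("voice_setting is required")
    if text is None and text_file_id is None:
        raise ValueError("either text or text_file_id is required")
    if language_boost is not None and language_boost not in LANGUAGE_BOOST_VALUES:
        raise ValueError(f"unsupported language_boost: {language_boost!r}")

    payload = {"voice_setting": voice_setting}
    if text is not None:
        payload["text"] = text
    if text_file_id is not None:
        payload["text_file_id"] = text_file_id
    if language_boost is not None:
        payload["language_boost"] = language_boost
    return payload
```

Optional fields that are left unset are omitted from the payload so that the service's own defaults apply.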

Response

file_id
integer
The ID of the audio file that corresponds to the task, returned when the task is created successfully.

Once the task has completed, use the file_id to look up the generated file. This field is not returned when the request fails.
Note: The returned download URL is valid for 9 hours (32,400 seconds) from the time it is generated. After it expires, the file becomes unavailable and the generated content is lost, so download it promptly.
task_id
string
Use the task_id to call the Get Async Task Result API to retrieve the generated output.
base_resp
object
task_token
string
The key information used to complete the current task.
usage_characters
integer
Billed character count
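A caller will typically check base_resp before using the other response fields and keep track of when a download URL expires. The sketch below uses hypothetical helper names, and it assumes that a status_code of 0 indicates success, which this page does not state; adjust the check to match the actual status codes.

```python
from datetime import datetime, timedelta, timezone

# The download URL is documented as valid for 9 hours (32,400 seconds).
DOWNLOAD_URL_TTL = timedelta(hours=9)


def parse_create_response(resp: dict) -> dict:
    """Extract the task identifiers from a create-task response."""
    base = resp.get("base_resp") or {}
    code = base.get("status_code")
    # ASSUMPTION: status_code 0 means success; not confirmed by this page.
    if code != 0:
        raise RuntimeError(f"task creation failed: {code} {base.get('status_msg')}")
    return {
        "task_id": resp["task_id"],
        "file_id": resp.get("file_id"),  # absent when the request fails
        "usage_characters": resp.get("usage_characters"),
    }


def download_deadline(url_generated_at: datetime) -> datetime:
    """Return the moment at which a download URL stops being valid."""
    return url_generated_at + DOWNLOAD_URL_TTL
```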