MiniMax Speech 2.8 HD Async Text-to-Speech
POST
Use this API to create an async text-to-speech task. Supports text or file input, with a text length limit of 50,000 characters and a file limit of 100,000 characters.
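The two length limits suggest a simple pre-check before choosing between text and file input. The sketch below is illustrative only: the function name and the decision to reject empty input are assumptions, and the file limit is treated as strictly less than 100,000 characters, the more conservative reading of the limits stated on this page.

```python
MAX_TEXT_CHARS = 50_000    # inline text limit stated above
MAX_FILE_CHARS = 100_000   # file limit stated above (treated as exclusive here)


def choose_input_mode(char_count: int) -> str:
    """Return "text" or "file" for a given character count (hypothetical helper).

    Raises ValueError when the input is empty or too long for either mode.
    """
    if char_count <= 0:
        raise ValueError("input must contain at least one character")
    if char_count <= MAX_TEXT_CHARS:
        return "text"
    if char_count < MAX_FILE_CHARS:
        return "file"
    raise ValueError("input exceeds the file limit; split it into several tasks")
```

Inputs that fit within the text limit can still be sent as a file; the helper simply prefers the inline route when both are possible.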
Request Headers
Content-Type: Enum: application/json
Authorization: Bearer authentication format: Bearer {{API Key}}.
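As a minimal sketch, the request headers can be assembled as follows. It assumes the two headers are the standard HTTP Content-Type and Authorization headers, which this page implies but does not name explicitly; build_headers is a hypothetical helper, not part of any SDK.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build request headers for a JSON body with Bearer authentication."""
    if not api_key or api_key != api_key.strip():
        raise ValueError("api_key must be non-empty with no surrounding whitespace")
    return {
        # Header names are assumed; the values follow the formats described above.
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Checking the key for stray whitespace up front avoids a class of authentication failures that are otherwise hard to spot in logs.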
Request Body
text: The text to be synthesized into audio, limited to a maximum of 50,000 characters. Either this or text_file_id is required.
Interjection tags: Interjection tags can be inserted into the text only when the model is speech-2.8-hd or speech-2.8-turbo. Supported interjections: (laughs) laughter, (chuckle) chuckle, (coughs) cough, (clear-throat) throat clearing, (groans) groan, (breath) normal breathing, (pant) panting, (inhale) inhale, (exhale) exhale, (gasps) gasp, (sniffs) sniff, (sighs) sigh, (snorts) snort, (burps) burp, (lip-smacking) lip smacking, (humming) humming, (hissing) hissing, (emm) umm, (whistles) whistle, (sneezes) sneeze, (crying) sobbing, (applause) applause.
text_file_id: The text file ID for audio synthesis. A single file must be less than 100,000 characters. Supported file formats: txt, zip. Either this or text is required; the format is automatically validated upon submission.
txt file: Length limit <100,000 characters. Supports custom pauses using <#x#> markers, where x is the pause duration in seconds, in the range [0.01, 99.99] with up to two decimal places. Pauses must be placed between two vocalizable text segments; consecutive pause markers cannot be used.
zip file: The archive must contain txt or json files of the same format.
json file format: Supports three fields, [title, content, extra], representing the title, body, and additional information respectively. If all three fields exist, 3 sets of results will be produced, totaling 9 files stored in a single folder. If a field does not exist or is empty, no result is generated for that field.
Controls whether to add an audio rhythm identifier at the end of the synthesized audio. Default: False. This parameter only applies to non-streaming synthesis.
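The pause-marker rules for txt files (<#x#> with x in [0.01, 99.99], at most two decimal places, placed between two vocalizable segments, never consecutive) can be checked locally before uploading. The sketch below is a hypothetical validator, not the service's own check. It approximates "vocalizable" as "contains non-whitespace characters" and assumes values need a leading digit, so ".5" is rejected.

```python
import re
from decimal import Decimal

MARKER = re.compile(r"<#(.*?)#>")               # captures the value between <# and #>
VALUE = re.compile(r"\d{1,2}(\.\d{1,2})?")      # at most two decimal places


def validate_pauses(text: str) -> list[str]:
    """Return a list of problems found with pause markers (empty if none)."""
    problems = []
    # With one capture group, split() alternates: segment, value, segment, ...
    parts = MARKER.split(text)
    segments, values = parts[0::2], parts[1::2]

    for raw in values:
        if not VALUE.fullmatch(raw):
            problems.append(f"malformed pause value: {raw!r}")
        elif not Decimal("0.01") <= Decimal(raw) <= Decimal("99.99"):
            problems.append(f"pause value out of range: {raw!r}")

    if values:
        last = len(segments) - 1
        for i, seg in enumerate(segments):
            if seg.strip():
                continue  # this segment has text, so its neighbouring markers are fine
            if i == 0:
                problems.append("pause marker at the start of the text")
            elif i == last:
                problems.append("pause marker at the end of the text")
            else:
                problems.append("consecutive pause markers")
    return problems
```

For example, validate_pauses("Hello<#1.5#>world") returns an empty list, while a string with two adjacent markers reports one "consecutive pause markers" problem.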
Whether to enhance recognition of specified minority languages and dialects. Default: null. Can be set to auto to let the model automatically determine the language type.
Possible values: Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
Enable this parameter to make clause transitions more natural. Only supported for speech-2.8-hd and speech-2.8-turbo models.
Response
The ID of the corresponding audio file, returned after successful task creation. After the task is completed, you can query using the file_id. This field is not returned when the request fails.
Note: The returned download URL is valid for 9 hours (32,400 seconds) from generation. After expiration, the file becomes invalid and the generated content is lost, so download it in time.
Use the task_id to call the Get Async Task Result API to retrieve the generated output.
The key information used to complete the current task
Billed character count