TTS Speech 2.6 Turbo Async API | MiniMax High-Quality Text-to-Speech

MiniMax Speech-2.6-turbo Async Text-to-Speech

curl --request POST \
  --url https://api.myrouter.ai/v3/async/minimax-speech-2.6-turbo \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "language_boost": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'

{
  "task_id": "<string>"
}

This API supports asynchronous text-to-speech generation, with a maximum of 1 million characters per request; the complete audio result is retrieved asynchronously. It supports 100+ system voices as well as cloned voices, and allows adjustment of pitch, speed, volume, bitrate, sample rate, and output format. The URL returned for the generated audio is valid for 24 hours from the time it is returned, so download the audio within that window.

This API is suited to long-text speech generation, such as entire books; note that task queuing may take longer. For short sentences, voice chat, or online social scenarios, use Synchronous Text-to-Speech instead.

Request Headers

Content-Type

string

required

Enum: application/json

Authorization

string

required

Bearer authentication format: Bearer {{API Key}}.
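The two required headers can be attached to a request as in the following minimal Python sketch. It uses only the standard library and prepares the request without sending it. The helper name build_request and the placeholder key are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.myrouter.ai/v3/async/minimax-speech-2.6-turbo"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request with both required headers.

    `build_request` is a hypothetical helper name used for illustration.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")


# "YOUR_API_KEY" is a placeholder; substitute a real key before sending.
req = build_request(
    "YOUR_API_KEY",
    {"text": "Hello", "voice_setting": {"voice_id": "male-qn-qingse"}},
)
# To actually submit the task: urllib.request.urlopen(req)
```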

Request Body

text

string

required

The text to be synthesized, maximum length 50,000 characters.
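For inputs longer than the limit stated for this field, one option is to split the text on the client and submit one task per piece. The sketch below assumes the 50,000-character limit applies per request and prefers to cut after sentence-ending punctuation. The function name split_text and the punctuation set are illustrative choices, not part of the API.

```python
MAX_CHARS = 50_000  # limit stated for the `text` field


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split `text` into pieces of at most `limit` characters.

    Cuts after the last sentence-ending character inside each window when one
    exists, otherwise cuts at exactly `limit` characters.
    """
    chunks = []
    start = 0
    while len(text) - start > limit:
        window = text[start:start + limit]
        # Index of the last sentence-ending character in the window (-1 if none).
        cut = max(window.rfind(ch) for ch in ".!?\n。！？")
        end = start + (cut + 1 if cut >= 0 else limit)
        chunks.append(text[start:end])
        start = end
    if start < len(text):
        chunks.append(text[start:])
    return chunks
```

Joining the returned pieces reproduces the original text, so no characters are dropped at the boundaries.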

voice_setting

object

required

speed

number

Range [0.5, 2], Default: 1.0. Speech speed of the generated voice. Optional; higher values produce faster speech.

vol

number

Range (0, 10], Default: 1.0. Volume of the generated voice. Optional; higher values produce louder audio.

pitch

number

default:0

Range [-12, 12], Default: 0. Pitch of the generated voice. Optional; the value must be an integer, and 0 outputs the original voice pitch.

voice_id

string

The voice ID for the request. Supports system voice IDs and cloned voice IDs. The available system voice IDs are as follows:

Youthful Young Man: male-qn-qingse
Elite Young Man: male-qn-jingying
Assertive Young Man: male-qn-badao
College Student: male-qn-daxuesheng
Young Girl: female-shaonv
Mature Lady: female-yujie
Mature Woman: female-chengshu
Sweet Woman: female-tianmei
Male Presenter: presenter_male
Female Presenter: presenter_female
Male Audiobook 1: audiobook_male_1
Male Audiobook 2: audiobook_male_2
Female Audiobook 1: audiobook_female_1
Female Audiobook 2: audiobook_female_2
Youthful Young Man (beta): male-qn-qingse-jingpin
Elite Young Man (beta): male-qn-jingying-jingpin
Assertive Young Man (beta): male-qn-badao-jingpin
College Student (beta): male-qn-daxuesheng-jingpin
Young Girl (beta): female-shaonv-jingpin
Mature Lady (beta): female-yujie-jingpin
Mature Woman (beta): female-chengshu-jingpin
Sweet Woman (beta): female-tianmei-jingpin
Clever Boy: clever_boy
Cute Boy: cute_boy
Lovely Girl: lovely_girl
Cartoon Pig: cartoon_pig
Clingy Brother: bingjiao_didi
Handsome Boyfriend: junlang_nanyou
Innocent Junior: chunzhen_xuedi
Cool Senior: lengdan_xiongzhang
Bossy Young Master: badao_shaoye
Sweetheart Xiaoling: tianxin_xiaoling
Playful Girl: qiaopi_mengmei
Charming Lady: wumei_yujie
Cute Junior Girl: diadia_xuemei
Elegant Senior Girl: danya_xuejie
Santa Claus: Santa_Claus
Grinch: Grinch
Rudolph: Rudolph
Arnold: Arnold
Charming Santa: Charming_Santa
Charming Lady: Charming_Lady
Sweet Girl: Sweet_Girl
Cute Elf: Cute_Elf
Attractive Girl: Attractive_Girl
Serene Woman: Serene_Woman

emotion

string

Controls the emotion of the synthesized speech. Currently supports 7 emotions. Possible values: ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]

text_normalization

bool

default:"false"

This parameter enables English text normalization, which can improve performance in number-reading scenarios but may slightly increase latency. Default: false.
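The ranges listed for voice_setting can be checked on the client before a task is submitted. The following is a minimal sketch under that assumption. The helper make_voice_setting is hypothetical, and whether the server rejects or clamps out-of-range values is not stated in this document.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def make_voice_setting(voice_id, speed=1.0, vol=1.0, pitch=0,
                       emotion=None, text_normalization=False):
    """Build a voice_setting object, enforcing the documented ranges."""
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be in [0.5, 2]")
    if not 0 < vol <= 10:
        raise ValueError("vol must be in (0, 10]")
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer in [-12, 12]")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")

    setting = {
        "voice_id": voice_id,
        "speed": speed,
        "vol": vol,
        "pitch": pitch,
        "text_normalization": text_normalization,
    }
    # emotion is only included when the caller chose one.
    if emotion is not None:
        setting["emotion"] = emotion
    return setting


voice = make_voice_setting("female-shaonv", speed=1.2, emotion="happy")
```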

audio_setting

object

sample_rate

number

default:32000

Possible values: [8000, 16000, 22050, 24000, 32000, 44100]. Sample rate of the generated audio. Optional, Default: 32000.

bitrate

number

default:128000

Possible values: [32000, 64000, 128000, 256000]. Bitrate of the generated audio. Optional, Default: 128000. This parameter only applies to mp3 format audio.

format

string

default:"mp3"

The generated audio format. Default: mp3. Options: mp3, pcm, flac, wav. wav is only supported in non-streaming output.

channel

number

default:1

Number of audio channels. Default: 1 (mono). Options: 1 (mono), 2 (stereo).
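A small builder can encode the allowed values and defaults listed for audio_setting. This is a sketch only: make_audio_setting is a hypothetical helper, and leaving bitrate out for non-mp3 formats is a client-side choice based on the note that bitrate applies only to mp3, not a documented requirement.

```python
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
BITRATES = (32000, 64000, 128000, 256000)
FORMATS = ("mp3", "pcm", "flac", "wav")
CHANNELS = (1, 2)  # 1 = mono, 2 = stereo


def make_audio_setting(sample_rate=32000, bitrate=128000, fmt="mp3", channel=1):
    """Build an audio_setting object from the documented value lists."""
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    if bitrate not in BITRATES:
        raise ValueError(f"bitrate must be one of {BITRATES}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if channel not in CHANNELS:
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")

    setting = {"sample_rate": sample_rate, "format": fmt, "channel": channel}
    # Assumption: bitrate is sent only for mp3, the one format it applies to.
    if fmt == "mp3":
        setting["bitrate"] = bitrate
    return setting


audio = make_audio_setting(sample_rate=44100, fmt="flac", channel=2)
```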

pronunciation_dict

object

tone

list

Replacement rules for text or symbols whose pronunciation needs special annotation (adjusting tones, or substituting the pronunciation of other characters). Each entry pairs the original text with its replacement pronunciation, in the following format: ["omg/oh my god"]. For Chinese text, tones are represented by numbers: 1st tone (high level) is 1, 2nd tone (rising) is 2, 3rd tone (dipping) is 3, 4th tone (falling) is 4, neutral tone is 5.
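The "text/pronunciation" entry format shown above can be generated from a plain mapping. The sketch below is illustrative: make_pronunciation_dict is a hypothetical helper, and rejecting source text that contains "/" is an assumption made here because the document does not describe an escaping rule.

```python
def make_pronunciation_dict(replacements: dict[str, str]) -> dict:
    """Turn {"written form": "spoken form"} into a pronunciation_dict object."""
    tone = []
    for written, spoken in replacements.items():
        # Assumption: "/" is the separator, so it cannot appear in the written form.
        if "/" in written:
            raise ValueError(f"written form must not contain '/': {written!r}")
        tone.append(f"{written}/{spoken}")
    return {"tone": tone}


pron = make_pronunciation_dict({"omg": "oh my god"})
```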

language_boost

string

default:"null"

Enhances recognition capability for specified minority languages and dialects. When set, it can improve speech performance for the specified language/dialect. If the language type is unclear, you can select “auto” and the model will automatically determine the language type. Supported values:

'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'

voice_modify

object

Voice effect settings. Supported audio formats: mp3, wav, flac

pitch

integer

Pitch adjustment (deep/bright), Range [-100, 100]. Values closer to -100 produce a deeper voice; values closer to 100 produce a brighter voice.

intensity

integer

Intensity adjustment (powerful/soft), Range [-100, 100]. Values closer to -100 produce a more forceful voice; values closer to 100 produce a softer voice.

timbre

integer

Timbre adjustment (resonant/crisp), Range [-100, 100]. Values closer to -100 produce a richer voice; values closer to 100 produce a crisper voice.

sound_effects

string

Sound effect settings. Only one can be selected at a time. Possible values:

spacious_echo (spacious echo)
auditorium_echo (auditorium broadcast)
lofi_telephone (telephone distortion)
robotic (electronic voice)
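The voice_modify constraints (integer ranges, a single sound effect, and the supported audio formats) can be checked together before submission. This is a minimal sketch. make_voice_modify is a hypothetical helper, and checking the audio format on the client is an assumption drawn from the supported-formats note above.

```python
SOUND_EFFECTS = {"spacious_echo", "auditorium_echo", "lofi_telephone", "robotic"}
VOICE_MODIFY_FORMATS = {"mp3", "wav", "flac"}


def make_voice_modify(audio_format="mp3", pitch=None, intensity=None,
                      timbre=None, sound_effects=None):
    """Build a voice_modify object, including only the fields that were set."""
    if audio_format not in VOICE_MODIFY_FORMATS:
        raise ValueError("voice_modify supports only mp3, wav, and flac output")

    modify = {}
    for name, value in (("pitch", pitch), ("intensity", intensity), ("timbre", timbre)):
        if value is None:
            continue
        if not isinstance(value, int) or not -100 <= value <= 100:
            raise ValueError(f"{name} must be an integer in [-100, 100]")
        modify[name] = value

    # Only one sound effect can be selected, so it is a single string.
    if sound_effects is not None:
        if sound_effects not in SOUND_EFFECTS:
            raise ValueError(f"unsupported sound effect: {sound_effects!r}")
        modify["sound_effects"] = sound_effects
    return modify


modify = make_voice_modify(pitch=-20, timbre=35, sound_effects="robotic")
```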

Response

task_id

string

required

The task_id of the async task. Use the task_id to call the Get Async Task Result API to retrieve the generated result.
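Reading the task_id out of the response body might look like the following sketch. The helper extract_task_id is hypothetical, and the sample body is made up for illustration. The request format of the Get Async Task Result API is not described in this document, so it is not shown here.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Return the task_id from the JSON response body, or raise ValueError."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object in the response")
    task_id = payload.get("task_id")
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id")
    return task_id


# Made-up response body, shaped like the documented response.
task_id = extract_task_id('{"task_id": "abc123"}')
```

Keep the returned task_id; it is the value passed to the Get Async Task Result API.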
