This API supports asynchronous text-to-speech generation, with a maximum of 1 million characters per request. The complete audio result can be retrieved asynchronously. It supports 100+ system voices as well as cloned voices, and allows adjustment of pitch, speed, volume, bitrate, sample rate, and output format. After you submit a long-text speech synthesis request, note that the returned URL is valid for 24 hours from the time it is returned, so download the audio within that window.
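The two limits above (1 million characters per request, 24-hour URL validity) can be checked on the client side before submitting a task and before downloading the result. The following is a minimal sketch rather than part of the API: the function names and constants are hypothetical, and it assumes that the character limit matches what Python's len() counts and that the caller records the time at which the URL was returned.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_CHARS = 1_000_000               # per-request limit stated above
URL_LIFETIME = timedelta(hours=24)  # validity window stated above


def fits_in_one_request(text: str) -> bool:
    """Return True if the text is within the per-request character limit.

    Assumption: the service counts characters the same way len() does.
    """
    return len(text) <= MAX_CHARS


def url_still_valid(returned_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if fewer than 24 hours have passed since the URL was returned.

    The comparison is strict, so a URL that is exactly 24 hours old is
    treated as expired (a conservative choice, not documented behavior).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - returned_at < URL_LIFETIME
```

Both timestamps passed to url_still_valid should be timezone-aware (or both naive); mixing the two raises a TypeError in Python.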
Suitable for long-text speech generation, such as entire books; task queuing may take longer. For short sentences, voice chat, and online social scenarios, Synchronous Text-to-Speech is recommended instead.
This parameter enables English text normalization, which can improve performance in number-reading scenarios but may slightly increase latency. Default: false.
Possible values: [32000, 64000, 128000, 256000]. Bitrate of the generated audio. Optional. Default: 128000. This parameter applies only to mp3 format audio.
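The bitrate rules above can be enforced before a request is built. This is a sketch under stated assumptions: resolve_bitrate and its audio_format argument are hypothetical names, and returning None for non-mp3 output simply reflects that the parameter does not apply there, not any documented server behavior.

```python
from typing import Optional

ALLOWED_BITRATES = (32000, 64000, 128000, 256000)  # values listed above
DEFAULT_BITRATE = 128000                           # default stated above


def resolve_bitrate(audio_format: str, bitrate: Optional[int] = None) -> Optional[int]:
    """Return the bitrate to send, or None when the setting does not apply.

    audio_format is a hypothetical argument name used only in this sketch.
    """
    if audio_format != "mp3":
        # The bitrate setting only applies to mp3 output.
        return None
    if bitrate is None:
        return DEFAULT_BITRATE
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"bitrate must be one of {ALLOWED_BITRATES}, got {bitrate}")
    return bitrate
```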
Replace text, symbols, and their corresponding pronunciations that require special annotation. Pronunciation replacement (adjust tones / replace with other character pronunciations), format as follows: ["omg/oh my god"]. For Chinese text, tones are represented by numbers: 1st tone (high level) is 1, 2nd tone (rising) is 2, 3rd tone (dipping) is 3, 4th tone (falling) is 4, neutral tone is 5.
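The "source/pronunciation" entry format shown above can be generated from a mapping instead of being written by hand. The helper below is a hypothetical sketch: build_replacements is not an API function, and rejecting empty strings is an assumption made here for safety. The tone table only restates the numbering given above.

```python
from typing import Dict, List

# Tone numbers for Chinese text, as listed above.
TONE_NUMBERS = {
    "high level": 1,  # 1st tone
    "rising": 2,      # 2nd tone
    "dipping": 3,     # 3rd tone
    "falling": 4,     # 4th tone
    "neutral": 5,
}


def build_replacements(pairs: Dict[str, str]) -> List[str]:
    """Turn {source: pronunciation} into a list of "source/pronunciation" entries."""
    entries = []
    for source, pronunciation in pairs.items():
        if not source or not pronunciation:
            # Assumption: an empty side is never meaningful, so fail early.
            raise ValueError("source and pronunciation must be non-empty")
        entries.append(f"{source}/{pronunciation}")
    return entries
```

For example, build_replacements({"omg": "oh my god"}) produces the single-entry list shown in the format above.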
Enhances recognition capability for specified minority languages and dialects. When set, it can improve speech performance for the specified language/dialect. If the language type is unclear, you can select 'auto' and the model will automatically determine the language type. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
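Because the accepted values form a fixed list, a client can validate the setting locally. The sketch below is illustrative only: choose_language is a hypothetical helper, exact case-sensitive matching is an assumption, and falling back to 'auto' when no language is given follows the guidance above for unclear language types.

```python
from typing import Optional

# Supported values copied from the list above.
SUPPORTED_LANGUAGES = frozenset([
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
])


def choose_language(value: Optional[str] = None) -> str:
    """Return a supported value, using 'auto' when the language is unknown."""
    if value is None:
        return "auto"
    if value not in SUPPORTED_LANGUAGES:
        # Assumption: values must match the listed spelling exactly.
        raise ValueError(f"unsupported language value: {value!r}")
    return value
```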
Intensity adjustment (powerful/soft). Range: [-100, 100]. Values closer to -100 produce a more forceful voice; values closer to 100 produce a softer voice.
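A value taken from user input, such as a slider, can be kept inside the documented range before it is sent. This is a minimal sketch: clamp_intensity is a hypothetical helper, and clamping instead of rejecting out-of-range values is a client-side design choice, since the API's behavior for such values is not described here.

```python
INTENSITY_MIN = -100  # most forceful end of the range stated above
INTENSITY_MAX = 100   # softest end of the range stated above


def clamp_intensity(value: int) -> int:
    """Limit an intensity value to the documented range [-100, 100]."""
    return max(INTENSITY_MIN, min(INTENSITY_MAX, value))
```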