This API supports asynchronous text-to-speech generation with a maximum of 1 million characters per request. The complete generated audio is retrieved asynchronously. It supports 100+ system voices as well as cloned voices, and allows customization of pitch, speed, volume, bitrate, sample rate, and output format. After you submit a long-text speech synthesis request, the returned URL is valid for 24 hours from the time it is generated, so download the content before it expires.
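The two limits above (1 million characters per request, 24-hour URL validity) can be checked on the client side before submitting and before downloading. The following is a minimal sketch, not part of the API: the function names are hypothetical, it assumes characters are counted the way Python's `len` counts them (the service's own counting rule is not stated here), and it treats the URL as expired exactly at the 24-hour mark to stay on the safe side.

```python
from datetime import datetime, timedelta, timezone

MAX_CHARS_PER_REQUEST = 1_000_000   # per-request limit stated above
URL_LIFETIME = timedelta(hours=24)  # validity window stated above


def fits_in_one_request(text: str) -> bool:
    """Return True if the text is within the per-request character limit.

    Assumption: len() approximates how the service counts characters.
    """
    return len(text) <= MAX_CHARS_PER_REQUEST


def url_expires_at(generated_at: datetime) -> datetime:
    """Return the moment the download URL stops being valid."""
    return generated_at + URL_LIFETIME


def url_is_still_valid(generated_at: datetime, now: datetime) -> bool:
    """Return True while the URL is strictly inside its 24-hour window."""
    return now < url_expires_at(generated_at)


if __name__ == "__main__":
    generated = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(url_expires_at(generated).isoformat())
```

Recording the generation time alongside the URL lets a download job decide whether to fetch the file or resubmit the task.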
This API is suited to long-text speech generation, such as entire books. Tasks may wait in the queue for a long time. For short sentences, voice chat, and online social scenarios, Synchronous Text-to-Speech is recommended instead.
This parameter enables English text normalization, which can improve performance in number reading scenarios but slightly increases latency. If not provided, defaults to false.
Possible values: [32000, 64000, 128000, 256000]. The bitrate of the generated audio. Optional; default: 128000. This parameter applies only to mp3-format audio.
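The bitrate rules above (four allowed values, a default of 128000, mp3 only) can be captured in a small client-side helper. This is an illustrative sketch under stated assumptions: the function name and the `audio_format` argument are hypothetical, and returning `None` for non-mp3 output is a design choice of this example, not documented API behavior.

```python
from typing import Optional

ALLOWED_BITRATES = (32000, 64000, 128000, 256000)  # values listed above
DEFAULT_BITRATE = 128000                           # default stated above


def resolve_bitrate(audio_format: str, bitrate: Optional[int] = None) -> Optional[int]:
    """Return the bitrate to send, or None when the setting does not apply."""
    if audio_format.lower() != "mp3":
        # Bitrate only applies to mp3 output, so this sketch omits it.
        return None
    if bitrate is None:
        return DEFAULT_BITRATE
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(
            f"bitrate must be one of {ALLOWED_BITRATES}, got {bitrate}"
        )
    return bitrate
```

Rejecting an unsupported value locally avoids waiting in the task queue only to have the request fail.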
Defines replacements for characters or symbols whose pronunciation requires special annotation. To replace a pronunciation (adjust the tone, or substitute the pronunciation of another character), use the following format: ["omg/oh my god"]. Tones are represented by numbers: 1st tone (high level) is 1, 2nd tone (rising) is 2, 3rd tone (dipping) is 3, 4th tone (falling) is 4, neutral tone is 5.
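The replacement format above is a list of strings, each joining the original text and its pronunciation with a slash. The sketch below builds such entries and records the tone numbering; the helper name is hypothetical, and rejecting a slash inside the original text is an assumption made here to keep the separator unambiguous, not a documented rule.

```python
# Tone numbering as described above.
TONE_NUMBERS = {
    "high level": 1,  # 1st tone
    "rising": 2,      # 2nd tone
    "dipping": 3,     # 3rd tone
    "falling": 4,     # 4th tone
    "neutral": 5,     # neutral tone
}


def make_replacement(source: str, pronunciation: str) -> str:
    """Build one 'source/pronunciation' entry in the format shown above."""
    if not source or not pronunciation:
        raise ValueError("both source and pronunciation must be non-empty")
    if "/" in source:
        # Assumption: '/' is reserved as the separator between the two parts.
        raise ValueError("source text must not contain '/'")
    return f"{source}/{pronunciation}"


entries = [make_replacement("omg", "oh my god")]
print(entries)
```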
Enhances recognition of specified minority languages and dialects. When set, it can improve speech performance for the specified language/dialect. If the language type is unclear, you can select "auto" and the model will automatically determine the language type. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
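Because the supported values form a fixed list, a client can validate the choice before submitting a task. The following sketch is illustrative only: the function name is hypothetical, it assumes values are matched exactly as written (including case), and falling back to "auto" when no language is given is a choice made for this example.

```python
from typing import Optional

# Values copied from the supported list above.
SUPPORTED_LANGUAGES = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
})


def resolve_language(value: Optional[str]) -> str:
    """Return a supported language value, using 'auto' when none is given."""
    if value is None:
        return "auto"  # let the model determine the language type
    if value not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language value: {value!r}")
    return value
```

Note that 'Chinese,Yue' is a single value containing a comma, so the list should not be built by splitting a comma-separated string.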
Intensity adjustment (powerful/soft), range [-100, 100]. Values closer to -100 produce a more powerful sound; values closer to 100 produce a softer sound.
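The direction of the intensity scale is easy to invert by mistake, since negative values mean a more powerful sound. A minimal sketch follows that clamps a value into the documented range and reports its direction. The function names are hypothetical, and describing 0 as "neutral" is an assumption of this example rather than something stated above.

```python
INTENSITY_MIN = -100  # most powerful end of the range
INTENSITY_MAX = 100   # softest end of the range


def clamp_intensity(value: int) -> int:
    """Force a value into the documented range [-100, 100]."""
    return max(INTENSITY_MIN, min(INTENSITY_MAX, value))


def describe_intensity(value: int) -> str:
    """Describe the direction of an intensity value after clamping."""
    value = clamp_intensity(value)
    if value < 0:
        return "more powerful"
    if value > 0:
        return "softer"
    return "neutral"  # assumption: 0 leaves the voice unmodified
```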