This API supports single and dual-channel voice cloning, allowing you to quickly clone a voice with the same timbre based on a specified audio file.
The cloned voice produced by this API is temporary. To retain it permanently, use it in any T2A text-to-speech API within 168 hours (7 days); preview playback within this API does not count. Otherwise, the voice will be deleted.
This API is suitable for scenarios that require quick voice cloning, such as IP voice replication.
Notes:
Uploaded audio files must be in mp3, m4a, or wav format;
Uploaded audio files must be at least 10 seconds and no longer than 5 minutes;
Uploaded audio files must not exceed 20 MB in size.
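The three upload limits above can be checked locally before sending a file. The following is a minimal sketch, not part of the API: the function name `check_clone_audio` is hypothetical, the caller is assumed to already know the clip's duration, and 20 MB is assumed to mean 20 * 1024 * 1024 bytes.

```python
import os

# Limits taken from the notes above.
ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_DURATION_S = 10
MAX_DURATION_S = 5 * 60
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_clone_audio(filename, size_bytes, duration_s):
    """Hypothetical helper: return a list of problems (empty if the file passes)."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if not (MIN_DURATION_S <= duration_s <= MAX_DURATION_S):
        problems.append(f"duration {duration_s}s is outside 10 s to 5 min")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds 20 MB")
    return problems


if __name__ == "__main__":
    print(check_clone_audio("speaker.wav", 3_000_000, 42.0))
    print(check_clone_audio("speaker.ogg", 30 * 1024 * 1024, 5.0))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.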
Voice cloning parameter. Providing it helps improve timbre similarity and stability in speech synthesis. When using this parameter, you must also upload a short sample audio clip (less than 8 seconds long) along with its corresponding text. Supported audio formats: mp3, m4a, wav.
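The sample-clip requirement can also be verified before upload. This sketch covers only the WAV case, using Python's standard `wave` module to measure duration; the helper names `wav_duration_seconds` and `check_prompt_clip` are hypothetical, and mp3 or m4a clips would need a third-party decoder that is not shown here.

```python
import wave

MAX_PROMPT_SECONDS = 8  # the sample clip must be shorter than this


def wav_duration_seconds(path):
    """Duration of a WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_prompt_clip(duration_s, prompt_text):
    """Hypothetical helper: return a list of problems with the clip/text pair."""
    problems = []
    if duration_s >= MAX_PROMPT_SECONDS:
        problems.append("sample clip must be shorter than 8 seconds")
    if not prompt_text or not prompt_text.strip():
        problems.append("the text corresponding to the sample clip is required")
    return problems
```

A typical call would be `check_prompt_clip(wav_duration_seconds("sample.wav"), "text spoken in the clip")`, where the file name is a placeholder.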
Preview parameter. The model uses the cloned voice to read this text aloud and returns the synthesized audio as a URL for previewing the cloning result. Limited to 2000 characters. Note: the preview incurs normal speech synthesis charges based on character count, at the same rates as the T2A APIs.
Preview parameter. Specifies the speech model used for the preview. This field is required when the “text” field is provided.
Possible values: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview
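The preview rules (a 2000-character limit, a model that is required whenever “text” is provided, and a fixed set of model values) can be sketched as a client-side check. This is an illustration under assumptions: the key name `model` and the function `validate_preview` are hypothetical, and characters are counted with Python's `len`, which may differ from how the service counts them.

```python
# Model values listed above.
PREVIEW_MODELS = {
    "speech-02-hd",
    "speech-02-turbo",
    "speech-2.5-hd-preview",
    "speech-2.5-turbo-preview",
}
MAX_PREVIEW_CHARS = 2000


def validate_preview(params):
    """Hypothetical helper: return a list of problems with the preview fields."""
    errors = []
    text = params.get("text")
    if text is None:
        return errors  # no preview requested, nothing to check
    if len(text) > MAX_PREVIEW_CHARS:
        errors.append("text exceeds 2000 characters")
    model = params.get("model")  # assumption: the field is named "model"
    if model is None:
        errors.append("model is required when text is provided")
    elif model not in PREVIEW_MODELS:
        errors.append(f"unknown model: {model}")
    return errors


if __name__ == "__main__":
    print(validate_preview({"text": "Hello there.", "model": "speech-02-hd"}))
    print(validate_preview({"text": "Hello there."}))
```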