POST /v3/minimax-voice-cloning
MiniMax Quick Voice Cloning
curl --request POST \
  --url https://api.myrouter.ai/v3/minimax-voice-cloning \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "audio_url": "<string>",
  "text": "<string>",
  "model": "<string>",
  "accuracy": 0.7,
  "need_noise_reduction": true,
  "need_volume_normalization": true
}
'
{
  "demo_audio_url": "<string>",
  "voice_id": "<string>"
}
This API supports single-channel and dual-channel voice cloning: given an audio file, it quickly produces a cloned voice with the same timbre.
The cloned voice is temporary. To retain it permanently, use it in any T2A text-to-speech API within 168 hours (7 days); preview playback within this API does not count. Otherwise, the voice will be deleted.
This API is suited to scenarios that require quick voice cloning, such as IP voice replication.
Notes:
  • Uploaded audio files must be in mp3, m4a, or wav format;
  • Uploaded audio files must be at least 10 seconds and no longer than 5 minutes;
  • Uploaded audio files must not exceed 20 MB in size.
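
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `check_source_audio` is hypothetical, the format is inferred from the URL's file extension (an assumption, since the service may inspect the file itself), 20 MB is taken as 20 × 1024 × 1024 bytes, and the caller is expected to supply the duration and size measured by other means.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Limits taken from the notes above.
ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_DURATION_S = 10
MAX_DURATION_S = 5 * 60
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_source_audio(audio_url: str, duration_s: float, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []

    # Assumption: the format can be inferred from the extension in the URL path.
    suffix = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")

    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(f"duration {duration_s}s is outside 10 s to 5 min")

    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds 20 MB")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.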

Request Headers

Content-Type
string
required
Enum: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.
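
As an illustration, the two required headers could be assembled as follows. This is a sketch only: `build_headers` is a hypothetical helper name, and the API key is whatever key your account provides.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build the two required request headers described above."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Content-Type": "application/json",
        # Bearer authentication format: "Bearer " followed by the API key.
        "Authorization": f"Bearer {api_key}",
    }
```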

Request Body

audio_url
string
required
URL of the audio file to clone the voice from. Supports mp3, m4a, and wav formats.
clone_prompt
Voice cloning parameters. Providing this parameter helps improve timbre similarity and stability in speech synthesis. When using this parameter, you must also upload a short sample audio clip (less than 8 seconds long) along with its corresponding text. Supported audio formats: mp3, m4a, wav.
text
string
Preview parameter. The model uses the cloned voice to read this text aloud and returns the synthesized audio as a URL so you can preview the cloning result. Limited to 2000 characters. Note: the preview incurs normal speech synthesis charges based on character count, at the same rate as the T2A APIs.
model
string
Preview parameter. Specifies the speech model used for the preview. This field is required when the “text” field is provided.
Possible values: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview
accuracy
float
Voice cloning parameter. Range [0, 1]. Setting this field configures the text verification accuracy threshold. Default: 0.7.
need_noise_reduction
bool
Voice cloning parameter. Whether to enable noise reduction. Default: false.
need_volume_normalization
bool
Voice cloning parameter. Whether to enable volume normalization. Default: false.
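
The body rules above (the required `audio_url`, `model` being required whenever `text` is set, the 2000-character limit, and the [0, 1] range for `accuracy`) can be enforced while building the payload. The sketch below is an assumption-laden illustration: `build_clone_payload` is a hypothetical helper, `clone_prompt` is left out because its structure is not described here, and the model check only knows the values listed on this page.

```python
from typing import Any, Optional

# Values listed under "model" above; the service may accept others in future.
PREVIEW_MODELS = {
    "speech-02-hd",
    "speech-02-turbo",
    "speech-2.5-hd-preview",
    "speech-2.5-turbo-preview",
}
MAX_PREVIEW_CHARS = 2000


def build_clone_payload(
    audio_url: str,
    text: Optional[str] = None,
    model: Optional[str] = None,
    accuracy: Optional[float] = None,
    need_noise_reduction: Optional[bool] = None,
    need_volume_normalization: Optional[bool] = None,
) -> dict[str, Any]:
    """Validate the documented constraints and return a JSON-ready request body."""
    if not audio_url:
        raise ValueError("audio_url is required")
    payload: dict[str, Any] = {"audio_url": audio_url}

    if text is not None:
        if len(text) > MAX_PREVIEW_CHARS:
            raise ValueError("text is limited to 2000 characters")
        if model is None:
            raise ValueError("model is required when text is provided")
        payload["text"] = text

    if model is not None:
        if model not in PREVIEW_MODELS:
            raise ValueError(f"unknown preview model: {model}")
        payload["model"] = model

    if accuracy is not None:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("accuracy must be in the range [0, 1]")
        payload["accuracy"] = accuracy

    # Optional flags are sent only when set, so the server defaults apply otherwise.
    if need_noise_reduction is not None:
        payload["need_noise_reduction"] = need_noise_reduction
    if need_volume_normalization is not None:
        payload["need_volume_normalization"] = need_volume_normalization

    return payload
```

Omitting unset fields keeps the documented defaults (0.7 for `accuracy`, false for both flags) under the server's control.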

Response

demo_audio_url
string
Returned only if the request body includes both the preview text and the preview model. Contains a URL to the preview audio.
voice_id
string
The generated voice_id.
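
Because the cloned voice is deleted unless it is used in a T2A API within 168 hours, it can help to record a deadline alongside the `voice_id`. The sketch below parses the response body and computes that deadline. Two assumptions are not confirmed by this page: that the 168-hour window starts when the response is received, and that a missing or empty `demo_audio_url` means no preview was requested. `ClonedVoice` and `parse_clone_response` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_WINDOW = timedelta(hours=168)  # 7 days, per the description above


@dataclass
class ClonedVoice:
    voice_id: str
    demo_audio_url: Optional[str]
    use_before: datetime  # assumed deadline for first use in a T2A API


def parse_clone_response(body: str, received_at: datetime) -> ClonedVoice:
    """Parse the JSON response body and attach the assumed retention deadline."""
    data = json.loads(body)
    return ClonedVoice(
        voice_id=data["voice_id"],
        # Treat a missing or empty URL as "no preview requested" (assumption).
        demo_audio_url=data.get("demo_audio_url") or None,
        use_before=received_at + RETENTION_WINDOW,
    )
```

A caller might pass `datetime.now(timezone.utc)` as `received_at` and schedule a T2A request before `use_before` for any voice it intends to keep.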