Voice Cloning API | MiniMax Quick Voice Cloning

MiniMax Quick Voice Cloning

curl --request POST \
  --url https://api.myrouter.ai/v3/minimax-voice-cloning \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "audio_url": "<string>",
  "text": "<string>",
  "model": "<string>",
  "accuracy": 0.7,
  "need_noise_reduction": true,
  "need_volume_normalization": true
}
'

{
  "demo_audio_url": "<string>",
  "voice_id": "<string>"
}

POST

minimax-voice-cloning

This API supports single- and dual-channel voice cloning: given an audio file, it quickly produces a cloned voice with the same timbre. It suits scenarios that need fast voice cloning, such as IP voice replication.

The cloned voice is temporary. To retain it permanently, use it in any T2A text-to-speech API within 168 hours (7 days); preview playback through this API does not count. Otherwise, the voice is deleted.

Notes:

Uploaded audio files must be in mp3, m4a, or wav format.
Uploaded audio files must be at least 10 seconds and no longer than 5 minutes long.
Uploaded audio files must not exceed 20 MB in size.
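
The file constraints listed in the notes can be checked locally before a request is submitted. The following is a minimal sketch, not part of the API: the function name check_source_audio is hypothetical, the caller is assumed to already know the file's duration and size, and "20 MB" is read as 20 * 1024 * 1024 bytes, which the documentation does not specify. The file extension in the URL is used as a rough proxy for the format.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_SECONDS = 10
MAX_SECONDS = 5 * 60
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB


def check_source_audio(audio_url: str, duration_seconds: float, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []
    # The URL's file extension is only a hint; the service inspects the real file.
    extension = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 10 seconds and 5 minutes")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 20 MB")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every violated limit at once.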

Request Headers

Content-Type

string

required

Enum: application/json

Authorization

string

required

Bearer authentication format: Bearer {{API Key}}.
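
As an illustration of the two required headers, the sketch below assembles a POST request with Python's standard urllib.request module without sending it. The helper name build_request is hypothetical; the endpoint URL is the one shown in the curl example.

```python
import json
import urllib.request

ENDPOINT = "https://api.myrouter.ai/v3/minimax-voice-cloning"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the required headers."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),  # JSON body, matching Content-Type
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # Bearer {{API Key}} format
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to perform the call.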

Request Body

audio_url

string

required

URL of the audio file to clone the voice from. Supports mp3, m4a, and wav formats.

clone_prompt

Voice cloning parameters. Providing this parameter helps improve timbre similarity and stability in speech synthesis. When using it, you must also supply a short sample audio clip (less than 8 seconds long) along with the corresponding text. Supported audio formats: mp3, m4a, wav.

prompt_audio_url

string

required

Audio prompt parameter: URL of the sample audio, duration must be less than 8 seconds.

prompt_text

string

required

Audio prompt parameter: the text corresponding to the sample audio. Must match the audio content exactly, and must end with punctuation.
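
The clone_prompt rules (sample shorter than 8 seconds, text ending with punctuation) can be enforced when the object is assembled. This is a sketch under assumptions: build_clone_prompt is a hypothetical helper, the sample's duration is supplied by the caller, and the set of characters treated as closing punctuation is a guess, since the documentation does not list them.

```python
MAX_PROMPT_SECONDS = 8
# Assumption: the accepted closing punctuation is not documented; this set is illustrative.
CLOSING_PUNCTUATION = ".!?。！？"


def build_clone_prompt(prompt_audio_url: str, prompt_text: str,
                       prompt_duration_seconds: float) -> dict:
    """Return a clone_prompt object, or raise ValueError if a documented rule is broken."""
    if prompt_duration_seconds >= MAX_PROMPT_SECONDS:
        raise ValueError("sample audio must be shorter than 8 seconds")
    stripped = prompt_text.rstrip()
    if not stripped or stripped[-1] not in CLOSING_PUNCTUATION:
        raise ValueError("prompt_text must end with punctuation")
    # The text is passed through unchanged because it must match the audio exactly.
    return {"prompt_audio_url": prompt_audio_url, "prompt_text": prompt_text}
```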

text

string

Preview parameter. The model reads this text aloud in the cloned voice and returns the synthesized audio as a URL so you can preview the cloning result. Limited to 2000 characters. Note: previews incur normal speech synthesis charges based on character count, at the same rate as the T2A APIs.

model

string

Preview parameter. Specifies the speech model used for the preview. This field is required when the “text” field is provided.
Possible values: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview
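
The dependency between text and model can be captured in a small check. The sketch below is illustrative only: preview_fields is a hypothetical helper, Python's len is assumed to approximate how the service counts characters, and a model passed without text is simply dropped because the documentation describes model only as a preview setting.

```python
from typing import Optional

PREVIEW_MODELS = {
    "speech-02-hd",
    "speech-02-turbo",
    "speech-2.5-hd-preview",
    "speech-2.5-turbo-preview",
}
MAX_PREVIEW_CHARS = 2000


def preview_fields(text: Optional[str] = None, model: Optional[str] = None) -> dict:
    """Return the preview part of the request body, or {} when no preview is wanted."""
    if text is None:
        return {}  # no preview requested; model is not needed
    if len(text) > MAX_PREVIEW_CHARS:
        raise ValueError("preview text is limited to 2000 characters")
    if model is None:
        raise ValueError("model is required when text is provided")
    if model not in PREVIEW_MODELS:
        raise ValueError(f"unknown preview model: {model}")
    return {"text": text, "model": model}
```

The result can be merged into the request body alongside audio_url.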

accuracy

float

Voice cloning parameter. Sets the accuracy threshold for text verification. Range: [0, 1]. Default: 0.7.

need_noise_reduction

bool

Voice cloning parameter. Whether to enable noise reduction. Default: false.

need_volume_normalization

bool

Voice cloning parameter. Whether to enable volume normalization. Default: false.
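
The three optional cloning parameters and their documented defaults can be gathered in one place. This is a minimal sketch with a hypothetical helper name, cloning_options; it only mirrors the defaults and the [0, 1] range stated above.

```python
def cloning_options(accuracy: float = 0.7,
                    need_noise_reduction: bool = False,
                    need_volume_normalization: bool = False) -> dict:
    """Return the optional cloning fields, using the documented defaults."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be within [0, 1]")
    return {
        "accuracy": accuracy,
        "need_noise_reduction": need_noise_reduction,
        "need_volume_normalization": need_volume_normalization,
    }
```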

Response

demo_audio_url

string

If the request body includes the preview text and preview model, this parameter returns the preview audio as a URL.

voice_id

string

The generated voice_id.
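
Because a cloned voice is deleted unless it is used in a T2A API within 168 hours, it can be useful to record a deadline when the response is handled. The sketch below is an assumption-laden illustration: parse_clone_response is a hypothetical helper, the 168-hour window is assumed to start when the response is received, and demo_audio_url is assumed to be absent or empty when no preview was requested.

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=168)  # 7 days, per the retention rule


def parse_clone_response(raw: str, received_at: datetime) -> dict:
    """Extract the response fields and attach a locally computed use-by time."""
    data = json.loads(raw)
    return {
        "voice_id": data["voice_id"],
        # Normalise a missing or empty preview URL to None (assumption).
        "demo_audio_url": data.get("demo_audio_url") or None,
        # Assumption: the retention clock starts when the response arrives.
        "use_before": received_at + RETENTION,
    }
```

A caller might pass datetime.now(timezone.utc) as received_at and store use_before next to the voice_id.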
