Fish Audio API for creating voice models (voice cloning).
Authentication: send the API key as a Bearer token in the Authorization header, in the form Bearer {{API Key}}.
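As a minimal sketch of the Bearer format, the Python helper below builds the header dictionary a client would attach to each request. The function name `build_auth_headers` and the placeholder key are illustrative assumptions, not part of the Fish Audio API.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token."""
    # Reject empty keys and keys with stray whitespace, which would
    # produce a malformed Authorization header.
    if not api_key or api_key != api_key.strip():
        raise ValueError("API key must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {api_key}"}


# "YOUR_API_KEY" is a placeholder, not a real credential.
headers = build_auth_headers("YOUR_API_KEY")
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client is in use.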
Request Body
Model type. tts stands for text-to-speech. Allowed values: "tts"
Model training mode. For TTS models, fast means the model is available immediately after creation. Allowed values: "fast"
Upload voice files for model fine-tuning.
visibility
enum<string>
default: "public"
Model visibility. public will display on the discovery page, unlist allows anyone with the link to access, private is visible only to the creator. Possible values: public, unlist, private
Model cover image. Required if the model is public.
Text corresponding to the voices. If not specified, ASR (Automatic Speech Recognition) will be performed on the voices.
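The constraints described for the request body can be checked on the client before anything is sent. The sketch below is one possible way to do that in Python. Only `visibility` and `train_mode` appear as field names in this reference. The keys `type` and `texts`, the helper name, and the `cover_image_path` parameter are assumptions made for illustration. Voice files and the cover image would be attached separately as file uploads, which this sketch does not cover.

```python
from typing import Optional

ALLOWED_VISIBILITY = {"public", "unlist", "private"}


def build_create_model_fields(
    visibility: str = "public",
    cover_image_path: Optional[str] = None,
    texts: Optional[list[str]] = None,
) -> dict:
    """Validate inputs and return the non-file form fields for model creation."""
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"visibility must be one of {sorted(ALLOWED_VISIBILITY)}")
    # The reference states that a cover image is required for public models.
    if visibility == "public" and cover_image_path is None:
        raise ValueError("a cover image is required when the model is public")

    fields = {
        "type": "tts",         # assumed field name; only "tts" is allowed
        "train_mode": "fast",  # only "fast" is allowed on creation
        "visibility": visibility,
    }
    if texts:
        # Assumed field name. When omitted, the service runs ASR on the voices.
        fields["texts"] = list(texts)
    return fields


private_fields = build_create_model_fields(visibility="private")
```

Validating locally avoids a round trip for requests the service would be expected to reject, such as a public model without a cover image.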
Response
Unique identifier of the created model.
Model type. Possible values: svc, tts
URL of the model cover image.
Current state of the model. Possible values: created, training, trained, failed
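A client that waits for a model to become usable needs to know when to stop checking. The Python sketch below classifies the four listed states. Treating `trained` and `failed` as the final states is an interpretation of the list above, not something the reference states, and the helper name is hypothetical.

```python
KNOWN_STATES = {"created", "training", "trained", "failed"}
# Assumption: a model in one of these states will not change state again.
TERMINAL_STATES = {"trained", "failed"}


def is_terminal(state: str) -> bool:
    """Return True when the given model state is assumed to be final."""
    if state not in KNOWN_STATES:
        raise ValueError(f"unknown model state: {state!r}")
    return state in TERMINAL_STATES
```

A polling loop would re-fetch the model until `is_terminal` returns True, then branch on whether the state is `trained` or `failed`.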
created_at
string<date-time>
required
Timestamp when the model was created.
updated_at
string<date-time>
required
Timestamp when the model was last updated.
Model visibility setting. Possible values: public, unlist, private
Number of likes the model has received.
Number of bookmarks the model has received.
Number of times the model has been shared.
Number of tasks associated with the model.
author
AuthorEntity · object
required
Information about the model author.
Unique identifier of the author.
URL of the author’s avatar image.
train_mode
enum<string>
default: "full"
Training mode used by the model. Possible values: fast, full
Sample data associated with the model.
Text content of the sample.
Task identifier of the sample.
URL of the sample audio file.
Languages supported by the model.
Whether the visibility setting is locked.
Whether the current user has unliked this model.
Whether the current user has liked this model.
Whether the current user has bookmarked this model.
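As a rough illustration of consuming the response, the Python sketch below reads only the fields whose names appear in this reference (`created_at`, `updated_at`, `visibility`, `train_mode`) from an already decoded JSON object. The `ModelSummary` class, the helper names, and the sample payload are made up for the example. The timestamp format is assumed to be ISO 8601, as the `date-time` type suggests.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ModelSummary:
    created_at: datetime
    updated_at: datetime
    visibility: str
    train_mode: str


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # Python versions before 3.11 reject the 'Z' suffix, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_model(payload: dict) -> ModelSummary:
    """Extract the documented, named fields from a decoded response body."""
    return ModelSummary(
        created_at=parse_timestamp(payload["created_at"]),
        updated_at=parse_timestamp(payload["updated_at"]),
        visibility=payload["visibility"],
        # The reference lists "full" as the default training mode.
        train_mode=payload.get("train_mode", "full"),
    )


# Illustrative payload only; real responses contain more fields.
example = {
    "created_at": "2024-01-01T12:00:00Z",
    "updated_at": "2024-01-02T08:30:00Z",
    "visibility": "private",
}
summary = parse_model(example)
```

Reading `train_mode` with a fallback mirrors the documented default, while the required timestamps are accessed directly so that a missing value raises a `KeyError` instead of passing silently.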