Fish Audio Text-to-Speech
POST
For best results, upload reference audio with the Voice Cloning API before calling this API. Doing so improves voice quality and reduces latency.
Supported output formats:
- WAV / PCM
  - Sample rates: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
  - Default sample rate: 44.1kHz
  - 16-bit, mono
- MP3
  - Sample rates: 32kHz, 44.1kHz
  - Default sample rate: 44.1kHz
  - Mono
  - Bitrates: 64kbps, 128kbps (default), 192kbps
- Opus
  - Sample rate: 48kHz
  - Default sample rate: 48kHz
  - Mono
  - Bitrates: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps
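The format constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the table `FORMAT_SPECS` and the helper `resolve_audio_settings` are hypothetical names, and the values are transcribed from the list above (sample rates in Hz, bitrates in kbps).

```python
# Constraints transcribed from the format list above.
# Sample rates are in Hz, bitrates in kbps; -1000 means "auto" for Opus.
_PCM_LIKE = {
    "rates": (8000, 16000, 24000, 32000, 44100),
    "default_rate": 44100,
    "bitrates": (),            # WAV / PCM have no bitrate setting
    "default_bitrate": None,
}

FORMAT_SPECS = {
    "wav": _PCM_LIKE,
    "pcm": _PCM_LIKE,
    "mp3": {
        "rates": (32000, 44100),
        "default_rate": 44100,
        "bitrates": (64, 128, 192),
        "default_bitrate": 128,
    },
    "opus": {
        "rates": (48000,),
        "default_rate": 48000,
        "bitrates": (-1000, 24, 32, 48, 64),
        "default_bitrate": 32,
    },
}


def resolve_audio_settings(fmt="mp3", sample_rate=None, bitrate=None):
    """Return (format, sample_rate, bitrate) with defaults filled in.

    Raises ValueError if the combination is not in the list above.
    """
    if fmt not in FORMAT_SPECS:
        raise ValueError(f"unsupported format: {fmt!r}")
    spec = FORMAT_SPECS[fmt]

    rate = spec["default_rate"] if sample_rate is None else sample_rate
    if rate not in spec["rates"]:
        raise ValueError(f"{fmt} does not support a {rate} Hz sample rate")

    if not spec["bitrates"]:
        if bitrate is not None:
            raise ValueError(f"{fmt} has no bitrate setting")
        return fmt, rate, None

    kbps = spec["default_bitrate"] if bitrate is None else bitrate
    if kbps not in spec["bitrates"]:
        raise ValueError(f"{fmt} does not support a bitrate of {kbps}")
    return fmt, rate, kbps
```

Calling `resolve_audio_settings("opus", bitrate=-1000)` keeps the Opus "auto" bitrate, while `resolve_audio_settings("mp3", 8000)` raises `ValueError` because MP3 output is limited to 32kHz and 44.1kHz.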
Request Headers
- Content-Type: application/json (the only allowed value).
- Authorization: Bearer authentication, in the form Bearer {{API Key}}.
Request Body
- The text to be converted to speech.
- Controls the randomness of speech generation. Higher values (e.g., 1.0) make the output more random; lower values (e.g., 0.1) make it more deterministic. We recommend 0.9 for the s1 model. Required range: 0 <= x <= 1.
- Controls diversity through nucleus sampling. Lower values (e.g., 0.1) make the output more focused; higher values (e.g., 1.0) allow more diversity. We recommend 0.9 for the s1 model. Required range: 0 <= x <= 1.
- Reference audio for the voice. This requires MessagePack serialization, which will override reference_voices and reference_texts.
- Reference model ID for the voice.
- Prosody control for the voice.
- Chunk length for the voice. Required range: 100 <= x <= 300.
- Whether to normalize the voice. This will reduce latency but may decrease performance on numbers and dates.
- Format for the voice. Possible values: wav, pcm, mp3, opus.
- Sample rate for the voice.
- MP3 bitrate for the voice. Possible values: 64, 128, 192.
- Opus bitrate for the voice. Possible values: -1000, 24, 32, 48, 64.
- Latency setting for the voice. balanced will reduce latency but may result in decreased performance. Possible values: normal, balanced.
Response
The API returns an audio stream in the format specified by the format parameter (default: mp3).
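A request along these lines might be assembled as follows. This is only a sketch under stated assumptions: the page does not give the endpoint URL, so `TTS_URL` is a placeholder, and apart from `format` the JSON field names (`text`, `temperature`, `top_p`, `chunk_length`, `latency`) are guesses, since the parameter names are not shown here. The header names are assumed to be the standard `Content-Type` and `Authorization`. The ranges and allowed values are the ones listed above.

```python
import json
import shutil
import urllib.request

# Placeholder only: the endpoint URL is not stated on this page.
TTS_URL = "https://api.example.com/v1/tts"


def build_tts_payload(text, temperature=0.9, top_p=0.9,
                      chunk_length=None, fmt="mp3", latency="normal"):
    """Build the JSON body, enforcing the documented ranges.

    Field names other than "format" are assumptions.
    0.9 is the value recommended above for the s1 model.
    """
    if not text:
        raise ValueError("text must not be empty")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must satisfy 0 <= x <= 1")
    if fmt not in ("wav", "pcm", "mp3", "opus"):
        raise ValueError("format must be wav, pcm, mp3, or opus")
    if latency not in ("normal", "balanced"):
        raise ValueError("latency must be normal or balanced")

    payload = {
        "text": text,
        "temperature": temperature,
        "top_p": top_p,
        "format": fmt,
        "latency": latency,
    }
    if chunk_length is not None:
        if not 100 <= chunk_length <= 300:
            raise ValueError("chunk length must satisfy 100 <= x <= 300")
        payload["chunk_length"] = chunk_length
    return payload


def build_request(api_key, payload, url=TTS_URL):
    """Wrap the payload in a POST request with JSON and Bearer headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize_to_file(api_key, payload, out_path, url=TTS_URL):
    """Send the request and copy the returned audio stream to a file."""
    request = build_request(api_key, payload, url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        shutil.copyfileobj(response, f)
```

With a real endpoint and key, `synthesize_to_file(api_key, build_tts_payload("Hello"), "hello.mp3")` would write the streamed audio to disk. Validating the ranges locally means an out-of-range value fails before any network call is made.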