Model Details

Domain:

Speech

Task:

Speech synthesis

Text-to-speech (TTS)

Model Access:

Open weights (non-commercial)

Introduction

Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.

Benchmarking

FLOPs

2.66e+21

Notes: The previous model, Fish-Speech 1.4, used 1.9151355e+21 FLOP for training (see model card) and was trained on 720,000 hours of audio. This model is similar but was trained on 1M hours of audio, so its training compute can be estimated by scaling linearly with audio hours: 1.9151355e+21 * 10^6 / 720000 = 2.6599104e+21 FLOP.
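The estimate in the note can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes training compute grows linearly with hours of audio and that nothing else changed between the two versions. The function and variable names are hypothetical and chosen for illustration.

```python
def scale_compute_by_hours(reference_flop, reference_hours, target_hours):
    """Linearly scale a training-compute figure by the ratio of audio hours.

    Assumption: compute is proportional to training audio hours.
    """
    return reference_flop * target_hours / reference_hours


# Figures quoted in the note above.
FISH_SPEECH_1_4_FLOP = 1.9151355e21   # training compute of Fish-Speech 1.4
FISH_SPEECH_1_4_HOURS = 720_000       # audio hours used for Fish-Speech 1.4
FISH_SPEECH_1_5_HOURS = 1_000_000     # audio hours used for Fish Speech V1.5

estimated_flop = scale_compute_by_hours(
    FISH_SPEECH_1_4_FLOP, FISH_SPEECH_1_4_HOURS, FISH_SPEECH_1_5_HOURS
)
print(f"Estimated training compute: {estimated_flop:.2e} FLOP")
```

The result rounds to the 2.66e+21 figure listed above. Because the source states "more than 1 million hours", the estimate is probably best read as a lower bound under this linear assumption.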

Training

Training Code Accessibility

Weights are released under the CC-BY-NC-SA-4.0 license (https://huggingface.co/fishaudio/fish-speech-1.5). The codebase is released under the Apache License (https://github.com/fishaudio/fish-speech/tree/main/fish_speech).

Size Notes: [tokens] The developers' previous model was trained on 720,000 hours of audio data, equal to 5*10^11 tokens. Scaling linearly, 1M hours corresponds to roughly 7*10^11 tokens. This model was trained on more than 1 million hours of audio data in multiple languages.
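The token estimate follows the same proportional reasoning. The sketch below derives an implied tokens-per-hour rate from the previous model's figures and applies it to 1M hours. It assumes the tokenization rate is unchanged between versions, which the source does not confirm. All names are hypothetical.

```python
# Figures quoted in the size note above.
PREVIOUS_HOURS = 720_000       # audio hours for the previous model
PREVIOUS_TOKENS = 5e11         # token count reported for those hours
CURRENT_HOURS = 1_000_000      # audio hours for this model

# Implied average rate, assuming tokens scale linearly with audio duration.
tokens_per_hour = PREVIOUS_TOKENS / PREVIOUS_HOURS
tokens_per_second = tokens_per_hour / 3600

estimated_tokens = tokens_per_hour * CURRENT_HOURS

print(f"Implied rate: {tokens_per_second:.0f} tokens per second of audio")
print(f"Estimated dataset size: {estimated_tokens:.1e} tokens")
```

The unrounded result is about 6.9*10^11 tokens, which the note rounds to 7*10^11.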