It is a transformer-based text-to-audio model. It can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. | It is Stability AI’s first product for music and sound effect generation. Users can create original audio by entering a text prompt and a duration, generating audio in high-quality, 44.1 kHz stereo. |
Supports various languages out-of-the-box; can generate all types of audio | Create original audio by entering a text prompt and a duration; high-quality audio generation; uses a latent diffusion model for audio |
Statistics | |
GitHub Stars 38.7K | GitHub Stars - |
GitHub Forks 4.7K | GitHub Forks - |
Stacks 0 | Stacks 1 |
Followers 1 | Followers 3 |
Votes 0 | Votes 0 |
Integrations | |
| No integrations available | |
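
To make the "44.1 kHz stereo" figure in the description above concrete, here is a minimal sketch of how much raw audio a given duration corresponds to. The sample rate and channel count come from the description; the 16-bit (2 bytes per sample) depth and the helper name `audio_footprint` are assumptions for illustration only, not something the product documents here.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, as stated in the description
CHANNELS = 2             # stereo


def audio_footprint(duration_s: float, bytes_per_sample: int = 2) -> dict:
    """Return frame count, sample-value count, and raw PCM size for a clip.

    bytes_per_sample=2 assumes 16-bit PCM; this is an assumption, since the
    source states only the sample rate and channel layout.
    """
    frames = round(duration_s * SAMPLE_RATE_HZ)  # one frame = one sample per channel
    values = frames * CHANNELS                   # individual sample values
    return {
        "frames": frames,
        "values": values,
        "bytes": values * bytes_per_sample,
    }


# A 30-second clip requested via the duration field
print(audio_footprint(30))
```

Under these assumptions, a 30-second request corresponds to about 1.3 million frames, or roughly 5.3 MB of uncompressed audio.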

It is a general-purpose speech recognition model trained on a large dataset of diverse audio. It is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

It is a 1.2B-parameter base model for text-to-speech (TTS), trained on 100K hours of speech. It empowers developers and businesses to better connect with their audiences at scale.