It is more than just a fast and accurate audio to text converter. We go beyond audio transcription to help you get the most out of your content. | It is a unified, developer-friendly API to the best available Speech-To-Text and Text-To-Speech services. |
Speech-to-text; Makes audio and video searchable, editable and shareable | Build voice-enabled chatbot services (for example, IVR systems); Classification of audio file transcriptions; Automated Testing of Voice services with Botium |
Statistics | |
GitHub Stars - | GitHub Stars 943 |
GitHub Forks - | GitHub Forks 58 |
Stacks 2 | Stacks 7 |
Followers 1 | Followers 21 |
Votes 0 | Votes 0 |
Integrations | |

It can be used to complement any regular touch user interface with a real time voice user interface. It offers real time feedback for faster and more intuitive experience that enables end user to recover from possible errors quickly and with no interruptions.

Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.

Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 30 voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible.

It is a state-of-the-art automatic speech recognition toolkit. It is intended for use by speech recognition researchers and professionals.

We made AudioKit open-source because we believe that clear, powerful audio development is best developed and maintained through a large, active base of developers and users. Our core code, tests, examples, and website are all available for contributions.

It is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

It is an on-device speech-to-text engine. By processing voice data locally on the device, it offers private, reliable, fully-customizable, and cost-effective audio transcription experiences. It achieves big tech-level accuracy at a fraction of their costs.

wav2letter++ is a fast open source speech processing toolkit from the Speech Team at Facebook AI Research. It is written entirely in C++ and uses the ArrayFire tensor library and the flashlight machine learning library for maximum efficiency. Our approach is detailed in this arXiv paper.

It is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Produce high quality recordings without having to shell out thousands of dollars for equipment. The only thing you need is your guitar, your computer, and a digital audio workstation.