It is a high-quality multi-lingual text-to-speech library by MyShell.ai. It supports English, Spanish, French, Chinese, Japanese and Korean. | Convert text to high-quality AI voice in seconds. Perfect for content creators, businesses, educators and video makers. Fast, affordable and studio-grade output with multiple accents and languages. |
High-quality multi-lingual TTS;
Fast enough for CPU real-time inference;
The Chinese speaker supports mixed Chinese and English | Indian regional voices, multilingual TTS (Hindi–Marathi–Tamil–Telugu–Kannada + more), natural studio-quality speech, API for developers, voice cloning (on demand), commercial usage rights, fast cloud rendering, pay-as-you-go pricing, dashboard usage analytics, 24/7 customer support, cost-effective |
Statistics | |
GitHub Stars 6.9K | GitHub Stars - |
GitHub Forks 970 | GitHub Forks - |
Stacks 0 | Stacks 2 |
Followers 1 | Followers 3 |
Votes 0 | Votes 3 |
Integrations | |
| No integrations available | |

Transform boring PDFs and text into viral TikTok-style brainrot study videos. Free online tool with AI voices, speed control, and Minecraft backgrounds. 3 free videos daily!

Create viral AI-powered short videos, reels, TikToks, YouTube Shorts, and music videos with voiceovers, auto scripts, subtitles, and ai images — perfect for creators, educators, and marketers.

It is a cloud-based voice service and the brain behind tens of millions of devices including the Echo family of devices, FireTV, Fire Tablet, and third-party devices. You can build voice experiences, or skills, that make everyday tasks faster, easier, and more delightful for customers.

Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.

Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 30 voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible.

It is a unified, developer-friendly API to the best available Speech-To-Text and Text-To-Speech services.

It is an on-device speech-to-text engine. By processing voice data locally on the device, it offers private, reliable, fully-customizable, and cost-effective audio transcription experiences. It achieves big tech-level accuracy at a fraction of their costs.

It is an open-source voice assistant. It is private by default and completely customizable. It can be freely remixed, extended, and deployed anywhere. It may be used in anything from a science project to a global enterprise environment.

It is more than just a fast and accurate audio to text converter. We go beyond audio transcription to help you get the most out of your content.

It is a library for advanced Text-to-Speech generation. It’s built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed, and quality. It comes with pre-trained models, tools for measuring dataset quality and is already used in 20+ languages for products and research projects.