Kaldi vs SpeechPy

Overview

SpeechPy

Stacks: 1
Followers: 11
Votes: 0
GitHub Stars: 884
Forks: 105

Kaldi

Stacks: 24
Followers: 25
Votes: 0
GitHub Stars: 15.2K
Forks: 5.4K

Kaldi vs SpeechPy: What are the differences?

What is Kaldi? A toolkit for speech recognition. Kaldi is a state-of-the-art automatic speech recognition toolkit intended for use by speech recognition researchers and professionals.

What is SpeechPy? A library for speech processing and recognition. The purpose of this project is to provide a package for speech processing and feature extraction. The library provides the most frequently used speech features, including MFCCs and filterbank energies, along with the log-energy of filterbanks.
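To make the feature names above concrete, here is a minimal pure-Python sketch of how filterbank energies, log filterbank energies, and MFCCs relate for a single audio frame. It is an illustration under stated assumptions, not SpeechPy's implementation or API: every function name, the naive DFT, the filter count, and the FFT length are hypothetical choices made for readability, and the frame is assumed to be no longer than the FFT length.

```python
import math

def hz_to_mel(hz):
    # Common mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame, fft_length):
    # Naive DFT of a zero-padded frame (slow, but dependency-free).
    padded = list(frame) + [0.0] * (fft_length - len(frame))
    spectrum = []
    for k in range(fft_length // 2 + 1):
        angle = 2.0 * math.pi * k / fft_length
        re = sum(x * math.cos(angle * n) for n, x in enumerate(padded))
        im = -sum(x * math.sin(angle * n) for n, x in enumerate(padded))
        spectrum.append((re * re + im * im) / fft_length)
    return spectrum

def mel_filterbank(num_filters, fft_length, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    num_bins = fft_length // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((fft_length + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * num_bins
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def filterbank_energies(frame, sample_rate, num_filters=10, fft_length=256):
    # Weighted sum of the power spectrum under each triangular filter.
    spectrum = power_spectrum(frame, fft_length)
    bank = mel_filterbank(num_filters, fft_length, sample_rate)
    return [sum(w * p for w, p in zip(row, spectrum)) for row in bank]

def mfcc(log_energies, num_cepstral=5):
    # Unnormalized DCT-II of the log filterbank energies.
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * c * (i + 0.5) / n)
                for i, e in enumerate(log_energies))
            for c in range(num_cepstral)]

# Example: one 25 ms frame of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(200)]
energies = filterbank_energies(frame, sample_rate)
log_energies = [math.log(max(e, 1e-12)) for e in energies]  # floor avoids log(0)
coeffs = mfcc(log_energies)
```

A production library would typically add pre-emphasis, overlapping windowed frames, and a fast FFT; the sketch leaves these out to keep the three feature types easy to tell apart.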

Kaldi and SpeechPy are both classified primarily as "Speech Recognition" tools.

Kaldi and SpeechPy are both open source tools. Kaldi, with 15.2K GitHub stars and 5.4K forks, appears to be more popular than SpeechPy, with 884 GitHub stars and 105 forks.


Detailed Comparison

Attribute	SpeechPy	Kaldi
Description	The purpose of this project is to provide a package for speech processing and feature extraction. The library provides the most frequently used speech features, including MFCCs and filterbank energies, along with the log-energy of filterbanks.	It is a state-of-the-art automatic speech recognition toolkit. It is intended for use by speech recognition researchers and professionals.
Features	Mel Frequency Cepstral Coefficients (MFCCs); Filterbank Energies; Log Filterbank Energies	-
Statistics
GitHub Stars	884	15.2K
GitHub Forks	105	5.4K
Stacks	1	24
Followers	11	25
Votes	0	0
Integrations	Python	No integrations available

What are some alternatives to SpeechPy and Kaldi?

Speechly

It can be used to complement any regular touch user interface with a real-time voice user interface. It offers real-time feedback for a faster and more intuitive experience, enabling end users to recover from possible errors quickly and without interruption.

MumbleFlow

MumbleFlow is a fully local speech to text and voice to text app. Sub-second offline transcription powered by whisper.cpp. No cloud, no subscription — $5 one-time purchase. Available on macOS, Windows & Linux.

Voibe

Voibe is an offline voice dictation app for macOS that lets you write at the speed of thought. It works everywhere (Mail, Notes, Browsers, Slack, VS Code, ChatGPT, etc.), making it easy to draft messages, capture ideas, and produce long content without breaking concentration.

DeepSpeech

It is an open source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

Botium Speech Processing

It is a unified, developer-friendly API to the best available Speech-To-Text and Text-To-Speech services.

wav2letter++

wav2letter++ is a fast open source speech processing toolkit from the Speech Team at Facebook AI Research. It is written entirely in C++ and uses the ArrayFire tensor library and the flashlight machine learning library for maximum efficiency. The approach is detailed in an accompanying arXiv paper.

LibreASR

It is an on-premises, streaming speech recognition system built with PyTorch and fastai.

WhisperFusion

It builds upon the capabilities of WhisperLive and WhisperSpeech by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. Both the LLM and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.

Writeout.ai

Transcribe and translate audio files using OpenAI's Whisper API. You can upload any audio file, and the application will send it through the OpenAI Whisper API using Laravel's queued jobs. Translation makes use of the new OpenAI Chat API and chunks the generated VTT file into smaller parts to fit them into the prompt context limit.
