Analysis updated 2026-06-24
Train a custom acoustic model for a low-resource language with your own dataset
Build a speaker verification system that confirms who is calling a hotline
Reproduce a published ASR research result using one of the example recipes in egs
| | kaldi-asr/kaldi | tteck/proxmox | cisofy/lynis |
|---|---|---|---|
| Stars | 15,391 | 15,174 | 15,644 |
| Language | Shell | Shell | Shell |
| Setup difficulty | hard | moderate | easy |
| Complexity | 5/5 | 2/5 | 2/5 |
| Audience | researcher | ops devops | ops devops |
Figures from each repo's GitHub metadata at analysis time.
Compile from source on Linux or macOS. CUDA is recommended for training, and the egs recipes need large datasets.
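The source build can be sketched as an ordered list of commands. The snippet below is a minimal sketch, assuming the two-stage layout described in the repository's INSTALL files (build `tools` first, then `src`). The function names `kaldi_build_steps` and `build` and the `/opt/kaldi` path are hypothetical, and the exact commands and flags should be checked against `tools/INSTALL` and `src/INSTALL` in the checkout being built.

```python
import subprocess
from pathlib import Path


def kaldi_build_steps(jobs: int = 4) -> list[tuple[str, list[str]]]:
    """Return (subdirectory, command) pairs for a typical source build.

    Assumption: these mirror the steps given in the repo's INSTALL files;
    verify them against tools/INSTALL and src/INSTALL before running.
    """
    j = str(jobs)
    return [
        ("tools", ["extras/check_dependencies.sh"]),  # report missing packages
        ("tools", ["make", "-j", j]),                 # build third-party tools
        ("src", ["./configure", "--shared"]),         # generate the build config
        ("src", ["make", "depend", "-j", j]),         # compute dependencies
        ("src", ["make", "-j", j]),                   # compile Kaldi itself
    ]


def build(kaldi_root: str, jobs: int = 4, dry_run: bool = True) -> list[str]:
    """Print each build step in order; execute it only if dry_run is False."""
    log = []
    for subdir, cmd in kaldi_build_steps(jobs):
        cwd = Path(kaldi_root) / subdir
        line = f"(cd {cwd} && {' '.join(cmd)})"
        log.append(line)
        print(line)
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
    return log
```

Calling `build("/opt/kaldi")` only prints the plan, which makes it easy to review the steps before committing to a long compile. Passing `dry_run=False` runs them for real.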
Kaldi is a speech recognition toolkit, software that converts spoken audio into text. It is aimed at researchers and engineers working on automatic speech recognition (ASR) and is one of the established toolkits in the field for building and experimenting with speech recognition systems. The project also covers speaker identification and speaker verification, which determine who is speaking rather than what was said.
The toolkit is written primarily in C++ and is designed to run on UNIX-like systems, including various Linux distributions, macOS (Darwin), and Cygwin, with separate Windows installation instructions also available. It can take advantage of CUDA-capable GPUs for faster processing. Kaldi includes example system builds (called "egs") that run complete speech recognition pipelines on standard datasets, which is the usual way to get started. It supports cross-compilation to other platforms, including Android and WebAssembly (for in-browser execution using the Emscripten toolchain).
The project provides documentation on its own website covering both usage and the underlying techniques, along with a Doxygen code reference for developers. Community support is available through mailing lists for both users and developers. Contributors are expected to follow the Google C++ Style Guide, with a few project-specific exceptions noted in the documentation.
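To make the difference between recognition and speaker verification concrete, the snippet below sketches a verification decision: compare a stored speaker embedding with one from a new utterance and accept the claim if they are similar enough. This is a generic illustration, not Kaldi's API or its scoring method. The embeddings are assumed to come from some upstream extractor, and the names `cosine_similarity` and `verify_speaker` and the `0.7` threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, test, threshold=0.7):
    """Return (accepted, score) for a claimed identity.

    Assumption: `enrolled` and `test` are fixed-length speaker embeddings
    from the same extractor; `threshold` is a placeholder that would be
    tuned on held-out trials in practice.
    """
    if len(enrolled) != len(test):
        raise ValueError("embeddings must have the same length")
    score = cosine_similarity(enrolled, test)
    return score >= threshold, score
```

A real system would calibrate the threshold on labeled trials, trading false accepts against false rejects for the application at hand.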
Established speech recognition toolkit in C++ that converts spoken audio into text. Also handles speaker identification and verification, with GPU support.
Mainly Shell. The stack also includes C++ and CUDA.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.