This project recognizes spoken Indonesian, converting audio recordings into text. It is a complete system that trains a speech recognition model from scratch, then provides a web interface where you can record your voice and have it transcribed. The core technical approach uses two established tools: HTK (the Hidden Markov Model Toolkit, used to build speech recognition models) and Julius (a decoder that runs recognition in real time).

The workflow starts with audio samples and linguistic data. You extract acoustic features from the recordings, build statistical models of how Indonesian phonemes (basic sounds) behave, refine those models through multiple training iterations, and finally package everything for use with the decoder. The web interface, built with Node.js, lets anyone record audio through their browser and send it to the decoder for transcription.

The README is essentially a detailed recipe: a series of command-line steps that someone would follow to rebuild this system from raw Indonesian audio data. It covers data preparation, model training (which involves creating and iterating through 18 different HMM variations), and optional language model integration to improve accuracy. The training process is lengthy and manual, and the author leaves notes in Indonesian explaining common errors and the workarounds they found. Once training is done, the web app captures microphone input, saves it as a WAV file, and pipes it through the decoder to produce text output.

This would appeal to researchers building speech systems for underrepresented languages, Indonesian NLP enthusiasts, or anyone curious about how speech-to-text works under the hood. The main tradeoff is that training from scratch requires significant computational resources and expertise. This is not a plug-and-play solution, but a documented reference implementation showing how to do it for Indonesian.
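The "saves it as a WAV file" step described above can be illustrated with a short sketch. This is not the repository's code: `encodeWav` is a hypothetical helper, and the format (mono, 16-bit PCM, 16 kHz by default) is an assumption based on what HTK and Julius setups commonly use, so check the repo for the actual recording settings.

```typescript
// Hypothetical helper, not taken from the repository.
// Wraps mono 16-bit PCM samples in a minimal 44-byte RIFF/WAVE header.
// The 16 kHz default is an assumption; match it to the trained model.
function encodeWav(samples: Int16Array, sampleRate: number = 16000): Uint8Array {
  const numChannels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  // Write a four-character chunk tag as raw ASCII bytes.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // remaining file size after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the PCM fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Samples are stored little-endian, directly after the header.
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * bytesPerSample, samples[i], true);
  }
  return new Uint8Array(buffer);
}
```

In a Node.js server, the returned bytes could be written to disk with `fs.writeFileSync` before the decoder is invoked. How the decoder is launched, and with which configuration file, depends on the repo's own setup.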
Verify against the repo before relying on details.