Analysis updated 2026-05-18
Clean and structure addresses collected from web forms or legacy databases for shipping or logistics systems.
Build an address validation step in an e-commerce checkout flow that standardizes Indian address strings.
Extract district and state from a batch of raw address strings for geographic analysis or reporting.
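For the reporting use case, the aggregation step might look like the minimal sketch below. It assumes each address has already been parsed into a dictionary, that the relevant keys are named `district` and `state`, and that missing fields are `None`. The key names, the helper `count_by_region`, and the sample records are all hypothetical and should be checked against the package's real output.

```python
from collections import Counter
from typing import Optional


def count_by_region(records: list[dict[str, Optional[str]]]) -> Counter:
    """Count parsed addresses per (state, district) pair.

    Records with no state are skipped; a missing district is grouped
    under the placeholder "Unknown".
    """
    counts: Counter = Counter()
    for record in records:
        state = record.get("state")
        if state is None:
            continue
        district = record.get("district") or "unknown"
        # Normalise case and whitespace so "kamrup " and "Kamrup" collapse together.
        counts[(state.strip().upper(), district.strip().title())] += 1
    return counts


# Hypothetical parsed records (key names are assumptions, not the package's schema).
sample = [
    {"district": "Kamrup", "state": "AS"},
    {"district": "kamrup ", "state": "as"},
    {"district": None, "state": "AS"},
    {"district": "Pune", "state": None},
]
region_counts = count_by_region(sample)
```

Normalising case before counting is a design choice for messy input; if the parser already returns canonical names, that step can be dropped.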
| | innerkorehq/indian-address-parser | a-bissell/unleash-lite | abhiinnovates/whatsapp-hr-assistant |
|---|---|---|---|
| Stars | 1 | 1 | 1 |
| Language | Python | Python | Python |
| Setup difficulty | easy | hard | hard |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | developer | researcher | developer |
Figures from each repo's GitHub metadata at analysis time.
This Python package takes a raw Indian address written as a single block of text and breaks it down into up to 13 separate fields: house number, house name, point of interest, street, sub-locality levels, village, sub-district, district, city, state, and PIN code. You feed in a messy string like "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI, Kamrup Unclassified AS 781029" and get back a clean dictionary with each piece in its own named slot. Any field that was not present in the original string comes back as null.
The package ships with three AI models to choose from, all downloaded automatically from Hugging Face the first time you use them. The default, called tinybert, is a small model with around 14 million parameters that labels each word as belonging to a particular field. It is the fastest and lightest option, with roughly 79% accuracy across fields. The t5 option uses a 77-million-parameter model that generates JSON directly; it is a couple of percentage points more accurate than tinybert but noticeably slower. The qwen option uses a 596-million-parameter model with the highest accuracy, around 82%, and is the best choice when precision matters most, particularly for harder-to-detect fields like point of interest or sub-locality.
You can use the library from Python code or from the command line. In Python you create an AddressParser object, optionally passing a backend name, and then call parse on a single string or parse_batch on a list. From the terminal, you pass an address as an argument, pipe addresses from a file through standard input, or point it at a text file and ask for JSONL output.
The package contains only inference code, not the model weights themselves. The weights live on Hugging Face and download on first use, so there is nothing large bundled in the pip install. Benchmark comparisons in the repository show all three backends outperforming a comparable open model from the logistics company Shiprocket on nine shared address fields.
The license is Apache 2.0, matching the base models the fine-tunes were built on.
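The Python workflow described above (a parser object with `parse` and `parse_batch`, null for absent fields, JSONL output) can be sketched without the real package. Everything here is an assumption for illustration: the `ParserLike` protocol only mirrors the two method names mentioned in the description, `StubParser` is a stand-in so the example runs without downloading a model, and the `pincode` and `state` key names are hypothetical.

```python
import json
from typing import Optional, Protocol


class ParserLike(Protocol):
    """Assumed shape: parse() for one string, parse_batch() for a list."""

    def parse(self, address: str) -> dict[str, Optional[str]]: ...

    def parse_batch(self, addresses: list[str]) -> list[dict[str, Optional[str]]]: ...


def to_jsonl(parser: ParserLike, addresses: list[str]) -> str:
    """Parse a batch and render one JSON object per line, dropping None fields."""
    lines = []
    for parsed in parser.parse_batch(addresses):
        present = {key: value for key, value in parsed.items() if value is not None}
        lines.append(json.dumps(present, ensure_ascii=False))
    return "\n".join(lines)


class StubParser:
    """Stand-in for a real backend; NOT the package's model or logic."""

    def parse(self, address: str) -> dict[str, Optional[str]]:
        # Treat a trailing 6-digit token as the PIN code; leave other fields unset.
        tokens = address.split()
        last = tokens[-1] if tokens else ""
        pin = last if last.isdigit() and len(last) == 6 else None
        return {"pincode": pin, "state": None}

    def parse_batch(self, addresses: list[str]) -> list[dict[str, Optional[str]]]:
        return [self.parse(address) for address in addresses]


output = to_jsonl(
    StubParser(),
    [
        "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI, Kamrup Unclassified AS 781029",
        "MG ROAD",
    ],
)
```

With the real package, an AddressParser instance would take the place of `StubParser`; the import path, the accepted backend names, and the exact field keys should be confirmed in the repository's README.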
A Python package that takes a messy Indian address string and splits it into 13 labeled fields like house number, street, district, city, and PIN code, using one of three downloadable AI models.
Mainly Python. The stack also includes Hugging Face Transformers and TinyBERT.
Use freely for any purpose including commercial projects as long as you keep the copyright notice.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Aimed mainly at developers.
Verify against the repo before relying on details.