explaingit

innerkorehq/indian-address-parser

Analysis updated 2026-05-18

Stars · 1 · Language · Python · Audience · developer · Complexity · 2/5 · Setup · easy

TLDR

A Python package that takes a messy Indian address string and splits it into 13 labeled fields like house number, street, district, city, and PIN code, using one of three downloadable AI models.

Mindmap

mindmap
  root((indian-address-parser))
    What it does
      Parse raw address text
      13 structured fields
      Null for missing fields
    Model backends
      TinyBERT default
      Flan-T5 moderate
      Qwen3 most accurate
    Usage
      Python API
      CLI tool
      Batch processing
    Setup
      pip install
      Auto-downloads weights

What do people build with it?

USE CASE 1

Clean and structure addresses collected from web forms or legacy databases for shipping or logistics systems.

USE CASE 2

Build an address validation step in an e-commerce checkout flow that standardizes Indian address strings.

USE CASE 3

Extract district and state from a batch of raw address strings for geographic analysis or reporting.
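
The third use case can be sketched as a small aggregation step over already-parsed results. This is a minimal illustration, not the package's own API: the `state` and `district` key names and the sample rows are assumptions made up for the example, so check the actual field names in the repository before using it.

```python
from collections import Counter
from typing import Any, Iterable


def count_by_region(
    parsed_rows: Iterable[dict[str, Any]],
    state_key: str = "state",        # assumed field name
    district_key: str = "district",  # assumed field name
) -> Counter:
    """Count addresses per (state, district) pair.

    Fields the parser could not find are expected to be None, so those
    rows are grouped under a key that contains None.
    """
    counts: Counter = Counter()
    for row in parsed_rows:
        counts[(row.get(state_key), row.get(district_key))] += 1
    return counts


# Hypothetical parser output, written by hand for illustration only.
sample_rows = [
    {"state": "AS", "district": "Kamrup", "city": "GUWAHATI"},
    {"state": "AS", "district": "Kamrup", "city": None},
    {"state": "AS", "district": None, "city": None},
]

# Print the regions from most to least frequent.
for (state, district), n in count_by_region(sample_rows).most_common():
    print(state, district, n)
```

Keeping the aggregation separate from parsing means the same function works on results from any of the three backends.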

What is it built with?

Python · Hugging Face Transformers · TinyBERT · Flan-T5 · Qwen3 · PEFT

How does it compare?

| | innerkorehq/indian-address-parser | a-bissell/unleash-lite | abhiinnovates/whatsapp-hr-assistant |
| --- | --- | --- | --- |
| Stars | 1 | 1 | 1 |
| Language | Python | Python | Python |
| Setup difficulty | easy | hard | hard |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | developer | researcher | developer |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy · Time to first run · 5 min
Use freely for any purpose including commercial projects as long as you keep the copyright notice.

In plain English

This Python package takes a raw Indian address written as a single block of text and breaks it down into up to 13 separate fields: house number, house name, point of interest, street, sub-locality levels, village, sub-district, district, city, state, and PIN code. You feed in a messy string like "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI, Kamrup Unclassified AS 781029" and get back a clean dictionary with each piece in its own named slot. Any field that was not present in the original string comes back as null.

The package offers three AI model backends to choose from, each downloaded automatically from Hugging Face the first time you use it. The default, called tinybert, is a small model with around 14 million parameters that works by labeling each word as belonging to a particular field. It is the fastest and lightest option, with roughly 79% accuracy across fields. The t5 option uses a 77-million-parameter model that generates JSON directly; it is a couple of points more accurate than tinybert but noticeably slower. The qwen option uses a 596-million-parameter model with the highest accuracy, around 82%, and is the best choice when precision matters most, particularly for harder-to-detect fields like point of interest or sub-locality.

You can use the library from Python code or from the command line. In Python you create an AddressParser object, optionally passing a backend name, and then call parse on a single string or parse_batch on a list. From the terminal, you pass an address as an argument, pipe addresses from a file through standard input, or point it at a text file and ask for JSONL output.

The package contains only inference code, not the model weights themselves. The weights live on Hugging Face and download on first use, so there is nothing large bundled in the pip install. Benchmark comparisons in the repository show all three backends outperforming a comparable open model from the logistics company Shiprocket on nine shared address fields.
The license is Apache 2.0, matching the base models the fine-tunes were built on.
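
The Python usage described above (create an AddressParser, then call parse or parse_batch) might look roughly like the sketch below. Only the names AddressParser, parse, and parse_batch come from the description; the import path `indian_address_parser`, the `backend` keyword, the field keys, and the assumption that null arrives as Python `None` are guesses to verify against the repository. A hypothetical `StubParser` stands in for the real parser so the example runs without downloading any weights.

```python
from typing import Any, Iterable, Protocol


class ParserLike(Protocol):
    """Shape of the parser described above: parse() and parse_batch()."""

    def parse(self, address: str) -> dict[str, Any]: ...
    def parse_batch(self, addresses: list[str]) -> list[dict[str, Any]]: ...


def load_parser(backend: str = "tinybert") -> ParserLike:
    """Create the real parser (not called in this sketch).

    The module path and the `backend` keyword are assumptions.
    """
    from indian_address_parser import AddressParser  # assumed import path
    return AddressParser(backend=backend)  # assumed keyword argument


def present_fields(parsed: dict[str, Any]) -> dict[str, Any]:
    """Drop fields that came back as null (assumed to be None in Python)."""
    return {name: value for name, value in parsed.items() if value is not None}


def parse_many(parser: ParserLike, addresses: Iterable[str]) -> list[dict[str, Any]]:
    """Parse a batch and keep only the fields that were actually found."""
    return [present_fields(row) for row in parser.parse_batch(list(addresses))]


class StubParser:
    """Hypothetical stand-in with hand-written output and made-up keys."""

    def parse(self, address: str) -> dict[str, Any]:
        return {"city": "GUWAHATI", "pincode": "781029", "village": None}

    def parse_batch(self, addresses: list[str]) -> list[dict[str, Any]]:
        return [self.parse(a) for a in addresses]


if __name__ == "__main__":
    raw = "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI, Kamrup Unclassified AS 781029"
    print(parse_many(StubParser(), [raw]))
```

To try it against the real package, replace `StubParser()` with `load_parser("qwen")` or another backend name once the assumed names have been confirmed.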

Copy-paste prompts

Prompt 1
Using the indian-address-parser Python package, write a script that reads a CSV of raw Indian addresses and outputs a new CSV with each of the 13 structured fields as separate columns.
Prompt 2
How do I use indian-address-parser to parse 10,000 addresses efficiently? Show me the parse_batch call and how to handle rows where fields come back null.
Prompt 3
Compare the tinybert and qwen backends in indian-address-parser. When should I use each one, and how do I switch between them in my code?
Prompt 4
Write a FastAPI endpoint that accepts a raw Indian address string and returns the parsed fields as JSON using indian-address-parser.

Frequently asked questions

What is indian-address-parser?

A Python package that takes a messy Indian address string and splits it into 13 labeled fields like house number, street, district, city, and PIN code, using one of three downloadable AI models.

What language is indian-address-parser written in?

Mainly Python. The stack also includes Hugging Face Transformers, TinyBERT, Flan-T5, Qwen3, and PEFT.

What license does indian-address-parser use?

Use freely for any purpose including commercial projects as long as you keep the copyright notice.

How hard is indian-address-parser to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is indian-address-parser for?

Mainly developers.


Verify against the repo before relying on details.