explaingit

openvenues/libpostal

4,795 · C · Audience · developer · Complexity · 4/5 · Setup · hard

TLDR

A C library that parses and normalizes street addresses from around the world, turning messy human-written text into clean, standardized forms that computers can reliably compare and search.

Mindmap

mindmap
  root((libpostal))
    What it does
      Address normalization
      Address parsing
      Multi-language support
    Inputs
      Raw address strings
      Any script or language
    Outputs
      Standardized address forms
      Parsed address components
    Use Cases
      Deduplication pipelines
      Search indexing
      Pre-geocoding step
    Tech Stack
      C core library
      Python Ruby Go Java
      OpenStreetMap training data

Things people build with this

USE CASE 1

Deduplicate address records from different data sources by normalizing them into a consistent format before comparing.

USE CASE 2

Preprocess user-entered addresses before passing them to a geocoding service to improve match accuracy.

USE CASE 3

Build a search index over address data that handles abbreviations, local conventions, and non-Latin scripts.

USE CASE 4

Parse raw address strings into structured fields like house number, street name, and postal code for database storage.
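Use case 4 can be sketched in Python. The pypostal binding (the `postal` package) exposes `parse_address`, which returns a list of `(value, label)` pairs. The helper name `to_record` and the choice of columns are illustrative assumptions, not part of libpostal. The sample pairs are hand-written to show the input shape and are not captured parser output.

```python
from collections import defaultdict

# Columns to store; the names follow labels used by libpostal's parser.
FIELDS = ("house_number", "road", "city", "state", "postcode", "country")


def to_record(parsed, fields=FIELDS):
    """Turn [(value, label), ...] pairs into a flat dict with one key per field.

    Repeated labels are joined with a space; missing fields become None.
    """
    grouped = defaultdict(list)
    for value, label in parsed:
        grouped[label].append(value)
    return {f: " ".join(grouped[f]) if f in grouped else None for f in fields}


# Hand-written sample in the (value, label) shape, not real parser output.
sample = [
    ("781", "house_number"),
    ("franklin ave", "road"),
    ("brooklyn", "city"),
    ("ny", "state"),
    ("11216", "postcode"),
]
record = to_record(sample)
print(record)
```

With libpostal installed, the same helper should accept the result of `parse_address(...)` from `postal.parser` directly, since it only relies on the pair shape.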

Tech stack

C · Python · Ruby · Go · Java · PHP · Node.js · OpenStreetMap

Getting it running

Difficulty · hard Time to first run · 1h+

Requires compiling the C source and downloading large pre-trained model data files (built from OpenStreetMap data) before the library is ready.
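Once the C library is built and a binding is installed, a quick import check can confirm the setup worked. This is a minimal sketch assuming the Python binding (pypostal, imported as `postal`). The helper name `load_expand_address` is hypothetical.

```python
def load_expand_address():
    """Return pypostal's expand_address, or None if the binding is unavailable.

    The import can be slow the first time because the library loads its
    data files; a failed build or missing package surfaces as ImportError.
    """
    try:
        from postal.expand import expand_address
    except ImportError:
        return None
    return expand_address


expand = load_expand_address()
if expand is None:
    print("postal not importable: build libpostal, then install the Python binding")
else:
    print(expand("30 W 26th St"))
```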

No license information is mentioned in the explanation.

In plain English

Libpostal is a software library that takes street addresses written the way people write them and converts them into clean, standardized forms that computers can reliably compare and search. A human might write the same address in a dozen different ways: using abbreviations, local conventions, different word orders, or different scripts. This library tries to understand all of those variations and produce consistent output across countries and languages.

It does two related things. The first is normalization: taking an address string and generating the set of standard forms it could reasonably map to. The second is parsing: breaking an address into its component parts, such as the house number, street name, city, state, postal code, and country. Both operations are useful when building systems that need to match or deduplicate addresses, index them for search, or compare records from different sources.

The library's models are trained on data from OpenStreetMap, a large open-data map of the world, which gives it broad coverage across many countries and writing systems. It handles addresses written in non-Latin scripts, right-to-left text, and address formats that differ significantly from what English-speaking developers might expect. It is not a geocoder: it does not convert addresses to latitude and longitude coordinates. Instead, it is intended as a preprocessing step before sending addresses to a geocoding service.

The core library is written in C for performance. Official language bindings exist for Python, Ruby, Go, Java, PHP, and Node.js, so it can be used from most common development environments without writing C directly. This is a foundational tool for applications that deal with location data at scale, such as delivery services, maps, or data deduplication pipelines.
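Because normalization yields a set of candidate forms rather than a single string, one simple matching rule is to treat two inputs as the same address when their sets overlap. The sketch below illustrates that idea. `share_normalized_form` and `toy_expand` are hypothetical names, and `toy_expand` is a deliberately tiny stand-in so the example runs without libpostal installed. In practice you would pass `expand_address` from pypostal's `postal.expand` module instead.

```python
def share_normalized_form(a, b, expand):
    """Return True if two address strings have at least one expansion in common.

    `expand` is any callable mapping a string to a list of normalized strings,
    for example postal.expand.expand_address from the pypostal binding.
    """
    return not set(expand(a)).isdisjoint(expand(b))


def toy_expand(text):
    # Stand-in for expand_address (an assumption for this sketch only).
    # It lowercases and expands two abbreviations; the real library does far more.
    words = text.lower().replace(",", " ").split()
    table = {"st": "street", "ave": "avenue"}
    return [" ".join(table.get(w, w) for w in words)]


print(share_normalized_form("30 W 26th St", "30 w 26th street", toy_expand))  # True
print(share_normalized_form("1 Main St", "2 Main St", toy_expand))            # False
```

Passing the expander in as a parameter keeps the comparison logic testable on its own and makes the dependency on the compiled library explicit.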

Copy-paste prompts

Prompt 1
How do I use libpostal in Python to normalize a list of messy US street addresses into a standard form?
Prompt 2
Show me how to parse an address string with libpostal and extract the house number, street name, and city as separate fields.
Prompt 3
I'm building a deduplication pipeline for address records from multiple countries. How do I integrate libpostal to handle non-Latin scripts?
Prompt 4
How do I use libpostal as a preprocessing step before sending addresses to a geocoding API to improve accuracy?
Prompt 5
Walk me through setting up libpostal on Ubuntu and running my first address normalization from the command line.

Verify against the repo before relying on details.