Deduplicate address records from different data sources by normalizing them into a consistent format before comparing.
Preprocess user-entered addresses before passing them to a geocoding service to improve match accuracy.
Build a search index over address data that handles abbreviations, local conventions, and non-Latin scripts.
Parse raw address strings into structured fields like house number, street name, and postal code for database storage.
Requires compiling the C source and downloading large model data files, trained on OpenStreetMap data, before the library is ready.
Libpostal is a software library that takes street addresses written the way people write them and converts them into clean, standardized forms that computers can reliably compare and search. A human might write the same address in a dozen different ways: using abbreviations, local conventions, different word orders, or different scripts. This library tries to understand all of those variations and produce consistent output across countries and languages.

It does two related things. The first is normalization: taking an address string and generating the set of standard forms it could reasonably map to. The second is parsing: breaking an address into its component parts, such as the house number, street name, city, state, postal code, and country. Both operations are useful when building systems that need to match or deduplicate addresses, index them for search, or compare records from different sources.

The library is trained on data from OpenStreetMap, a large open-source map of the world, which gives it broad coverage across many countries and writing systems. It handles addresses written in languages that use scripts other than the Latin alphabet, right-to-left text, and address formats that differ significantly from what English-speaking developers might expect. It is not a geocoder, meaning it does not convert addresses to latitude and longitude coordinates, but it is intended to be used as a preprocessing step before sending addresses to a geocoding service.

The core library is written in C for performance. Official language bindings exist for Python, Ruby, Go, Java, PHP, and Node.js, making it accessible from most common development environments without writing C code directly. This is a foundational tool for any application that deals with location data at scale, such as delivery services, maps, or data deduplication pipelines.
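The two operations described above can be sketched in Python. This is an illustrative sketch, not libpostal itself: toy_expand is a hypothetical stand-in that knows three English abbreviations, and same_address and to_fields are helper names invented for this example. With the Python binding (pypostal) installed, postal.expand.expand_address could be passed in place of toy_expand, and the output of postal.parser.parse_address, a list of (value, label) pairs, could be passed to to_fields.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def toy_expand(address: str) -> List[str]:
    """Hypothetical stand-in for a real normalizer.

    Lowercases, strips commas and periods, and expands a few English
    abbreviations. A real normalizer may return several candidate forms
    (for example "st" as "street" or "saint"); this toy returns one.
    """
    abbreviations = {"st": "street", "ave": "avenue", "w": "west"}
    tokens = address.lower().replace(",", " ").replace(".", " ").split()
    return [" ".join(abbreviations.get(token, token) for token in tokens)]


def same_address(a: str, b: str,
                 expand: Callable[[str], List[str]] = toy_expand) -> bool:
    """Treat two strings as the same address if any normalized form is shared."""
    return bool(set(expand(a)) & set(expand(b)))


def to_fields(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Turn (value, label) pairs into a label -> value dict for storage.

    If a label repeats, the first value is kept.
    """
    fields: Dict[str, str] = {}
    for value, label in pairs:
        fields.setdefault(label, value)
    return fields


if __name__ == "__main__":
    print(same_address("30 W 26th St.", "30 West 26th Street"))
    print(to_fields([("781", "house_number"), ("franklin ave", "road")]))
```

Comparing sets of normalized forms, instead of a single canonical string, lets an ambiguous abbreviation match either reading, which is why the normalizer is modeled as returning a list.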
Verify against the repo before relying on details.