Let non-SQL business users query a shipments database in plain English
Study a safe natural-language-to-SQL pipeline with intent routing and validation
Build a read-only analytics API over a MySQL schema you already have
Demo SQL injection and prompt injection filters on top of an LLM SQL generator
Needs a running MySQL with the three shipping tables loaded and an OpenAI API key before any query works.
This project is a small web service that lets a person ask questions about a shipping database in plain English, and have the answer come back as structured data. The README frames the problem this way: business users often need to look up facts in a database but do not know SQL, the language databases speak. The service translates a question like "top delayed routes" or "most active destination ports" into the matching SQL query, runs it against a MySQL database of shipments and ports, and returns the results. The pipeline behind the scenes has several steps. When a request comes in, an intent classifier first identifies which tables in the database are likely relevant. The schema for those tables is then loaded as context. A large language model (such as one from OpenAI) generates a SQL query based on the question and the schema. The query goes through a validation layer that checks it is safe to run, then it is executed against a read-only connection to the database, and the result is returned as JSON. The database is built around three connected tables. The Shipments table is the central record of movement, with route, cargo type, status, delay, and departure and arrival timestamps. The Ports table stores port names, countries, and regions, and is used to group results by location. The Ships table holds vessel names, types, operator companies (with examples like Maersk, MSC, and COSCO), origin country, and capacity measured in TEU (a standard container unit). The three tables are linked by foreign keys so the API can join them automatically. The README puts emphasis on safety, because letting an AI write database queries is a risk. Only SELECT queries (read-only ones) are allowed, common SQL injection patterns are blocked, prompt injection attempts are filtered, and SQL comments and multi-statement queries are rejected. The database connection itself is read-only as a second line of defence. The tech stack is FastAPI for the web server, Python as the language, MySQL as the database, LangChain to orchestrate the language model calls, the OpenAI API as the model, and SQLAlchemy with Pandas for data handling. The backend is deployed on Render, a hosting platform. The README lists planned future work, including query caching, a visualisation dashboard, SQL explanation support, and streaming responses.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.