explaingit

libambu/data-agent

14 · Java · Audience · developer · Complexity · 5/5 · Setup · hard

TLDR

Data Agent lets you ask data questions in plain text and returns working SQL or Python code plus a Markdown report, powered by a 13-node multi-agent pipeline with PostgreSQL and vector search.

Mindmap

mindmap
  root((Data Agent))
    Input
      Plain-language question
      Human plan approval
    Agent pipeline
      Schema retrieval
      Feasibility check
      Planner node
      SQL sub-agent
      Python sub-agent
      Report sub-agent
    Knowledge base
      Database schema
      Business glossary
      Historical QA pairs
    Output
      SQL queries
      Python analysis
      Markdown reports

Things people build with this

USE CASE 1

Type a plain-language question about your database and receive a working SQL query and analysis report without writing any code.

USE CASE 2

Test natural-language-to-SQL accuracy on the included BIRD-SQL benchmark dataset with 11 databases and 1,534 questions.

USE CASE 3

Give non-technical team members a chat interface for querying structured business data.

Tech stack

Java · Spring Boot · Vue 3 · PostgreSQL · pgvector · Spring AI

Getting it running

Difficulty · hard · Time to first run · 1h+

Requires Docker, Java 17, Maven, Node.js with pnpm, and a DashScope API key for the DeepSeek LLM.

License not specified in this repository.

In plain English

Data Agent is an end-to-end data analytics platform where you type a question in plain language and the system produces working SQL or Python code plus a Markdown analysis report. The project is built in Java and Vue 3, and its core is a network of AI agents that collaborate to answer your question.

The pipeline is structured as a 13-node, 4-stage graph. When you submit a question, the system first retrieves relevant database schema information and business terminology from a vector store. It then evaluates whether the question is feasible, plans an execution strategy, and routes the work to specialized sub-agents: one that writes SQL queries, one that runs Python analysis or generates charts, and one that composes the final report. An LLM supervisor coordinates the sub-agents and can loop them if a result looks wrong. The system also includes a human-in-the-loop checkpoint where you can approve or reject the plan before execution proceeds.

Data Agent uses a three-layer knowledge system backed by PostgreSQL with the pgvector extension. It stores database schema, business glossary terms, and historical question-answer pairs as vector embeddings, which it retrieves to guide each new query. The BIRD-SQL benchmark dataset (11 databases, 1,534 questions) is included for testing.

Setting up the project requires Docker for PostgreSQL plus pgvector, Java 17, Maven, Node.js with pnpm, and a DashScope API key (DashScope is an Alibaba Cloud service that provides access to the DeepSeek model used here). A Docker Compose file handles the database setup automatically. The frontend runs on Vue 3 with Element Plus and shows a live animated view of the agent graph as each node executes.

The project is aimed at teams that want to give non-technical stakeholders a way to query databases by asking questions in plain text.
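To make the control flow described above more concrete, here is a minimal Java sketch of a pipeline that retrieves context, checks feasibility, plans, waits for human approval, and then lets a supervisor route each step to a sub-agent and retry when a result looks wrong. It is an illustration under stated assumptions, not the repo's actual code: `PipelineSketch`, `Step`, `SubAgent`, `plan`, and `run` are all hypothetical names, the retrieval and feasibility stages are stubbed, the "looks wrong" check is simply an empty result, and the keyword heuristic in `plan` stands in for whatever the real LLM planner does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: every name here is illustrative, not taken from the repo.
final class PipelineSketch {

    enum Task { SQL, PYTHON, REPORT }

    /** One planned unit of work and the sub-agent that should handle it. */
    record Step(Task task, String instruction) {}

    /** Stand-in for an LLM-backed sub-agent; an empty result means "looks wrong". */
    interface SubAgent {
        String execute(Step step, int attempt);
    }

    /** Stand-in planner. Assumed heuristic: only trend/chart questions need Python. */
    static List<Step> plan(String question) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step(Task.SQL, "Query the data for: " + question));
        String q = question.toLowerCase();
        if (q.contains("trend") || q.contains("chart")) {
            steps.add(new Step(Task.PYTHON, "Analyze or plot the query result"));
        }
        steps.add(new Step(Task.REPORT, "Write the Markdown report"));
        return steps;
    }

    /** Runs the stages in order and returns a log of what happened. */
    static List<String> run(String question, SubAgent agent,
                            Predicate<List<Step>> humanApproval, int maxAttempts) {
        List<String> log = new ArrayList<>();
        log.add("retrieved context");            // schema and glossary lookup, stubbed
        if (question.isBlank()) {                // feasibility check, stubbed
            log.add("stopped: question not feasible");
            return log;
        }
        List<Step> steps = plan(question);
        log.add("planned " + steps.size() + " steps");
        if (!humanApproval.test(steps)) {        // human-in-the-loop checkpoint
            log.add("stopped: plan rejected by human");
            return log;
        }
        for (Step step : steps) {                // supervisor: route, check, retry
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                ok = !agent.execute(step, attempt).isEmpty();
                if (ok) {
                    log.add(step.task() + " ok on attempt " + attempt);
                }
            }
            if (!ok) {
                log.add("stopped: " + step.task() + " failed");
                return log;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // The first SQL attempt "looks wrong", so the supervisor retries it once.
        SubAgent flaky = (step, attempt) ->
                step.task() == Task.SQL && attempt == 1 ? "" : "result";
        run("Show the sales trend by month", flaky, steps -> true, 2)
                .forEach(System.out::println);
    }
}
```

Passing the sub-agent and the approval check in as parameters keeps the sketch runnable without an LLM or a database. In a real deployment those two arguments would presumably be backed by model calls and a UI prompt, and the retry bound would guard against the supervisor looping indefinitely.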

Copy-paste prompts

Prompt 1
How do I connect Data Agent to my own PostgreSQL database and add its schema to the vector store?
Prompt 2
Walk me through adding a custom business glossary term to the Data Agent knowledge base so the SQL agent understands our internal naming.
Prompt 3
How does the Data Agent supervisor decide whether to send a task to the SQL sub-agent or the Python sub-agent?
Prompt 4
Set up Data Agent with Docker Compose and load the BIRD-SQL dataset so I can test natural-language queries against all 11 included databases.

Verify against the repo before relying on details.