explaingit

diqiuzhuanzhuan/openllm-func-call-synthesizer

18 · Python · Audience · researcher · Complexity · 3/5 · Active · License · Setup · moderate

TLDR

Python toolkit that generates, scores, and exports function-calling training datasets for LLMs in multiple languages, using an MCP server for tool discovery and a deepeval critic to keep only high-scoring rows.

Mindmap

mindmap
  root((func-call-synthesizer))
    Inputs
      Tool list via MCP
      LLM API credentials
      YAML config
    Outputs
      Function call rows
      Critic scores
      JSONL CSV Parquet
      LlamaFactory and verl exports
    Use Cases
      Build function call datasets
      Score generated calls
      Generate multi turn chats
    Tech Stack
      Python
      Typer
      Rich
      Hydra
      MCP
      deepeval

Things people build with this

USE CASE 1

Generate a multilingual function-calling dataset from a list of tools defined on an MCP server.
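The first step of that use case turns each tool the MCP server advertises into a function spec that a generator model can read, then pairs every tool with every target language. The sketch below is illustrative, not the project's code. It assumes tool descriptors shaped like MCP `tools/list` entries (`name`, `description`, `inputSchema`) and maps them to OpenAI-style `{"type": "function", "function": {...}}` specs. The tool names, the language codes, and the helpers `to_openai_tool` and `seed_tasks` are hypothetical.

```python
import itertools

# Hypothetical tool descriptors shaped like MCP `tools/list` entries.
mcp_tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "name": "search_photos",
        "description": "Search the photo library by keyword.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]


def to_openai_tool(tool: dict) -> dict:
    """Map one MCP tool descriptor to an OpenAI-style function tool spec (hypothetical helper)."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, which is what "parameters" expects.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def seed_tasks(tools: list, languages: list) -> list:
    """Pair every tool with every language as one query-generation task (hypothetical helper)."""
    return [
        {"tool": to_openai_tool(t), "language": lang}
        for t, lang in itertools.product(tools, languages)
    ]


# Assumed language codes for the default English, Chinese, Japanese, German mix.
tasks = seed_tasks(mcp_tools, ["en", "zh", "ja", "de"])
print(len(tasks))  # 2 tools x 4 languages = 8
print(tasks[0]["language"], tasks[0]["tool"]["function"]["name"])  # en get_weather
```

A cross product keeps coverage even: every tool gets seed prompts in every language, so no language ends up with only a subset of the tools.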

USE CASE 2

Score synthetic function calls with the deepeval critic and keep only rows above a 0.8 quality threshold.
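The keep-or-drop rule in that use case comes down to a single comparison per row. The sketch below assumes each scored row stores the critic's result under a hypothetical `score` key. It keeps rows at or above 0.8 and drops rows that were never scored. The project's actual field names and its deepeval call are not shown here.

```python
from typing import Iterable

SCORE_THRESHOLD = 0.8  # rows scoring at or above this value are kept


def keep_high_scoring(rows: Iterable[dict], threshold: float = SCORE_THRESHOLD) -> list:
    """Return rows whose critic score is >= threshold; unscored rows are dropped."""
    kept = []
    for row in rows:
        score = row.get("score")  # hypothetical field name for the critic's score
        if score is not None and score >= threshold:
            kept.append(row)
    return kept


rows = [
    {"query": "Weather in Paris?", "score": 0.92},
    {"query": "Find beach photos", "score": 0.80},
    {"query": "Book a flight", "score": 0.55},
    {"query": "Unscored row"},
]
print([r["query"] for r in keep_high_scoring(rows)])
# → ['Weather in Paris?', 'Find beach photos']
```

The comparison is `>=`, so a row scoring exactly 0.8 survives, which matches the "0.8 or higher" rule.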

USE CASE 3

Export training data directly to LlamaFactory and verl formats with optional train/val splits.
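A train/val split of that kind is usually a seeded shuffle followed by a slice. The sketch below writes plain JSONL files and does not reproduce the LlamaFactory or verl schemas the project actually emits. The helper names, the 10% validation ratio, the seed, and the file names are assumptions chosen for illustration.

```python
import json
import random
import tempfile
from pathlib import Path


def split_train_val(rows, val_ratio=0.1, seed=42):
    """Shuffle a copy of rows with a fixed seed and split into (train, val)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split every run
    n_val = int(round(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(rows, path):
    """Write one JSON object per line; ensure_ascii=False keeps Chinese/Japanese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


rows = [{"id": i} for i in range(100)]
train, val = split_train_val(rows, val_ratio=0.1)

with tempfile.TemporaryDirectory() as tmp:
    write_jsonl(train, Path(tmp) / "train.jsonl")  # hypothetical file names
    write_jsonl(val, Path(tmp) / "val.jsonl")
    with open(Path(tmp) / "val.jsonl", encoding="utf-8") as f:
        val_lines = sum(1 for _ in f)

print(len(train), len(val), val_lines)  # 90 10 10
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the split reproducible without affecting any other randomness in the pipeline.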

USE CASE 4

Run the bin/run_pipeline.sh helper to launch several synthesizer runs in parallel against the same MCP server.

Tech stack

Python · Typer · Hydra · MCP · deepeval

Getting it running

Difficulty · moderate · Time to first run · 30 min

Needs a reachable MCP server plus API credentials for whichever LLM backend you choose, configured through a .env file or environment variables.
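Because a missing server or key is the most likely first-run failure, a quick pre-flight check can save a confusing traceback. The sketch below only tests whether a TCP connection to the configured host and port succeeds, and whether the named environment variables are set. It does not speak the MCP protocol. The URL `http://localhost:8000/sse` and the variable name `OPENAI_API_KEY` are placeholders; substitute the address from your config and whatever your LLM backend expects.

```python
import os
import socket
from urllib.parse import urlparse


def mcp_server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or hostname did not resolve
        return False


def missing_env(names: list) -> list:
    """List the environment variables that are unset or empty."""
    return [n for n in names if not os.environ.get(n)]


if __name__ == "__main__":
    # Placeholder address and variable name; use the ones from your own setup.
    url = "http://localhost:8000/sse"
    print("MCP reachable:", mcp_server_reachable(url))
    print("Missing env vars:", missing_env(["OPENAI_API_KEY"]))
```

An open port only shows that something is listening; the synthesizer can still fail if that process is not the MCP server named in the config.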

MIT license: free to use, modify, and redistribute as long as the copyright and license notice are kept.

In plain English

This project is a Python toolkit for building training datasets that teach a language model how to call functions. Large language models often need to take an instruction in plain English and translate it into a structured call to a tool, such as search_photos or get_weather. To learn this, they need examples. The toolkit generates those examples in bulk, scores them, and writes them out in formats that common training stacks accept, including OpenAI-style function-call records and LlamaFactory datasets.

The pipeline runs in stages, each configurable through a YAML file. Query generation takes a list of available tools and creates seed prompts in multiple languages (English, Chinese, Japanese, and German in the default config). Function-call synthesis feeds those prompts to a model such as gpt-4o and records the structured call the model produces. A critic stage then scores every generated call, by default with a deepeval judge, and keeps only rows that score 0.8 or higher. Exports go to JSONL, CSV, Parquet, or the LlamaFactory and verl formats, with optional train/val splits. There is also a multi-turn conversation generator for chat-style datasets.

A key requirement is an MCP server, which the tool uses to discover which functions exist and what arguments they take. The repository ships an example MCP server at examples/mcp_example_sserver/server.py. The synthesizer will fail to run if no MCP server is reachable at the address listed in the config.

Installation is from PyPI with pip or uv, or from source with uv sync. You also need API credentials for whichever LLM backend you plan to use, set through environment variables or a .env file. The CLI is built with Typer and Rich, and configuration is handled through Hydra, so options can be overridden on the command line, for example to enable only query generation or to change the output language list. A helper script at bin/run_pipeline.sh can launch several synthesizer runs in parallel.
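The synthesis stage "records the structured call the model produces". In the OpenAI-style format, that call carries the function name plus an `arguments` field holding a JSON-encoded string. Before spending a judge call on a row, a cheap structural check can catch obviously broken output. The sketch below is such a pre-check, not the project's critic, which by default uses deepeval. The tool schema, the sample call, and the `check_call` helper are hypothetical, and the check covers only tool name, JSON validity, required arguments, and unknown argument names, not full JSON Schema validation.

```python
import json

# Hypothetical OpenAI-style tool call; `arguments` is a JSON string, not a dict.
tool_call = {
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Berlin\"}"},
}

# Hypothetical JSON Schema parameters, keyed by tool name.
tools = {
    "get_weather": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "unit": {"type": "string"}},
        "required": ["city"],
    }
}


def check_call(call: dict, tool_schemas: dict) -> list:
    """Return a list of problems with a synthesized call; an empty list means it passed."""
    fn = call.get("function", {})
    name = fn.get("name")
    if name not in tool_schemas:
        return [f"unknown tool: {name!r}"]
    try:
        args = json.loads(fn.get("arguments", "{}"))
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]
    schema = tool_schemas[name]
    # Every required parameter must be present.
    problems = [f"missing required argument: {r}"
                for r in schema.get("required", []) if r not in args]
    # No argument may fall outside the declared properties.
    allowed = schema.get("properties", {})
    problems += [f"unexpected argument: {k}" for k in args if k not in allowed]
    return problems


print(check_call(tool_call, tools))  # → []
```

Running a structural filter like this first and the judge second means the (slower, paid) critic only sees rows that are at least well formed.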
The project is MIT licensed and points to its own Read the Docs site for full documentation.

Copy-paste prompts

Prompt 1
Set up openllm-func-call-synthesizer with the example MCP server under examples/mcp_example_sserver and generate a 1000-row English plus Chinese dataset using gpt-4o.
Prompt 2
Write a new MCP server that exposes three weather tools and point the synthesizer at it to produce a function-calling dataset in JSONL.
Prompt 3
Override the synthesizer's Hydra config from the command line to run only the query-generation stage and skip the critic.
Prompt 4
Replace the deepeval critic with a local judge model and keep the 0.8 score threshold logic intact.
Prompt 5
Export a synthesizer run to LlamaFactory format with a 90/10 train/val split and verify the resulting files load in LlamaFactory.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.