shishirpatil/gorilla

★ 12,861 | Python | Audience · researcher | Complexity · 4/5 | License · Apache 2.0 | Setup · hard

Mindmap

mindmap
  root((Gorilla))
    Models
      Fine-tuned LLM
      OpenFunctions V2
    Benchmarks
      BFCL leaderboard
      APIBench dataset
    API coverage
      Hugging Face
      PyTorch Hub
      REST APIs
    Tools
      GoEx runtime
      Undo actions
      Multi-turn agents

Things people build with this

USE CASE 1

Benchmark your AI model's function-calling accuracy using the Berkeley Function Calling Leaderboard.

USE CASE 2

Use Gorilla OpenFunctions V2 as a drop-in replacement for OpenAI function calling in Python or JavaScript apps.

USE CASE 3

Run LLM-generated actions safely with GoEx, which supports undoing unintended operations.

USE CASE 4

Train a model on APIBench to produce accurate API calls from plain-language requests.
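
Several of the use cases above depend on judging whether a generated call names the right function and passes the right arguments. The following is a minimal sketch of that idea, not the leaderboard's or the project's actual checker: it parses a call string with Python's `ast` module without executing it and compares it to a function spec. The `WEATHER_SPEC` function, the `parse_call` and `check_call` helpers, and the assumption that the model emits a call as plain `name(key=value)` text are all hypothetical and chosen for illustration.

```python
import ast

# Hypothetical function spec in an OpenAI-style JSON-schema layout (illustrative only).
WEATHER_SPEC = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
    },
}

# Map JSON-schema type names to Python types for a basic type check.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def parse_call(text):
    """Parse 'name(key=value, ...)' into (name, kwargs) without executing it."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or node.args:
        raise ValueError("expected a call with keyword arguments only")
    if any(kw.arg is None for kw in node.keywords):
        raise ValueError("'**' argument unpacking is not supported")
    name = ast.unparse(node.func)
    # literal_eval accepts only literals, so arbitrary expressions are rejected.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def check_call(text, spec):
    """Return a list of problems; an empty list means the call matches the spec."""
    try:
        name, kwargs = parse_call(text)
    except (SyntaxError, ValueError) as exc:
        return [f"unparseable call: {exc}"]
    problems = []
    if name != spec["name"]:
        problems.append(f"wrong function: {name}")
    props = spec["parameters"]["properties"]
    for key in spec["parameters"].get("required", []):
        if key not in kwargs:
            problems.append(f"missing required argument: {key}")
    for key, value in kwargs.items():
        if key not in props:
            problems.append(f"unknown argument: {key}")
            continue
        type_name = props[key]["type"]
        # bool is a subclass of int in Python, so treat it as a mismatch for non-boolean types.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, JSON_TYPES[type_name]):
            problems.append(f"wrong type for {key}: expected {type_name}")
    return problems


if __name__ == "__main__":
    print(check_call('get_weather(city="Berkeley", days=3)', WEATHER_SPEC))
    print(check_call('get_weather(days="3")', WEATHER_SPEC))
```

A real benchmark would also need to handle nested arguments, optional defaults, and several acceptable answers per prompt; this sketch covers only flat keyword arguments.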

Tech stack

Python, PyTorch, REST API

Getting it running

Difficulty · hard | Time to first run · 1h+

Running or fine-tuning models requires GPU resources or API keys for hosted inference providers.

Apache 2.0: use freely for any purpose, including commercial use, as long as you keep the license and copyright notice.

In plain English

Gorilla is a research project from UC Berkeley focused on training and evaluating large language models (LLMs) to call external APIs accurately. When you ask an AI assistant to use a tool or service, it needs to produce a correctly formatted function call with the right arguments. Gorilla studies how to make that work reliably across thousands of different APIs. The repository contains several interconnected components.

The original Gorilla model is a fine-tuned language model trained on a dataset called APIBench, which covers more than 1,600 APIs from sources like Hugging Face, PyTorch Hub, and TensorFlow Hub. The model takes a plain-language request and produces the correct API call, including proper argument names and types, with fewer errors than general-purpose models at the time of release.

The Berkeley Function Calling Leaderboard (BFCL) is a benchmark for ranking AI models on their ability to call functions correctly. It has gone through several versions, progressing from single function calls to multi-turn conversations, multi-step workflows, and a V4 that tests tool use in real agent settings, including web search with multi-hop reasoning and memory management. The leaderboard is publicly available and tracks many commercial and open-source models.

Gorilla OpenFunctions V2 is a model designed as a drop-in replacement for OpenAI function calling, with support for Python, Java, JavaScript, and REST APIs. It can execute multiple functions in parallel and includes logic to detect when a function call is not actually relevant, reducing unnecessary invocations.

GoEx is a separate component that acts as a runtime for executing actions an LLM generates, with built-in support for undoing actions and limiting damage from unintended operations.

The project is licensed under Apache 2.0.
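
The undo support described for GoEx can be pictured as pairing every action with a way to reverse it. The following is a minimal sketch of that general pattern, assuming each action can supply its own inverse; it is not GoEx's API, and GoEx's actual mechanism may differ. The `UndoLog` class, the `set_key` helper, and the in-memory `store` dictionary (standing in for an external service) are hypothetical names introduced for illustration.

```python
_MISSING = object()  # sentinel meaning "the key did not exist before the action"


class UndoLog:
    """Run actions and remember how to reverse each one."""

    def __init__(self):
        self._undo_stack = []

    def run(self, do, undo):
        """Execute `do`; record `undo` only if `do` finished without raising."""
        result = do()
        self._undo_stack.append(undo)
        return result

    def rollback(self):
        """Reverse all recorded actions, most recent first."""
        while self._undo_stack:
            self._undo_stack.pop()()


def set_key(store, key, value):
    """Build a (do, undo) pair that writes store[key] and can restore the old state."""
    previous = store.get(key, _MISSING)

    def do():
        store[key] = value

    def undo():
        if previous is _MISSING:
            store.pop(key, None)
        else:
            store[key] = previous

    return do, undo


def demo():
    """Apply two writes, then roll both back; returns the state before and after."""
    store = {}
    log = UndoLog()
    log.run(*set_key(store, "quota", 10))
    log.run(*set_key(store, "quota", 99))  # imagine this was an unintended action
    before = dict(store)
    log.rollback()
    return before, store


if __name__ == "__main__":
    print(demo())
```

Reversing in last-in, first-out order matters here: each undo restores the state its own action saw, so unwinding newest-first returns the store to where it started. Actions with no clean inverse, such as sending an email, would need a different safeguard, for example confirmation before execution.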

Copy-paste prompts

Prompt 1

Using Gorilla OpenFunctions V2, replace OpenAI function calling in my Python app to route a user's plain-language request to the correct REST API endpoint with proper arguments.

Prompt 2

How do I submit my model to the Berkeley Function Calling Leaderboard and interpret the benchmark results by category?

Prompt 3

Using GoEx from the Gorilla project, show me how to execute an LLM-generated action and undo it if something goes wrong.

Prompt 4

Set up the Gorilla model to take a natural language request and produce the correct Hugging Face API call with the right argument names and types.

Verify against the repo before relying on details.