hqian-ai/collabbench

★ 16
JavaScript
Audience · researcher
Complexity · 4/5
Setup · hard

Mindmap

mindmap
  root((CollabBench))
    What it does
      Benchmarks AI cooperation
      Multi-player game settings
      Scores collaboration quality
    Player profiles
      Efficient collaborator
      Hesitant laggard
      Anxious doubter
      Proactive leader
      Independent loner
    Components
      Profile generator
      Training setup
      Evaluation pipeline
    Tech stack
      JavaScript
      AI judge model


Things people build with this

USE CASE 1

Evaluate how well an AI agent adapts its behavior when collaborating with different human-like personality types in cooperative games.

USE CASE 2

Train an AI agent to adjust its communication and task strategy based on who it is working with in a multi-player setting.

USE CASE 3

Use the included judging system to automatically score the collaboration quality of AI agents across game sessions.

Tech stack

JavaScript, Python

Getting it running

Difficulty · hard
Time to first run · 1 day+

Each component has its own setup guide in its own subdirectory. There is no single quick-start command, so expect substantial configuration across multiple parts.

No license details are mentioned in the explanation; check the repository for terms.

In plain English

CollabBench is a research benchmark published at ICML 2026 that measures how well AI language model agents cooperate with human-like partners in multi-player games. The research addresses a gap in how AI models are typically evaluated: most tests look at whether a model can answer a question or complete a task alone, but real settings often require working alongside others who have different personalities and habits.

The benchmark uses two cooperative game environments as test beds. In these games, an AI agent must work together with another player (simulated with varying behavioral profiles) to complete shared goals. The researchers modeled five distinct player types the AI might encounter: an efficient collaborator, a hesitant laggard, an anxious doubter, a proactive leader, and an independent loner. Each profile was derived from recorded game behavior by real players.

The framework has three main components. The first is a system that generates realistic simulated player profiles from recorded game data. The second is a training setup that teaches the AI agent to adapt its communication and task-taking behavior based on who it is working with. The third is an evaluation pipeline that collects game session data and scores the AI's collaboration quality using another AI model as a judge.

The repository provides code for running the benchmark in both game environments, named CWAH-MultiPlayer and Cook-MultiPlayer, along with the training code for the collaborative agents and the judging system. Each component lives in its own subdirectory with its own setup instructions. This is a research artifact that requires following subdirectory-level setup guides rather than a single quick-start command. It was developed by researchers at East China Normal University, Shanghai Innovation Institute, and Tencent.
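
To make the evaluation idea concrete, here is a minimal Python sketch of how judge scores from game sessions might be grouped by partner profile. It is an illustration only and is not taken from the repository: the `SessionScore` record, the `mean_score_by_profile` function, and the numeric score scale are all hypothetical names and assumptions. Only the five profile names and the two environment names come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The five partner profiles named in the description above.
PROFILES = (
    "efficient collaborator",
    "hesitant laggard",
    "anxious doubter",
    "proactive leader",
    "independent loner",
)


@dataclass(frozen=True)
class SessionScore:
    """Hypothetical record of one judged game session (not the repo's schema)."""
    environment: str    # e.g. "CWAH-MultiPlayer" or "Cook-MultiPlayer"
    profile: str        # which simulated partner the agent played with
    judge_score: float  # score assigned by the judge model; the scale is assumed


def mean_score_by_profile(scores):
    """Average the judge scores separately for each partner profile."""
    grouped = defaultdict(list)
    for s in scores:
        if s.profile not in PROFILES:
            raise ValueError(f"unknown profile: {s.profile!r}")
        grouped[s.profile].append(s.judge_score)
    return {profile: mean(values) for profile, values in grouped.items()}


if __name__ == "__main__":
    # Made-up scores, purely to show the call shape.
    demo = [
        SessionScore("CWAH-MultiPlayer", "proactive leader", 4.0),
        SessionScore("Cook-MultiPlayer", "proactive leader", 2.0),
        SessionScore("CWAH-MultiPlayer", "hesitant laggard", 3.0),
    ]
    print(mean_score_by_profile(demo))
```

Grouping by profile rather than averaging over all sessions keeps the comparison aligned with what the benchmark is described as measuring: whether an agent's collaboration quality holds up across different kinds of partners. The repository's actual data format and scoring rubric may differ, so check the evaluation subdirectory's guide.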

Copy-paste prompts

Prompt 1

I want to run CollabBench to test an AI agent's cooperation skills in the CWAH-MultiPlayer environment. What are the setup steps and how do I launch a benchmark run?

Prompt 2

How does CollabBench generate the five simulated player profiles from recorded game data, and how do I customize or add new profiles?

Prompt 3

How do I use CollabBench's AI judging system to score the collaboration quality of an agent across recorded game sessions?

Prompt 4

How do I train a new collaborative agent using CollabBench's training code for the Cook-MultiPlayer environment?


Verify against the repo before relying on details.