Evaluate how well an AI agent adapts its behavior when collaborating with different human-like personality types in cooperative games.
Train an AI agent to adjust its communication and task strategy based on who it is working with in a multi-player setting.
Use the included judging system to automatically score the collaboration quality of AI agents across game sessions.
Each component has its own setup guide in its subdirectory. There is no single quick-start command, so expect substantial configuration across multiple parts.
CollabBench is a research benchmark published at ICML 2026 that measures how well AI language model agents cooperate with human-like partners in multi-player games. The research addresses a gap in how AI models are typically evaluated: most tests look at whether a model can answer a question or complete a task alone, but real settings often require working alongside others who have different personalities and habits.

The benchmark uses two cooperative game environments as test beds. In these games, an AI agent must work together with another player (simulated with varying behavioral profiles) to complete shared goals. The researchers modeled five distinct player types the AI might encounter: an efficient collaborator, a hesitant laggard, an anxious doubter, a proactive leader, and an independent loner. Each profile was derived from recorded game behavior by real players.

The framework has three main components. The first is a system that generates realistic simulated player profiles from recorded game data. The second is a training setup that teaches the AI agent to adapt its communication and task-taking behavior based on who it is working with. The third is an evaluation pipeline that collects game session data and scores the AI's collaboration quality using another AI model as a judge.

The repository provides code for running the benchmark in both game environments, named CWAH-MultiPlayer and Cook-MultiPlayer, along with the training code for the collaborative agents and the judging system. Each component lives in its own subdirectory with its own setup instructions. This is a research artifact that requires following subdirectory-level setup guides rather than a single quick-start command.

It was developed by researchers at East China Normal University, Shanghai Innovation Institute, and Tencent.
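To make the evaluation idea concrete, here is a minimal Python sketch of how judge scores from game sessions could be grouped by partner profile. It is not taken from the repository: the `SessionResult` record, the `summarize` helper, the profile identifiers, and the 0-10 score scale are all assumptions made for illustration. Only the five profile types and the two environment names come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The five partner types named in the description; these identifiers are illustrative.
PROFILES = (
    "efficient_collaborator",
    "hesitant_laggard",
    "anxious_doubter",
    "proactive_leader",
    "independent_loner",
)


@dataclass(frozen=True)
class SessionResult:
    """One judged game session (hypothetical record layout, not the repo's format)."""
    environment: str    # e.g. "CWAH-MultiPlayer" or "Cook-MultiPlayer"
    profile: str        # one of PROFILES
    judge_score: float  # assumed scale for this sketch: 0 to 10


def summarize(results):
    """Return the mean judge score per profile and the best-to-worst gap."""
    by_profile = defaultdict(list)
    for r in results:
        if r.profile not in PROFILES:
            raise ValueError(f"unknown profile: {r.profile}")
        by_profile[r.profile].append(r.judge_score)
    means = {p: mean(scores) for p, scores in by_profile.items()}
    # A small gap suggests the agent collaborates about equally well with every partner type.
    gap = max(means.values()) - min(means.values()) if means else 0.0
    return means, gap


if __name__ == "__main__":
    demo = [
        SessionResult("CWAH-MultiPlayer", "proactive_leader", 8.0),
        SessionResult("Cook-MultiPlayer", "proactive_leader", 6.0),
        SessionResult("CWAH-MultiPlayer", "independent_loner", 4.0),
    ]
    per_profile, gap = summarize(demo)
    print(per_profile, gap)
```

Reporting a per-profile mean alongside the gap between the best and worst profile is one simple way to separate overall collaboration quality from how evenly the agent adapts. The benchmark's actual metrics may differ.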
Verify against the repo before relying on details.