ARC-AGI is a benchmark dataset created to test whether artificial intelligence systems can reason the way humans do. Many AI systems score well on tests by memorizing patterns from large amounts of training data, but they struggle when asked to solve genuinely novel problems from only a handful of examples. ARC-AGI tries to measure that kind of flexible reasoning, which the research paper accompanying the project calls general fluid intelligence.

Each task in the dataset is a colored grid puzzle. A solver, whether human or AI, sees a few example pairs, each showing an input grid and its corresponding output grid. From those examples, the solver must work out the rule being applied and then produce the correct output for a new input grid. Every grid cell holds one of ten colors, represented as the numbers 0 to 9, and solutions must be exact: every cell in the output must match the expected answer.

The dataset contains 800 tasks, split evenly into a training set of 400 and an evaluation set of 400. The training set is meant for developing and prototyping approaches. The evaluation set is meant to measure final performance and should not be used as feedback during development, which keeps results fair and comparable across different systems.

The repository also includes a browser-based interface so that people can try the tasks by hand. You open an HTML file in a web browser, load a task file, and use drawing tools to fill in a grid. This lets anyone experience firsthand how the tasks feel before attempting to automate a solution.

This repository covers version 1 of the benchmark. A second version exists in a separate repository. The project is associated with research by François Chollet, the creator of the Keras deep learning library.
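The task structure described above (example pairs, cells numbered 0 to 9, exact-match scoring) can be sketched in a few lines of Python. This is a minimal illustration, not the project's own tooling. It assumes each task is a JSON object with `train` and `test` lists of `input`/`output` grid pairs, which should be checked against the repository. The toy task, its mirroring rule, and all function names (`is_valid_grid`, `mirror_rows`, `fits_examples`, `score`) are hypothetical.

```python
import json

# Toy task in the ASSUMED task-file shape: "train" and "test" lists of
# {"input": grid, "output": grid} pairs. The hidden rule in this made-up
# task is "mirror every row left to right".
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
    {"input": [[3, 3, 0]], "output": [[0, 3, 3]]}
  ],
  "test": [
    {"input": [[0, 5], [6, 0]], "output": [[5, 0], [0, 6]]}
  ]
}
"""


def is_valid_grid(grid):
    """Return True if grid is a non-empty rectangle of integers 0..9."""
    if not grid or not grid[0]:
        return False
    width = len(grid[0])
    return all(
        len(row) == width
        and all(isinstance(cell, int) and 0 <= cell <= 9 for cell in row)
        for row in grid
    )


def mirror_rows(grid):
    """Hypothetical candidate rule: reverse each row of the grid."""
    return [list(reversed(row)) for row in grid]


def exact_match(predicted, expected):
    """A prediction counts only if every cell (and the shape) matches."""
    return predicted == expected


def fits_examples(solver, task):
    """Check whether a candidate rule reproduces every example pair."""
    return all(
        exact_match(solver(pair["input"]), pair["output"])
        for pair in task["train"]
    )


def score(solver, task):
    """Fraction of test inputs for which the solver's output is exact."""
    hits = sum(
        exact_match(solver(pair["input"]), pair["output"])
        for pair in task["test"]
    )
    return hits / len(task["test"])


task = json.loads(TASK_JSON)
```

With this toy task, `fits_examples(mirror_rows, task)` is true and `score(mirror_rows, task)` is 1.0, while an identity solver scores 0.0 because no partial credit is given. To try a real task, replace `json.loads(TASK_JSON)` with `json.load` on an open task file from the repository.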
Verify against the repo before relying on details.