The Abstraction and Reasoning Corpus (ARC) is a benchmark dataset designed to measure general fluid intelligence in AI systems. It consists of tasks in which the AI must infer a pattern from a few examples and apply it to a new input.
Each task contains:
- Training examples showing input-output pairs that demonstrate the pattern
- A test input for which the AI must predict the correct output
- The ground truth test output for evaluation
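As a rough illustration of the three parts listed above, the sketch below loads a tiny made-up task in the JSON layout used by the public ARC repository: `train` and `test` are lists of `input`/`output` pairs, and each grid is a list of rows of integers 0-9. The task contents and the helper names `grid_shape` and `is_correct` are assumptions for illustration, not part of this page's code.

```python
import json

# Hypothetical miniature task (toy pattern: swap 0s and 1s).
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}
  ],
  "test": [
    {"input": [[0, 0], [1, 1]], "output": [[1, 1], [0, 0]]}
  ]
}
"""

task = json.loads(task_json)


def grid_shape(grid):
    """Return (rows, columns) of a grid given as a list of rows."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    return rows, cols


def is_correct(predicted, expected):
    """A prediction counts only if every cell matches the ground truth."""
    return predicted == expected


# Training pairs demonstrate the pattern.
for pair in task["train"]:
    print("train:", pair["input"], "->", pair["output"])

# The test input is what a model sees; the test output is held back for scoring.
test_pair = task["test"][0]
prediction = [[1 - cell for cell in row] for row in test_pair["input"]]
print("solved:", is_correct(prediction, test_pair["output"]))
```

Here the "model" is a hard-coded rule that happens to fit the toy pattern; a real solver would have to infer the rule from the training pairs alone.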
This page showcases different transduction and induction models attempting to solve the ARC validation set. For each task, the models generate multiple candidate solutions, which are then ranked using various strategies, including test-time fine-tuning and reranking.
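To make the idea of ranked candidates concrete, here is a minimal sketch of how a ranked list might be scored against the ground truth. It is not the ranking method used by these models: simple frequency voting stands in for the actual strategies, and the function names and the cutoff `k` are assumptions chosen for illustration.

```python
from collections import Counter


def rank_by_votes(candidates):
    """Order distinct candidate grids by how often they were generated.

    Stand-in for a real reranker: the most frequent grid comes first,
    and ties keep the order in which grids were first seen.
    """
    counts = Counter(tuple(tuple(row) for row in grid) for grid in candidates)
    return [[list(row) for row in grid] for grid, _ in counts.most_common()]


def rank_of_correct(ranked, truth):
    """Return the 1-based rank of the ground-truth grid, or None if absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == truth:
            return rank
    return None


def solved_at_k(ranked, truth, k=2):
    """True if the ground truth appears among the top k ranked candidates."""
    rank = rank_of_correct(ranked, truth)
    return rank is not None and rank <= k


# Hypothetical candidates for one test input.
grid_a = [[1, 1], [0, 0]]
grid_b = [[0, 0], [1, 1]]
grid_c = [[1, 0], [0, 1]]
candidates = [grid_a, grid_b, grid_a, grid_c]

ranked = rank_by_votes(candidates)
truth = grid_b
print("rank of correct solution:", rank_of_correct(ranked, truth))
print("solved within top 2:", solved_at_k(ranked, truth, k=2))
```

Averaging `solved_at_k` over all tasks would give a success rate for one model variant at a chosen cutoff.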
The visualization allows you to:
- Compare different model variants and their performance
- View training examples and test cases
- Examine candidate solutions generated by the models
- Track success rates and solution rankings