NeuroAI for AI Safety
November 27, 2024
Led by Patrick Mineault of the Amaranth Foundation, this roadmap [1] presents ambitious visions for seven key themes in neuroAI research.
Basis, along with our collaborators Julian Jara-Ettinger and Marcelo Mattar, contributed to two themes (reproduced from [2]) that align well with our own research directions:
- Build better cognitive architectures. Based on our understanding of how the brain implements capabilities like theory of mind, causal reasoning, and cooperation, we could build modular, probabilistic, and transparent cognitive architectures that better align with human values and intentions.
- Reverse-engineer loss functions of the brain. Using functional, structural, and behavioral data to determine the loss functions of the brain, we could derive better training objectives for AI systems.
These lines of work are crucial as we scale up AI systems because they offer a path to systems that inherit the safety-promoting properties of human cognition, such as careful exploration of new situations, stable objectives despite changing circumstances, and natural alignment with human values.
Read the complete roadmap at neuroaisafety.com and see lead author Patrick Mineault’s commentary at his Substack, The NeuroAI Archive.
Contributors
Research: Patrick Mineault, Niccolò Zanichelli, Joanne Zichen Peng, Anton Arkhipov, Eli Bingham^B, Julian Jara-Ettinger†, Emily Mackevicius^B, Adam Marblestone, Marcelo Mattar†, Andrew Payne, Sophia Sanborn, Karen Schroeder^B, Zenna Tavares^B, Andreas Tolias

^B Basis

† Basis collaborators
Article: Karen Schroeder
References
1. Patrick Mineault et al., "NeuroAI for AI Safety," arXiv preprint, Nov 2024. https://doi.org/10.48550/arXiv.2411.18526
2. NeuroAI for AI Safety: A Differential Path. https://neuroaisafety.com