NeuroAI for AI Safety

November 27, 2024
Basis contributed to a new technical roadmap, "NeuroAI for AI Safety," from the Amaranth Foundation. The roadmap aims to make AI systems safer by understanding, and drawing on, the brain's approach to intelligent behavior.
Figure: Seven ways neuroscience can make AI safer. Reproduced from [1].

Led by Patrick Mineault of the Amaranth Foundation, the roadmap [1] presents ambitious visions for seven key themes in neuroAI research.

Basis, along with our collaborators Julian Jara-Ettinger and Marcelo Mattar, contributed to two themes (reproduced from [2]) that align closely with our own research directions:

  1. Build better cognitive architectures. Based on our understanding of how the brain implements capabilities like theory of mind, causal reasoning, and cooperation, we could build modular, probabilistic and transparent cognitive architectures that better align with human values and intentions.
  2. Reverse-engineer the loss functions of the brain. By using functional, structural, and behavioral data to determine the loss functions the brain optimizes, we could derive better training objectives for AI systems.

These lines of work are crucial as we scale up AI systems, because they offer a path to systems that inherit the safety-promoting properties of human cognition: careful exploration of new situations, stable objectives despite changing circumstances, and natural alignment with human values.

Read the complete roadmap at neuroaisafety.com and see lead author Patrick Mineault’s commentary at his Substack, The NeuroAI Archive.

Contributors

Research: Patrick Mineault, Niccolò Zanichelli, Joanne Zichen Peng, Anton Arkhipov, Eli Bingham*, Julian Jara-Ettinger, Emily Mackevicius*, Adam Marblestone, Marcelo Mattar, Andrew Payne, Sophia Sanborn, Karen Schroeder*, Zenna Tavares*, Andreas Tolias

* Basis collaborators

Article: Karen Schroeder

References

  1. Patrick Mineault et al., "NeuroAI for AI Safety," Nov 2024. https://doi.org/10.48550/arXiv.2411.18526 ↩︎

  2. "NeuroAI for AI Safety: A Differential Path." https://neuroaisafety.com ↩︎