The Alignment Research Center (ARC) is a nonprofit research organization founded in 2021 by Paul Christiano, dedicated to aligning future machine learning systems with human interests. ARC pursues theoretical research on producing formal mechanistic explanations of neural network behavior, combining ideas from mechanistic interpretability and formal verification. Its work centers on intent alignment: ensuring ML systems are genuinely helpful and honest rather than subtly deceptive. Key research areas include heuristic arguments, Eliciting Latent Knowledge (ELK), mechanistic anomaly detection, and low probability estimation. ARC previously housed ARC Evals, which spun out as the independent nonprofit METR in late 2023.
Funding Details
- Annual Budget: $9,050,000
- Monthly Burn Rate: $754,167
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
ARC believes that as ML systems become more capable, current alignment approaches may fail to scale, potentially leading to systems that pursue goals misaligned with human interests. Their theory of change centers on developing rigorous theoretical foundations for alignment before superintelligent systems arrive. By creating formal mechanistic explanations of neural network behavior, combining ideas from mechanistic interpretability and formal verification into heuristic arguments, ARC aims to enable reliable detection of when AI systems might behave in dangerous or deceptive ways. This theoretical groundwork is intended to inform practical alignment techniques that can be applied by AI labs building frontier models, ensuring that powerful AI systems remain genuinely helpful and honest rather than merely appearing aligned.
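ARC's proposed route to this detection runs through heuristic arguments rather than statistical baselines, but the core idea of mechanistic anomaly detection, flagging inputs whose internal computation is not accounted for by an explanation fitted on trusted data, can be gestured at with a much cruder stand-in. The sketch below is a minimal illustration, not ARC's method: it fits a Gaussian to a model's hidden activations on trusted inputs and scores new inputs by Mahalanobis distance, and every name in it (fit_reference, anomaly_score) is hypothetical.

```python
import numpy as np

def fit_reference(activations: np.ndarray):
    """Fit a Gaussian over hidden activations from trusted inputs.

    activations: (n_samples, d) array of internal activations.
    Returns the mean and a regularized inverse covariance.
    """
    mu = activations.mean(axis=0)
    cov = np.cov(activations, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the inverse exists
    return mu, np.linalg.inv(cov)

def anomaly_score(x: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> float:
    """Mahalanobis distance of one activation vector from the reference.

    A high score flags an input whose internal computation is not
    explained by the mechanism observed on trusted data.
    """
    diff = x - mu
    return float(diff @ cov_inv @ diff)

# Toy usage: a shifted activation pattern scores as anomalous.
rng = np.random.default_rng(0)
trusted = rng.normal(size=(1000, 16))        # stand-in for real activations
mu, cov_inv = fit_reference(trusted)
print(anomaly_score(rng.normal(size=16), mu, cov_inv))        # low
print(anomaly_score(rng.normal(size=16) + 8.0, mu, cov_inv))  # high
```

A statistical baseline like this is exactly what ARC argues is insufficient against worst-case deception; their agenda aims to replace the fitted Gaussian with formal mechanistic explanations whose coverage can be checked.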
Grants Received
- from Survival and Flourishing Fund
- from Open Philanthropy
- from Long-Term Future Fund
- from Survival and Flourishing Fund
- from Open Philanthropy
Projects: no linked projects
People: no linked people
Key risk: ARC's theory-heavy program may not translate into actionable, lab-deployable techniques fast enough for frontier timelines, particularly after the ARC Evals spinout and the leadership transition from Paul Christiano to Jacob Hilton. This makes the counterfactual impact of additional funding uncertain.
Details
- Last Updated: Apr 2, 2026, 10:00 PM UTC
- Created: Mar 18, 2026, 11:18 PM UTC
Case for funding: ARC is one of the few teams pursuing machine-checkable heuristic arguments that fuse mechanistic interpretability with formal verification, a plausible path to scalable alignment guarantees and deception detection. Its track record (ELK, builder–breaker stress tests, GPT-4 power-seeking evals via ARC Evals, rare-failure probability estimation) has influenced frontier labs and the wider field.
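A brief gloss on the "rare-failure probability estimation" item above: naive Monte Carlo cannot resolve event probabilities much below one over the number of samples, which is why dedicated low-probability estimators matter for catastrophic failures. ARC studies estimators that reason about model internals rather than sampling; the sketch below only shows the baseline difficulty and a classical statistical workaround (importance sampling) on an illustrative Gaussian toy problem, not ARC's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rare event: P[X > 5] for X ~ N(0, 1); true value is about 2.87e-7.
threshold = 5.0
n = 100_000

# Naive Monte Carlo: with 1e5 samples we almost surely observe zero hits.
naive = (rng.normal(size=n) > threshold).mean()

# Importance sampling: draw from N(threshold, 1), where the event is
# common, and reweight each sample by the density ratio p(y) / q(y).
y = rng.normal(loc=threshold, size=n)
log_w = -0.5 * y**2 + 0.5 * (y - threshold)**2   # log p(y) - log q(y)
est = np.mean(np.exp(log_w) * (y > threshold))

print(f"naive Monte Carlo:   {naive:.3e}")   # typically 0.000e+00
print(f"importance sampling: {est:.3e}")     # close to 2.87e-7
```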