The Alignment Research Center (ARC) is a nonprofit research organization founded in 2021 by Paul Christiano, dedicated to aligning future machine learning systems with human interests. ARC pursues theoretical research on producing formal mechanistic explanations of neural network behavior, combining ideas from mechanistic interpretability and formal verification. Its work centers on intent alignment: ensuring ML systems are genuinely helpful and honest rather than subtly deceptive. Key research areas include heuristic arguments, Eliciting Latent Knowledge (ELK), mechanistic anomaly detection, and low probability estimation. ARC previously housed ARC Evals, which spun out as the independent nonprofit METR in late 2023.
Funding Details
- Annual Budget: $9,050,000
- Monthly Burn Rate: $754,167
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
ARC believes that as ML systems become more capable, current alignment approaches may fail to scale, potentially leading to systems that pursue goals misaligned with human interests. Their theory of change centers on developing rigorous theoretical foundations for alignment before superintelligent systems arrive. By creating formal mechanistic explanations of neural network behavior, combining ideas from mechanistic interpretability and formal verification into heuristic arguments, ARC aims to enable reliable detection of when AI systems might behave in dangerous or deceptive ways. This theoretical groundwork is intended to inform practical alignment techniques that can be applied by AI labs building frontier models, ensuring that powerful AI systems remain genuinely helpful and honest rather than merely appearing aligned.
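ARC's proposed route to this detection runs through heuristic arguments rather than statistical baselines, but the core idea of mechanistic anomaly detection, flagging inputs whose internal computation is not accounted for by an explanation fitted on trusted data, can be gestured at with a much cruder stand-in. The sketch below is a minimal illustration, not ARC's method: it fits a Gaussian to a model's hidden activations on trusted inputs and scores new inputs by Mahalanobis distance, and every name in it (fit_reference, anomaly_score) is hypothetical.

```python
import numpy as np

def fit_reference(activations: np.ndarray):
    """Fit a Gaussian over hidden activations from trusted inputs.

    activations: (n_samples, d) array of internal activations.
    Returns the mean and a regularized inverse covariance.
    """
    mu = activations.mean(axis=0)
    cov = np.cov(activations, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the inverse exists
    return mu, np.linalg.inv(cov)

def anomaly_score(x: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> float:
    """Mahalanobis distance of one activation vector from the reference.

    A high score flags an input whose internal computation is not
    explained by the mechanism observed on trusted data.
    """
    diff = x - mu
    return float(diff @ cov_inv @ diff)

# Toy usage: a shifted activation pattern scores as anomalous.
rng = np.random.default_rng(0)
trusted = rng.normal(size=(1000, 16))        # stand-in for real activations
mu, cov_inv = fit_reference(trusted)
print(anomaly_score(rng.normal(size=16), mu, cov_inv))        # low
print(anomaly_score(rng.normal(size=16) + 8.0, mu, cov_inv))  # high
```

A statistical baseline like this is exactly what ARC argues is insufficient against worst-case deception; their agenda aims to replace the fitted Gaussian with formal mechanistic explanations whose coverage can be checked.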
Grants Received
- from Survival and Flourishing Fund
- from Open Philanthropy
- from Long-Term Future Fund
- from Survival and Flourishing Fund
- from Open Philanthropy
Projects: no linked projects
People: no linked people
Key risk: ARC's theory-heavy program may not translate into actionable, lab-deployable techniques fast enough for frontier timelines, particularly after the ARC Evals spinout and the leadership transition from Paul Christiano to Jacob Hilton. This makes the counterfactual impact of additional funding uncertain.
Details
- Last Updated: Apr 2, 2026, 10:00 PM UTC
- Created: Mar 18, 2026, 11:18 PM UTC
Case for funding: ARC is one of the few teams pursuing machine-checkable heuristic arguments that fuse mechanistic interpretability with formal verification, a plausible path to scalable alignment guarantees and deception detection. Its track record (ELK, builder–breaker stress tests, GPT-4 power-seeking evals via ARC Evals, rare-failure probability estimation) has influenced frontier labs and the wider field.
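A brief gloss on the "rare-failure probability estimation" item above: naive Monte Carlo cannot resolve event probabilities much below one over the number of samples, which is why dedicated low-probability estimators matter for catastrophic failures. ARC studies estimators that reason about model internals rather than sampling; the sketch below only shows the baseline difficulty and a classical statistical workaround (importance sampling) on an illustrative Gaussian toy problem, not ARC's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rare event: P[X > 5] for X ~ N(0, 1); true value is about 2.87e-7.
threshold = 5.0
n = 100_000

# Naive Monte Carlo: with 1e5 samples we almost surely observe zero hits.
naive = (rng.normal(size=n) > threshold).mean()

# Importance sampling: draw from N(threshold, 1), where the event is
# common, and reweight each sample by the density ratio p(y) / q(y).
y = rng.normal(loc=threshold, size=n)
log_w = -0.5 * y**2 + 0.5 * (y - threshold)**2   # log p(y) - log q(y)
est = np.mean(np.exp(log_w) * (y > threshold))

print(f"naive Monte Carlo:   {naive:.3e}")   # typically 0.000e+00
print(f"importance sampling: {est:.3e}")     # close to 2.87e-7
```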