Model Evaluation & Threat Research (METR)
METR (Model Evaluation & Threat Research) is a nonprofit research institute based in Berkeley, California, that develops and deploys evaluations to measure dangerous autonomous capabilities of frontier AI models. Founded by Beth Barnes and spun out of the Alignment Research Center in 2023, METR works with leading AI companies including OpenAI and Anthropic to conduct pre-deployment model evaluations. The organization focuses on assessing AI systems' ability to perform long-horizon agentic tasks, accelerate AI research and development, and carry out concerning activities like autonomous replication and cyberattacks. METR maintains independence by not accepting compensation from AI companies, though it utilizes compute credits they provide.
Funding Details
- Annual Budget: -
- Monthly Burn Rate: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
- Fiscal Sponsor: -
Theory of Change
METR's theory of change is that independent, scientifically rigorous evaluation of frontier AI capabilities is essential for informed decision-making about AI development. By developing and deploying standardized methods to measure dangerous autonomous capabilities before AI systems are released, METR enables AI developers, governments, and policymakers to understand risks and implement appropriate safeguards. Their work creates an evidence base that informs responsible scaling policies, government regulation, and voluntary commitments by AI labs. By making evaluation tools open-source and partnering with safety institutes worldwide, METR aims to establish a global infrastructure for AI safety testing that scales with the pace of AI development, ensuring humanity is informed before transformative AI systems are deployed.
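To make the idea of a standardized capability evaluation concrete, here is a minimal Python sketch of a task-based eval harness. It is illustrative only: the names (`EvalTask`, `Agent`, `run_eval`) and the toy scoring scheme are assumptions for exposition, not METR's actual Task Standard or Vivaria interfaces.

```python
# Hypothetical sketch of a task-based capability evaluation, in the spirit of
# the evals described above. All names here are illustrative assumptions,
# NOT METR's actual Task Standard or Vivaria APIs.
from dataclasses import dataclass
from typing import Callable, Protocol


class Agent(Protocol):
    """Any model-backed agent that turns task instructions into a submission."""
    def run(self, instructions: str) -> str: ...


@dataclass(frozen=True)
class EvalTask:
    name: str
    instructions: str                # what the agent is asked to do
    score: Callable[[str], float]    # maps a submission to a score in [0, 1]
    time_limit_minutes: int          # crude proxy for task horizon length


def run_eval(agent: Agent, tasks: list[EvalTask]) -> dict[str, float]:
    """Run the agent on each task and collect per-task scores."""
    return {t.name: t.score(agent.run(t.instructions)) for t in tasks}


def _score_greeting(submission: str) -> float:
    """Score 1.0 if the submitted code defines a working greet() function."""
    namespace: dict = {}
    try:
        exec(submission, namespace)  # a real harness would sandbox this
        return 1.0 if namespace["greet"]("world") == "hello, world" else 0.0
    except Exception:
        return 0.0


TASKS = [
    EvalTask(
        name="write_greeting_function",
        instructions="Write a Python function greet(name) that returns 'hello, <name>'.",
        score=_score_greeting,
        time_limit_minutes=5,
    ),
]

if __name__ == "__main__":
    class StubAgent:
        def run(self, instructions: str) -> str:
            return "def greet(name):\n    return f'hello, {name}'"

    print(run_eval(StubAgent(), TASKS))  # {'write_greeting_function': 1.0}
```

Real harnesses differ in the ways that matter most: they sandbox execution, record full transcripts, enforce time and token budgets, and draw on task families spanning hours-long horizons rather than a toy scoring function.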
Grants Received
- from Survival and Flourishing Fund
- from Survival and Flourishing Fund
- from Survival and Flourishing Fund
Projects: no linked projects
People: no linked people
Discussion
Key risk: METR’s impact depends on labs and governments actually using its evaluations to constrain deployment, and there’s a live risk that current benchmarks (e.g., HCAST/RE‑Bench) don’t robustly capture deception/power‑seeking failure modes and get superseded by in‑house or overlapping efforts (e.g., AISI), limiting counterfactual x‑risk reduction.
Case for funding: Funding METR sustains a uniquely independent, technically credible evaluator that has already shaped pre-deployment testing for GPT‑4/5 and Claude, built widely adopted open infrastructure (Vivaria, RE‑Bench, HCAST), and partnered with AISI/Project Canary, providing regulators and labs with decision-relevant measurements to gate the scaling of dangerous capabilities.
Details
- Last Updated: Apr 2, 2026, 10:10 PM UTC
- Created: Mar 18, 2026, 11:18 PM UTC