MIT: AI agents lack safety standards, new index finds
Summary
AI agents are advancing rapidly without behavioral standards, according to MIT's 2025 AI Agent Index. The study of 30 agents finds a lack of safety transparency and an over-reliance on a few foundation models, raising concerns about uncontrolled deployment and insufficient evaluation.
AI agents are operating without safety standards
AI agents are becoming more common and capable, but there is no consensus or standard on how they should behave. That's the conclusion of MIT’s Computer Science & Artificial Intelligence Laboratory (CSAIL) in its newly released 2025 AI Agent Index.
The index analyzes 30 machine learning models that can take actions online through software. These range from chat applications with tools to browser-based and enterprise workflow agents.
Developers are not sharing safety data
The accompanying research paper states that key aspects of agent development "remain opaque." The analysis found that developers talk more about product features than safety.
Of the 30 agents studied, 25 provide no details about safety testing and 23 offer no third-party testing data. Only four of the 13 agents with "frontier levels of autonomy" disclose any safety evaluations.
"Of the 13 agents exhibiting frontier levels of autonomy, only four disclose any agentic safety evaluations," the researchers wrote. Those four are ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5 Computer Use.
A handful of companies and models dominate
The ecosystem is concentrated: most agents are harnesses or wrappers for foundation models from just a few companies, chiefly Anthropic and OpenAI.
This creates a series of dependencies that are difficult to evaluate because "no single entity is responsible," according to the MIT researchers. The market is also geographically concentrated.
Thirteen of the evaluated agents were created by Delaware-incorporated companies. Five come from China-incorporated organizations, and four have other origins (Germany, Norway, Cayman Islands).
Agents are ignoring established web rules
The research highlights that AI agents are already challenging established web protocols. The paper notes their tendency to ignore the Robot Exclusion Protocol, which uses robots.txt files to signal that a website does not consent to being scraped.
This suggests traditional methods to control web crawlers may no longer be sufficient for autonomous agents. The issue is timely, as the release of platforms like OpenClaw and Moltbook shows the community is racing ahead without behavioral rules.
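For context on the protocol agents are ignoring: a well-behaved crawler checks a site's robots.txt before fetching a page, and Python's standard library ships a parser for exactly this. A minimal sketch, using a hypothetical robots.txt and user-agent name:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all crawlers are asked to stay out of /private/
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant agent consults the parser before each fetch
print(parser.can_fetch("MyAgentBot", "https://example.com/private/data"))  # False
print(parser.can_fetch("MyAgentBot", "https://example.com/public/page"))   # True
```

Note that robots.txt is purely advisory: nothing in the protocol enforces compliance, which is why autonomous agents that skip this check can scrape disallowed pages without any technical barrier.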
The broader context of agent deployment
The MIT report arrives as the industry grapples with understanding how agents are actually used. On Wednesday, Anthropic published its own analysis focused on the spectrum of agent autonomy, from email triage to cyber espionage.
"AI agents are here, and already they're being deployed across contexts that vary widely in consequence," Anthropic said. The potential economic impact is significant; consultancy McKinsey estimates AI agents could add $2.9 trillion to the US economy by 2030.
However, practical results have been mixed: enterprises are not yet seeing major returns on AI investments, and research last year found agents could complete only about a third of multi-step office tasks, though models have improved since.
What the 2025 Index examined
The 2025 AI Agent Index is smaller but more in-depth than its 2024 predecessor. It analyzes 30 agents across six categories, documenting 45 annotation fields for each agent on the index's website:
- Legal
- Technical capabilities
- Autonomy & control
- Ecosystem interaction
- Evaluation
- Safety
Twenty-four of the 30 agents were released or had major updates in the 2024-2025 period. The paper's authors include researchers from the University of Cambridge, MIT, Harvard Law School, and Stanford University.
The overall finding is clear: agent makers are revealing too little safety information as they build on a foundation controlled by a few dominant companies.