AI agents can't teach themselves new tricks – only people can
Summary
AI agents complete tasks more often when given human-curated skills, with pass rates rising by 16.2 percentage points on average. A new study found that skills the agents generate for themselves provide negligible or negative benefit, which suggests human expertise remains essential for guiding AI.
AI agents need human help to learn new skills
Artificial intelligence agents are significantly worse at teaching themselves new skills than they are at using human-written instructions, according to a new study. The research, from a team of 40 computer scientists, provides a benchmark for evaluating the "skills" that augment AI agents like Claude Code and Gemini CLI.
AI agents are models that operate in a loop, using a command-line interface to run software and complete tasks. When faced with an unfamiliar job, they can be given a "skill"—a package of reference material, code, and instructions for a specific domain.
SkillsBench tests agent performance
The researchers created a benchmark called SkillsBench to measure how these skills affect performance. They tested seven agent-model setups across 84 different tasks, running a total of 7,308 individual attempts, or "trajectories."
Each task was attempted under three conditions: with no skill, with a human-curated skill, and with a skill the agent tried to generate for itself. The results showed a stark difference in effectiveness.
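To make the three-condition comparison concrete, here is a minimal Python sketch of how pass rates could be tallied per condition. The `Trajectory` record, the condition labels, and the function names are illustrative assumptions, not SkillsBench's actual code or data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """One attempt at one task (hypothetical record layout)."""
    task_id: str
    condition: str  # assumed labels: "no_skill", "curated_skill", "self_generated_skill"
    passed: bool


def pass_rates(trajectories):
    """Return the pass rate (in percent) for each condition."""
    attempts = defaultdict(int)
    passes = defaultdict(int)
    for t in trajectories:
        attempts[t.condition] += 1
        passes[t.condition] += int(t.passed)
    return {c: 100.0 * passes[c] / attempts[c] for c in attempts}


def gain_over_baseline(rates, condition, baseline="no_skill"):
    """Difference in pass rate versus the baseline, in percentage points."""
    return rates[condition] - rates[baseline]


if __name__ == "__main__":
    # Made-up toy data for illustration only; these are not study results.
    toy = [
        Trajectory("task-a", "no_skill", False),
        Trajectory("task-b", "no_skill", True),
        Trajectory("task-a", "curated_skill", True),
        Trajectory("task-b", "curated_skill", True),
        Trajectory("task-a", "self_generated_skill", False),
        Trajectory("task-b", "self_generated_skill", True),
    ]
    rates = pass_rates(toy)
    for condition in ("curated_skill", "self_generated_skill"):
        print(condition, gain_over_baseline(rates, condition))
```

Reporting the gain as a difference between two pass rates is what makes it a percentage-point figure rather than a relative percentage.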
Agents using curated skills saw their task pass rate rise by 16.2 percentage points on average compared with agents given no skills. The improvement, however, varied widely depending on the knowledge domain.
Curated skills supercharge specialized tasks
The study found human-written skills provided the biggest boost in fields that depend on specialized knowledge underrepresented in general training data, and far smaller gains in well-covered fields such as mathematics and software engineering.
- Healthcare tasks: +51.9 percentage point improvement
- Manufacturing tasks: +41.9 percentage point improvement
- Mathematics tasks: +6.0 percentage point improvement
- Software engineering tasks: +4.5 percentage point improvement
One example was a flood-risk analysis. Without a skill, agents used the wrong statistical methods and had a pass rate of only 2.9 percent. With a curated skill detailing the correct USGS methodology and code libraries, the pass rate jumped to 80 percent.
Focused skills work best and help smaller models compete
The research also revealed that less is often more when it comes to skill design. Skills containing only a few focused modules performed better than large, unfocused data dumps.
Curated skills also helped smaller AI models compete with larger ones. For instance, Anthropic's smaller Claude Haiku 4.5 model, when equipped with a skill, achieved a 27.7 percent task completion rate.
This outperformed both the same model without skills (11 percent) and the much larger Claude Opus 4.5 model running without any skills (22 percent).
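The comparison is easiest to read as gaps in percentage points rather than relative percent. The short Python sketch below works through that arithmetic using only the three pass rates quoted above. The helper names are illustrative, and the inputs are the rounded figures as reported, so the results are approximate.

```python
# Pass rates (percent) as quoted in the article; rounded as reported.
REPORTED_PASS_RATES = {
    ("Claude Haiku 4.5", "curated skill"): 27.7,
    ("Claude Haiku 4.5", "no skill"): 11.0,
    ("Claude Opus 4.5", "no skill"): 22.0,
}


def point_gap(rate_a, rate_b):
    """Absolute gap between two pass rates, in percentage points."""
    return round(rate_a - rate_b, 1)


def relative_change(new_rate, old_rate):
    """Relative change from old_rate to new_rate, in percent."""
    return round(100.0 * (new_rate - old_rate) / old_rate, 1)


haiku_skill = REPORTED_PASS_RATES[("Claude Haiku 4.5", "curated skill")]
haiku_plain = REPORTED_PASS_RATES[("Claude Haiku 4.5", "no skill")]
opus_plain = REPORTED_PASS_RATES[("Claude Opus 4.5", "no skill")]

gap_vs_haiku = point_gap(haiku_skill, haiku_plain)       # 16.7 points
gap_vs_opus = point_gap(haiku_skill, opus_plain)         # 5.7 points
rel_vs_haiku = relative_change(haiku_skill, haiku_plain)  # 151.8 percent

if __name__ == "__main__":
    print(gap_vs_haiku, gap_vs_opus, rel_vs_haiku)
```

On these figures, the skill-equipped Haiku sits 16.7 points above its own unassisted score and 5.7 points above unassisted Opus. The same Haiku gain expressed as a relative change is roughly 152 percent, which is why the two ways of reporting should not be mixed.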
Agents fail at teaching themselves
The starkest finding was the failure of self-learning. When agents were directed to generate their own skills for a task, they performed slightly worse on average than when they were given no skill at all.
Agents using self-generated skills saw an average decrease in performance of 1.3 percentage points compared to the no-skill baseline. "Self-generated skills provide negligible or negative benefit," the authors state, concluding that effective skills require human-curated expertise.
The study, led by BenchFlow founder Xiangyi Li, is currently a preprint. It suggests that for the foreseeable future, AI agents will remain dependent on human teachers to acquire reliable new capabilities.