Towards Real-World Industrial-Scale Verification: LLM-Driven Theorem Proving on seL4

Researchers use AI to verify critical computer code

A team of researchers has used large language models to formally verify a significant portion of the seL4 microkernel, a critical piece of security-focused software. The work, detailed in a paper posted to arXiv, represents a major step toward using AI for real-world, industrial-scale software verification. The preprint is dated February 9, 2026.

Formal verification is the process of using mathematical proofs to show that a piece of code behaves exactly as its specification requires, which rules out entire classes of bugs. It is considered the gold standard for safety-critical systems but is notoriously time-consuming and demands deep expertise. The eight-author team, led by Jianyu Zhang, aimed to automate this complex process.
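To give a feel for what "proving code against a specification" means, here is a toy illustration in Lean 4. It is not taken from the paper: seL4's own proofs are written in the Isabelle/HOL proof assistant, and the function name clampToLimit and its property are invented purely for this example.

```lean
-- Implementation: a toy function standing in for real code (hypothetical).
def clampToLimit (limit x : Nat) : Nat :=
  if x ≤ limit then x else limit

-- Specification: the result never exceeds the limit.
-- The proof assistant accepts this only if every case is covered.
theorem clampToLimit_le (limit x : Nat) : clampToLimit limit x ≤ limit := by
  unfold clampToLimit
  split
  · assumption               -- case x ≤ limit: the result is x
  · exact Nat.le_refl limit  -- otherwise: the result is limit itself
```

A kernel proof has the same shape, but the specification covers the behavior of thousands of lines of low-level code rather than a single two-branch function.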

How AI cracked a decades-old kernel

The researchers targeted seL4, a microkernel renowned for being the world's first operating-system kernel with a complete, machine-checked formal proof of correctness. That original verification, completed over a decade ago, took an estimated 20 person-years of effort. The new AI-driven approach tackled a core component of this kernel: its virtual memory system.

Their system, an LLM-based agent, was tasked with generating the intricate mathematical proofs required for verification. The agent works by iteratively proposing proof steps and checking them against the kernel's formal specifications. When it hits a dead end, it learns from the failure and tries a new tactic.
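The propose, check and retry cycle described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: toy_propose stands in for the language model, toy_check stands in for the proof checker, and the goal name and tactic names are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: in a real system `propose` would call an LLM and
# `check` would call a proof assistant. Neither reflects the paper's API.
Failures = list[tuple[str, str]]  # (tactic, error message) pairs
Proposer = Callable[[str, Failures], Optional[str]]
Checker = Callable[[str, str], Optional[str]]


def prove(goal: str, propose: Proposer, check: Checker,
          max_attempts: int = 5) -> tuple[Optional[str], Failures]:
    """Return (tactic, failures) on success, or (None, failures) on giving up."""
    failures: Failures = []
    for _ in range(max_attempts):
        tactic = propose(goal, failures)
        if tactic is None:                # proposer has run out of ideas
            break
        error = check(goal, tactic)
        if error is None:                 # checker accepted the step
            return tactic, failures
        failures.append((tactic, error))  # feed the failure back next round
    return None, failures


def toy_propose(goal: str, failures: Failures) -> Optional[str]:
    """Toy proposer: pick the first candidate tactic that has not failed yet."""
    tried = {tactic for tactic, _ in failures}
    for candidate in ("simp", "auto", "induction"):
        if candidate not in tried:
            return candidate
    return None


def toy_check(goal: str, tactic: str) -> Optional[str]:
    """Toy checker: return None on success, or an error message on failure."""
    accepted = {"map_page_preserves_invariant": "auto"}
    if accepted.get(goal) == tactic:
        return None
    return f"tactic '{tactic}' failed on goal '{goal}'"
```

Calling prove("map_page_preserves_invariant", toy_propose, toy_check) tries "simp", records the failure, then succeeds with "auto". The point of the design is that the proposer sees the full failure history on every round, so it can avoid repeating a dead end.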

The results were significant. The AI agent managed to autonomously prove 14,886 verification conditions for the kernel's virtual memory subsystem. Perhaps more impressively, it successfully adapted proofs that were written for one hardware architecture (ARMv8) to work on another (RISC-V), a process known as proof transfer.

The promise and limitations of automated verification

This demonstration suggests AI could drastically reduce the cost and expertise barrier for formally verifying critical software, from aerospace systems to medical devices. "Our work represents a step towards real-world industrial-scale verification," the authors state in the paper titled "Towards Real-World Industrial-Scale Verification: LLM-Driven Theorem Proving on seL4."

However, the technology is not yet fully autonomous. The current process still requires human guidance at key points. The researchers outline several challenges that must be overcome before AI can handle verification end-to-end:

Specification Understanding: LLMs must better comprehend complex, formal specifications.
Proof Strategy: They need to learn higher-level proof strategies, not just step-by-step tactics.
Error Interpretation: The models must improve at interpreting feedback from failed proof attempts to guide subsequent actions.

A new tool for a critical field

The successful verification of seL4's virtual memory system is a powerful proof-of-concept. It shows that LLMs can navigate the immense complexity of a proven, real-world codebase and contribute meaningful work. This is not about replacing expert verification engineers but augmenting them with a powerful new assistant.

The ability to transfer proofs between hardware architectures is particularly promising for industry, where code often needs to run reliably on multiple platforms. As AI models continue to improve, their role in building and certifying the fault-tolerant software that underpins modern society is likely to grow.
