Visualizing the ARM64 Instruction Set (2024)
Summary
The ARM64 instruction set and the legal instructions of the LFI (Lightweight Fault Isolation) sandbox, visualized using Hilbert curves. Custom tools parse the Arm specification, and the resulting patterns reveal instruction classes and LFI's security restrictions.
Mapping the machine code
Researcher Zachary Yedidia has mapped the entire ARM64 instruction set into a 2D visualization using space-filling Hilbert curves. The project translates the complex web of 32-bit integers that govern modern mobile and server processors into a readable color-coded map.
ARM64 encodes every instruction as a 32-bit integer, allowing for a total of 4,294,967,296 possible combinations. Yedidia used a Hilbert curve to organize these four billion possibilities into a two-dimensional plane. This specific type of space-filling curve preserves locality, meaning instructions with similar bit patterns remain near each other on the map. The resulting images reveal distinct clusters and patterns that define how ARM processors interpret code.
The visualization relies on Arm’s Machine Readable Architecture (MRA) Specification. Yedidia used the June 2023 version of the spec, which includes all architectural extensions up to ARMv8.9. This document provides the XML and HTML data necessary to decode the semantics of every instruction in the Instruction Set Architecture (ISA).
Parsing the official specification
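Before the parsing details, the Hilbert layout described above can be made concrete. The following is a minimal sketch, not Yedidia's code: `hilbert_d2xy` is the standard iterative index-to-coordinate conversion, while `pixel_for_instruction` and its `image_order` parameter are hypothetical and assume that encodings are laid along the curve in numeric order, with each pixel covering an equal-sized run of consecutive encodings.

```python
from typing import Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def pixel_for_instruction(word: int, image_order: int = 10) -> Tuple[int, int]:
    """Hypothetical helper: pixel that a 32-bit encoding falls into.

    Assumes encodings are ordered numerically along the curve and that the
    image is 2**image_order pixels on each side (an assumption, not taken
    from the original tools).
    """
    if not 0 <= word < 1 << 32:
        raise ValueError("word must fit in 32 bits")
    if not 1 <= image_order <= 16:
        raise ValueError("image_order must be between 1 and 16")
    # Each pixel covers 2**(32 - 2*image_order) consecutive encodings.
    block = word >> (32 - 2 * image_order)
    return hilbert_d2xy(image_order, block)
```

With `image_order=10` the image is 1024 by 1024 pixels, so each pixel stands for 4,096 consecutive encodings. Consecutive indices always land on adjacent cells, which is the locality property the map depends on.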
Yedidia developed a custom tool to parse the massive XML files provided by Arm. This tool identified approximately 3,000 unique instruction encodings within the architecture. The parser extracts metadata for each entry, including mnemonics, instruction classes, and specific ARMv8 feature variants.
A second tool then iterates through every possible 32-bit value to determine which encoding, if any, it belongs to. The process is computationally intensive because it must evaluate billions of potential instructions against the encoding diagrams.
The specification's encoding diagrams mark each bit as 0, 1, or x, but they also include parenthesized (0) and (1) values. Yedidia treated these parenthesized bits as "don't care" values (x) to match how existing disassemblers handle them. These bits likely represent recommended but not strictly required encodings. This mapping creates the foundation for the final image, where each pixel represents a block of the instruction space.
Handling instruction logic errors
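The baseline that this section refines is plain bit-string matching. As a minimal sketch (the function names are illustrative and not taken from the armvis tools), an encoding string over 0, 1 and x can be compiled into a mask and a value, with parenthesized bits folded into x as described above. The RET pattern is included only as an example and should be checked against the spec before being relied on.

```python
import re
from typing import Tuple

# Example pattern for RET, written from the commonly documented encoding
# (assumption: verify against the Arm spec). The five x bits are the Rn field.
RET_PATTERN = "1101011 0 0 10 11111 0000 0 0 xxxxx 00000"


def compile_pattern(bits: str) -> Tuple[int, int]:
    """Turn a 32-bit encoding string into a (mask, value) pair."""
    # Parenthesized bits such as "(0)" or "(1)" are treated as don't-care.
    bits = re.sub(r"\([01]\)", "x", bits).replace(" ", "")
    if len(bits) != 32 or set(bits) - set("01x"):
        raise ValueError("expected 32 symbols drawn from 0, 1, x, (0), (1)")
    mask = value = 0
    for ch in bits:
        mask = (mask << 1) | int(ch != "x")   # 1 where the bit is fixed
        value = (value << 1) | int(ch == "1")  # required value of fixed bits
    return mask, value


def matches(word: int, pattern: Tuple[int, int]) -> bool:
    """True if every fixed bit of the pattern agrees with the word."""
    mask, value = pattern
    return (word & mask) == value
```

A mask-and-compare test like this is cheap per word, but running it for all 2^32 words against thousands of patterns is what makes the enumeration expensive. It also cannot express conditions that live in the ASL code, which is the problem discussed next.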
The official Arm specification often includes Arm Specification Language (ASL) code that can overrule simple bit-string encodings. For example, the EOR (Exclusive OR) instruction becomes undefined if certain fields, such as "sf" and "N", take particular combinations of values. Simple bit-pattern matching cannot catch these edge cases.
To solve this, Yedidia implemented a post-processing pass using the Capstone disassembler. Capstone's decoder accounts for these additional rules and can identify which instructions are actually valid on real hardware. This step filters out "ghost" instructions that match an encoding's bit-string but are rejected by the processor.
The final visualization groups instructions into several distinct classes to make the map readable. These categories include:
- General purpose instructions
- System and control operations
- Float and FPSIMD for floating-point math
- SVE and SVE2 for scalable vector extensions
- Mortlach and Mortlach2 (the internal names for SME and SME2)
- Other miscellaneous encodings
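To illustrate why bit-string matching alone is not enough, here is a hand-written version of one such rule. It is a sketch only: the pipeline described above delegates this work to Capstone, whereas the code below hard-codes a single check. It assumes the logical-immediate layout (sf in bit 31, N in bit 22) and the rule that the 32-bit form with N set is UNDEFINED. Other decode-time checks, such as reserved bitmask immediates, are not modeled.

```python
# Fixed bits of EOR (immediate): opc = 10 in bits 30..29, 100100 in bits 28..23.
# Layout assumed from the A64 logical-immediate encoding class.
EOR_IMM_MASK = 0x7F800000
EOR_IMM_VALUE = 0x52000000


def bit(word: int, n: int) -> int:
    """Return bit n of a 32-bit word."""
    return (word >> n) & 1


def classify_eor_imm(word: int) -> str:
    """Apply the bit-string test, then the extra sf/N rule."""
    if (word & EOR_IMM_MASK) != EOR_IMM_VALUE:
        return "not EOR (immediate)"
    sf = bit(word, 31)  # 0 = 32-bit form, 1 = 64-bit form
    n = bit(word, 22)   # top bit of the bitmask-immediate encoding
    if sf == 0 and n == 1:
        # Matches the bit-string but is ruled out by the decode logic.
        return "UNDEFINED"
    return "passes sf/N check"
```

For example, `0x52000000` passes the check, while `0x52400000` differs only in the N bit and is a "ghost": it matches the EOR bit-string yet is classified as UNDEFINED.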
Visualizing software sandbox security
The mapping project also supports Yedidia’s research into Lightweight Fault Isolation (LFI). LFI is a software sandboxing technique designed to secure ARM64 systems by restricting what instructions a program can execute. Yedidia will present the formal paper on LFI at the ASPLOS conference this April.
LFI uses machine code analysis to verify that an untrusted binary is safe to run. The verifier only permits instructions that obey strict invariants regarding memory access and register modification. If an instruction could potentially leak data or crash the system, the verifier flags the entire program as unsafe.
Yedidia created a security heatmap using the Hilbert curve to show which parts of the ARM64 ISA are "legal" under LFI. In this view, red areas indicate blocks where every instruction is safe, while blue areas show regions heavily restricted by the sandbox. This visualization helps researchers verify that the sandboxing logic correctly identifies dangerous code patterns.
Restricting the instruction space
The LFI verifier accepts far less than the full ARM64 architecture. Of the roughly 4.3 billion possible 32-bit encodings, the current LFI verifier permits only about 750 million. Most of these restrictions focus on protecting specific registers that the sandbox uses to maintain security boundaries. The verifier monitors and restricts instructions that modify the following registers:
- x18 (often used as a platform register)
- x21 through x24 (used for sandbox invariants)
- sp (the stack pointer)
- x30 (the link register)
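A much-simplified sketch of the kind of check involved: the code below decodes only one instruction class, ADD/SUB (immediate), and reports whether its destination is one of the registers listed above. The function names are hypothetical, and the real LFI verifier's rules are more nuanced than a flat deny-list; nothing here is taken from its source.

```python
from typing import Optional

# Registers named in the list above.
RESERVED = frozenset({"x18", "x21", "x22", "x23", "x24", "sp", "x30"})

# ADD/SUB (immediate): bits 28..23 are 100010 (assumed from the A64 encoding).
ADD_SUB_IMM_MASK = 0x1F800000
ADD_SUB_IMM_VALUE = 0x11000000


def add_sub_imm_dest(word: int) -> Optional[str]:
    """Destination register of an ADD/SUB (immediate), or None for other words."""
    if (word & ADD_SUB_IMM_MASK) != ADD_SUB_IMM_VALUE:
        return None
    rd = word & 0x1F
    sets_flags = (word >> 29) & 1
    if rd == 31:
        # Register 31 means sp for ADD/SUB and the zero register for ADDS/SUBS.
        return "xzr" if sets_flags else "sp"
    # A 32-bit (w) write still overwrites the full x register, so the
    # destination is reported by its x name regardless of the sf bit.
    return f"x{rd}"


def writes_reserved(word: int) -> bool:
    """True if this ADD/SUB (immediate) writes a register from RESERVED."""
    return add_sub_imm_dest(word) in RESERVED
```

Under these assumptions, `add x21, x0, #1` (`0x91000415`) and `add sp, sp, #16` (`0x910043FF`) are flagged, while `cmp x0, #1` (`0xF100041F`) is not, because its register 31 destination is the zero register rather than sp.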
Building the interactive tool
Yedidia released an interactive web version of the map that allows users to explore the ARM64 space manually. The web tool uses a version of Capstone compiled to WebAssembly to provide real-time disassembly of any point on the Hilbert curve. Users can hover over pixels to see the exact assembly code and instruction class.
One challenge in the web version is the assembler templates provided by Arm. These templates are designed for human readers rather than automated tools, making it difficult to generate string representations of instructions directly from the spec. The web tool currently falls back to displaying the instruction name if the WebAssembly version of Capstone does not recognize a specific extension.
The source code for these visualization tools is available on GitHub under the "armvis" repository. Yedidia’s future plans include adding support for more ARM extensions and potentially creating a similar map for the RISC-V architecture. These tools provide a new way for security researchers and compiler engineers to understand the massive complexity of modern instruction sets.
