The Perils of ISBN
Summary
Existing book apps are clunky. Building a "Letterboxd for books" is tough because book APIs return messy, duplicated editions instead of the clean, unified "work" records that movie databases provide.
Amazon keeps Goodreads in stasis
Developers attempting to build a modern book-tracking alternative face a massive metadata hurdle that keeps readers locked into Amazon’s aging Goodreads ecosystem. While movie fans have migrated to Letterboxd for its clean interface and social features, book lovers remain stuck with platforms that prioritize ad-heavy newsletters and "reading challenges" over basic utility.
Goodreads has seen little innovation since Amazon acquired the platform in 2013. The interface requires a half-dozen clicks to log a single book, and the search bar often forces users through three different menus to record a finished read. The "My Books" tab further complicates the experience by mixing current reads with future wishlists by default.
Independent alternatives like StoryGraph attempt to fill this void but often lean on features that miss the mark for minimalist users. These platforms frequently emphasize the following features rather than simple logging:
- AI-powered reading analytics and mood-based suggestions
- User polls regarding character-driven versus plot-driven narratives
- Computer-generated recommendations that lack the human touch of social circles
- Complex data visualizations that clutter the mobile experience
The result is a fragmented market where users often abandon dedicated trackers entirely. Many readers now rely on manual Obsidian files or simple spreadsheets to avoid the friction of existing book-focused social networks.
The fragmented world of ISBNs
Building a custom book tracker reveals a fundamental flaw in how digital book data is organized. The Google Books API offers a free entry point for developers, but a simple query for a classic title like The Last Unicorn returns a chaotic list of redundant entries. This happens because the publishing industry assigns a unique ISBN to every single format and edition of a work.
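Here is a minimal sketch of that query, assuming Python's standard library and the public `volumes` endpoint. It builds the search URL and then reduces an already-decoded JSON response to one row per returned volume. This makes the duplication visible: the same title and author repeat, each with its own ISBN. Network access is left out, and the `"(unknown)"` placeholder is an illustrative choice, not part of the API.

```python
from typing import Optional
from urllib.parse import urlencode

GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(title: str) -> str:
    """Build a Google Books search URL restricted to the title field."""
    query = urlencode({"q": f"intitle:{title}"})
    return f"{GOOGLE_BOOKS_VOLUMES}?{query}"


def summarize_volumes(response: dict) -> list[tuple[str, str, Optional[str]]]:
    """Reduce a decoded volumes response to (title, first author, ISBN-13) rows."""
    rows = []
    for item in response.get("items", []):
        info = item.get("volumeInfo", {})
        # Each volume carries its own identifiers; take the ISBN-13 if present.
        isbn13 = next(
            (ident["identifier"]
             for ident in info.get("industryIdentifiers", [])
             if ident.get("type") == "ISBN_13"),
            None,
        )
        authors = info.get("authors") or ["(unknown)"]
        rows.append((info.get("title", ""), authors[0], isbn13))
    return rows
```

Fetching `build_query_url("The Last Unicorn")` with any HTTP client and passing the decoded body to `summarize_volumes` should show repeated (title, author) pairs with differing ISBNs, one per edition or format.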
A single novel might have dozens of unique identifiers for various formats, including:
- The original hardcover release
- Mass-market paperback editions
- Digital eBook files for Kindle or Kobo
- Audiobook narrations with different voice actors
- Special anniversary editions with new forewords or cover art
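An ISBN identifies a manifestation, so no amount of identifier cleanup can merge a hardcover with its eBook. Normalization can only remove a smaller source of noise: the same edition listed under both an older 10-digit ISBN and its 13-digit form. The following minimal sketch uses the standard ISBN-10 and ISBN-13 check-digit rules. The function names are illustrative.

```python
def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn10(s: str) -> bool:
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11; a final 'X' means 10."""
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(v * (10 - i) for i, v in enumerate(values)) % 11 == 0


def normalize_isbn(raw: str) -> str:
    """Return the ISBN-13 form of a valid ISBN-10 or ISBN-13, ignoring hyphens and spaces."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit() and isbn13_check_digit(s[:12]) == s[12]:
        return s
    if is_valid_isbn10(s):
        body = "978" + s[:9]  # ISBN-10s map into the 978 prefix
        return body + isbn13_check_digit(body)
    raise ValueError(f"not a valid ISBN-10 or ISBN-13: {raw!r}")
```

For example, `normalize_isbn("0-306-40615-2")` returns `"9780306406157"`. A paperback and an eBook of the same novel would still normalize to two different values.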
When a developer pulls data from the Google Books API, the system treats each of these formats as a separate entity. A user searching for a book doesn't want to choose between ten different ISBNs just to record that they finished a story. They want to log the "work" itself, but the current API infrastructure is built to track "manifestations" of that work for retail purposes.
This mismatch makes the Google Books API nearly unusable for a Letterboxd-style experience. Without a way to collapse these duplicates into a single entry, search results remain a mess of redundant titles and conflicting metadata.
Librarians solve the versioning problem
Professional librarians have already solved this organizational nightmare using the FRBR model, which stands for Functional Requirements for Bibliographic Records. This framework establishes a hierarchy that distinguishes between a creative idea and the physical object you hold in your hand. Most consumer-facing book apps fail because they ignore this hierarchy in favor of flat retail data.
The FRBR model breaks a book down into four distinct levels:
- Work: The abstract creative idea (e.g., the story of The Last Unicorn).
- Expression: The specific realization of that work (e.g., the original English text or a Spanish translation).
- Manifestation: The physical or digital format (e.g., a 1968 hardcover or a 2023 Kindle file).
- Item: The specific copy owned by an individual or a library.
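The four levels map naturally onto nested records. The following is a minimal sketch using Python dataclasses. The fields here are a small illustrative subset, not FRBR's own attribute lists. The ISBN sits on `Manifestation` because that is the level an ISBN identifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    owner: str                      # a user, or a library branch


@dataclass
class Manifestation:
    fmt: str                        # "hardcover", "paperback", "ebook", "audiobook", ...
    year: int
    isbn: Optional[str] = None      # ISBNs identify this level, not the Work
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str                   # e.g. "en" for the original text, "es" for a translation
    translator: Optional[str] = None
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    author: str
    expressions: list[Expression] = field(default_factory=list)

    def isbns(self) -> list[str]:
        """Every ISBN that should resolve to this single Work entry."""
        return [m.isbn
                for e in self.expressions
                for m in e.manifestations
                if m.isbn is not None]
```

For instance, `Work("The Last Unicorn", "Peter S. Beagle")` would hold an English `Expression` whose manifestations include the 1968 hardcover and the 2023 Kindle file mentioned above. A tracker built this way logs reads against the Work and can still record which manifestation a user actually held.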
A successful book tracker needs to operate at the Work level. Users want to see one entry for a novel where all reviews, ratings, and social discussions live, regardless of whether they read a paperback or listened to an Audible audiobook. Current APIs struggle to group these levels together; in the OpenLibrary database, for example, "Hotel Iris" by Yoko Ogawa appears as four separate works.
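Operating at the Work level mostly means resolving each logged edition to its work before aggregating. The minimal sketch below assumes that a curated `isbn_to_work` lookup already exists; building that lookup is the hard part described here. Unknown ISBNs are skipped rather than guessed. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def ratings_by_work(isbn_to_work: dict[str, str],
                    logs: list[tuple[str, int]]) -> dict[str, float]:
    """Average star ratings per work, whichever edition each user logged."""
    buckets: dict[str, list[int]] = defaultdict(list)
    for isbn, stars in logs:
        work_id = isbn_to_work.get(isbn)
        if work_id is None:
            continue  # edition not yet mapped to a work; needs curation first
        buckets[work_id].append(stars)
    return {work_id: mean(stars) for work_id, stars in buckets.items()}
```

A hardcover rating, a paperback rating, and an audiobook rating of the same novel all land in one bucket, which is the single-entry behavior users expect.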
Mapping these relationships requires significant manual curation or highly sophisticated algorithms. Because book data is often crowd-sourced or pulled from messy library catalogs, the "clean" data needed for a slick UI rarely exists in an open-source format.
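Curators or a matching algorithm may decide that records are duplicates, as with the four "Hotel Iris" entries. Those merges then have to be applied transitively. A union-find (disjoint-set) structure is one standard way to do that. The sketch below is a minimal version. The work IDs are hypothetical placeholders, not real OpenLibrary keys.

```python
from collections import defaultdict


class WorkMerger:
    """Union-find (disjoint-set) over work IDs."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, work_id: str) -> str:
        """Return the canonical ID for work_id, registering it if unseen."""
        self.parent.setdefault(work_id, work_id)
        root = work_id
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every ID on the walk straight at the root.
        while self.parent[work_id] != root:
            next_id = self.parent[work_id]
            self.parent[work_id] = root
            work_id = next_id
        return root

    def merge(self, a: str, b: str) -> None:
        """Record a judgment that a and b are the same work."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keeping the smaller root makes the canonical ID the minimum of each
            # group, so the result does not depend on merge order.
            keep, drop = sorted((root_a, root_b))
            self.parent[drop] = keep

    def groups(self) -> dict[str, list[str]]:
        """Map each canonical ID to every ID merged into it."""
        out: dict[str, list[str]] = defaultdict(list)
        for work_id in list(self.parent):
            out[self.find(work_id)].append(work_id)
        return dict(out)
```

Three pairwise judgments such as `merge("work-d", "work-c")`, `merge("work-b", "work-a")`, and `merge("work-c", "work-a")` are enough to fold all four records into `"work-a"`.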
Scaling metadata for 40 million works
The scale of the book industry dwarfs the film industry, making the data cleanup task significantly harder for small development teams. Letterboxd relies on The Movie Database (TMDB) as its primary source of truth. TMDB currently tracks roughly 1 million movies, a manageable number for a dedicated community to curate and maintain.
In contrast, OpenLibrary currently lists more than 40 million works in its catalog. The sheer volume of data makes it difficult to maintain a high-quality, open-source database without massive financial backing. The problem is at least an order of magnitude larger than the one TMDB solved for cinema.
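The "order of magnitude" comparison follows directly from the two catalog sizes quoted in this section:

```latex
\frac{40 \times 10^{6}\ \text{works (OpenLibrary)}}{1 \times 10^{6}\ \text{movies (TMDB)}} = 40,
\qquad \log_{10} 40 \approx 1.6
```

A 40-fold gap is roughly 1.6 orders of magnitude, so "at least an order of magnitude" holds with room to spare. Treat it as a rough ratio, since both inputs are approximate ("roughly 1 million", "more than 40 million").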
This discrepancy in scale leads to several technical challenges for independent developers:
- Data Integrity: Crowdsourced book data often contains typos, missing authors, and duplicate entries.
- Server Costs: Indexing and searching 40 million records requires more infrastructure than a smaller film database.
- Lack of Funding: While film databases benefit from the commercial interests of Hollywood, book databases have less institutional support.
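For the data-integrity problem, a first pass at catching typo-level duplicates can use fuzzy string similarity. The sketch below uses Python's standard `difflib.SequenceMatcher`. The 0.9 threshold and the "title author" comparison string are arbitrary illustrative choices. The sketch compares every pair, which is quadratic and would be hopeless at 40 million records. A real pipeline would first bucket candidates (for example, by author) and compare only within buckets.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterator


def likely_duplicates(
    records: list[tuple[str, str, str]], threshold: float = 0.9
) -> Iterator[tuple[str, str, float]]:
    """Yield (id_a, id_b, similarity) for (id, title, author) records that nearly match."""
    keyed = [(rid, f"{title} {author}".lower()) for rid, title, author in records]
    for (id_a, text_a), (id_b, text_b) in combinations(keyed, 2):  # O(n^2) pairs
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            yield id_a, id_b, ratio
```

Consider the records `("r1", "The Last Unicorn", "Peter S. Beagle")` and `("r2", "The Last Unicron", "Peter S. Beagle")`. Despite the transposed letters, their similarity is about 0.97, so the pair is flagged for review rather than merged automatically.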
Letterboxd effectively commercialized the "commons" of film metadata by building a premium UI on top of free data. For a "Letterboxd for books" to exist, someone must first build a TMDB for books that can handle the 40-million-work scale while correctly implementing the FRBR model.
The lack of a movie database equivalent
The core issue is that no high-quality, open-source analogue to TMDB exists for the literary world. Amazon owns the most comprehensive book databases, including Goodreads and the Kindle metadata store, and it has no incentive to share that data with potential competitors. This creates a "chicken-and-egg" problem where new apps can't get clean data, and clean data isn't being generated because there are no popular new apps to drive contributions.
OpenLibrary is the closest attempt at a solution, but its data remains "noisy" and requires significant munging to be useful in a production environment. Developers spend more time cleaning JSON outputs from the Google Books API or OpenLibrary than they do building features for their users. This technical barrier protects Amazon's monopoly on the book-tracking market.
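As an illustration of that munging, here is a minimal cleanup pass over an OpenLibrary `search.json` response. It assumes the `docs` entries carry `key`, `title`, `author_name`, and `edition_count` fields. The pass keeps only work-level records (keys under `/works/`) and drops rows with no title or author. It also collapses stray whitespace and sorts by edition count. The output row shape is a hypothetical choice for this sketch.

```python
def clean_search_docs(payload: dict) -> list[dict]:
    """Reduce an OpenLibrary search.json payload to usable work-level rows."""
    cleaned = []
    for doc in payload.get("docs", []):
        key = doc.get("key") or ""
        title = " ".join((doc.get("title") or "").split())  # collapse stray whitespace
        authors = [a.strip() for a in (doc.get("author_name") or []) if a and a.strip()]
        if not key.startswith("/works/") or not title or not authors:
            continue  # skip non-work keys and rows missing a title or author
        cleaned.append({
            "work_id": key.rsplit("/", 1)[-1],
            "title": title,
            "authors": authors,
            "editions": doc.get("edition_count") or 0,
        })
    # Crude ranking signal: records with more editions attached come first.
    cleaned.sort(key=lambda row: row["editions"], reverse=True)
    return cleaned
```

Cleaning alone does not fix the duplicate-work problem. Two well-formed "Hotel Iris" rows would both survive this pass and would still need to be merged.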
Until a dedicated project manages to organize some 40 million works into a Work-based hierarchy, readers will likely remain stuck with the cluttered, ad-filled interfaces of the past decade. The technology to build a better tracker exists, but the foundation of clean data is still missing.
