Judge a Book by Its Cover

James · February 15, 2026 · 3 min read

building-in-public · book-data · cover-art

Last time, I ended with a throwaway promise about cover art being "a multi-week saga involving API rate limits, placeholder detection, and the discovery that Google Books will sometimes return a photo of the table of contents instead of a cover."

All of that was true. I just didn't know yet how much worse it would get.

InkTree has about 5,500 works in its database. Covers are the first thing a reader sees. A library without them looks broken — like a bookshelf full of blank spines. Nobody trusts that.

Those apps you're used to with the gorgeous cover images? They have publisher deals and clickthrough agreements. You show the cover, the user taps through, the publisher gets a sale. Everybody wins, except the platform is now a storefront.

I'm trying to stay agnostic. InkTree isn't selling you books. It's helping you process the ones you've already read. So: how do you get beautiful, high-res cover art without publisher deals and without pushing merchandise?

Two free sources: OpenLibrary and Google Books. Both are... adventures.

OpenLibrary's cover API is genuinely great. Free, unlimited, work-centric — exactly how I think about books. For each work, I scanned all known editions for a cover, filtered by English language to avoid getting the Japanese edition of The Great Gatsby.
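
A minimal sketch of that edition scan in Python, not InkTree's actual code. It assumes each edition is a dict shaped like an OpenLibrary edition record, with a `languages` list of `{"key": "/languages/eng"}` entries and a `covers` list of numeric cover IDs. The function names are made up for illustration, and skipping editions that list no language at all is an assumption about how strict the filter should be.

```python
from typing import Optional

# Cover-by-ID URL pattern; the "L" suffix asks for the large size.
COVER_URL = "https://covers.openlibrary.org/b/id/{cover_id}-L.jpg"


def is_english(edition: dict) -> bool:
    """True if the edition record lists English among its languages."""
    languages = edition.get("languages") or []
    return any(lang.get("key") == "/languages/eng" for lang in languages)


def first_english_cover(editions: list) -> Optional[str]:
    """Return a cover URL from the first English edition that has one."""
    for edition in editions:
        if not is_english(edition):
            continue  # assumption: editions with no language info are skipped too
        for cover_id in edition.get("covers") or []:
            # Guard against non-positive or null entries meaning "no cover".
            if isinstance(cover_id, int) and cover_id > 0:
                return COVER_URL.format(cover_id=cover_id)
    return None
```

A URL coming back from this is only a candidate. The image behind it still has to be fetched and checked, which is where the CDN trouble starts.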

3,271 covers enriched in 2 hours and 7 minutes. Decent hit rate for popular books.

But OL's CDN is inconsistent. Some URLs return broken images. Some covers are foreign-language editions that slipped through the filter. And for anything remotely obscure, you get nothing. I went from 0% to about 60% coverage. Good start. Not good enough.

Google Books has a standard image parameter called zoom. Set zoom=1, you get a thumbnail — 10KB, useless. Set zoom=0, you get a larger image. In theory.

In practice, zoom=0 sometimes returns interior pages. A table of contents. A title page. Page 3 of the actual book. You ask for a cover and Google hands you Chapter One.

This is wild behavior for a cover image API. I genuinely don't know how this happens at Google's scale. But it does, and it means you can't trust zoom=0.

So rewrite, re-scrub. Judge image resolution by the size and dimensions of the file that actually comes back, in post-processing, NOT by the zoom value in the request. Got it. That's fun.
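
One way to measure what actually came back, sketched in Python with only the standard library. It reads width and height straight out of a JPEG's frame header, so the check does not depend on what zoom promised. The function name is hypothetical, it only handles JPEG (a PNG response would need its own reader), and it cannot tell a cover from an interior page. That still takes a separate check or a human eye.

```python
import struct
from typing import Optional, Tuple

# Start-of-frame markers carry the image dimensions.
# 0xC4, 0xC8 and 0xCC sit in the same range but are not frame headers.
_SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a JPEG's frame header, or None."""
    if data[:2] != b"\xff\xd8":  # no start-of-image marker: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # lost sync with the marker stream
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in _SOF_MARKERS:
            if pos + 9 > len(data):
                return None  # truncated frame header
            # Layout: marker(2) length(2) precision(1) height(2) width(2)
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        # Any other header segment: skip the marker plus its declared length.
        (segment_length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + segment_length
    return None
```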

Google Books doesn't return a 404 when it doesn't have a cover. It returns a placeholder image — a grey "image not available" graphic. And it's always the same: exactly 9,103 bytes, 575x750 pixels, aspect ratio 0.767.

Every cover image that comes back under 15KB gets rejected. The placeholder is 9KB, so it never makes it through. But the first time I ran the pipeline without this check, I loaded hundreds of identical grey rectangles into my database and had to clean them all out. Placeholder detection is unglamorous work, but it's the difference between a library that looks alive and one that looks haunted.
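
The check itself is tiny. Here is a sketch in Python, with the caveat that the names are invented for illustration and that "15KB" is read as 15,000 bytes, which is an assumption. The 9,103-byte figure is the placeholder size quoted above.

```python
PLACEHOLDER_BYTES = 9_103  # the grey "image not available" graphic
MIN_COVER_BYTES = 15_000   # assumption: "15KB" taken as 15,000 bytes


def classify_image(data: bytes) -> str:
    """Label a downloaded image as 'placeholder', 'too_small' or 'ok'."""
    if len(data) == PLACEHOLDER_BYTES:
        return "placeholder"
    if len(data) < MIN_COVER_BYTES:
        return "too_small"
    return "ok"


def is_real_cover(data: bytes) -> bool:
    """True only for images that clear the size floor."""
    return classify_image(data) == "ok"
```

The exact-size test is redundant given the 15KB floor, but keeping it separate makes the logs say "placeholder" instead of a vague "too small".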

Here's where it gets painful. The Google Books Search API — the part that finds the book so you can grab the cover — rate-limits you after about 800 requests. At 4 seconds per request, that's 53 minutes of continuous work before it silently stops returning results.

Not an error. Not a 429 Too Many Requests. Just... empty responses. searchForVolumeId() returns null. You think you've exhausted your coverless books. You haven't. Google just stopped talking to you.
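
Since there is no error to catch, one option is to treat a long streak of empty results as a lockout instead of as "no more books". The sketch below is an assumption-heavy illustration in Python, not the real pipeline: `run_wave` and `max_misses` are hypothetical names, the threshold of 20 is a guess, and `search` stands in for whatever searchForVolumeId() does.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

REQUEST_DELAY_S = 4.0  # from the post: 4 seconds per request


def run_wave(
    titles: List[str],
    search: Callable[[str], Optional[str]],
    start: int = 0,
    max_misses: int = 20,  # assumption: this many empty results in a row means lockout
    delay: float = REQUEST_DELAY_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, str], int]:
    """Search until the list ends or a streak of empty results suggests a lockout.

    Returns the volume IDs found and the index to resume from next wave.
    """
    found: Dict[str, str] = {}
    misses = 0
    for i in range(start, len(titles)):
        volume_id = search(titles[i])
        if volume_id is None:
            misses += 1
            if misses >= max_misses:
                # Rewind to the first miss of the streak so those works get retried.
                return found, i - misses + 1
        else:
            misses = 0
            found[titles[i]] = volume_id
        sleep(delay)
    return found, len(titles)
```

The trade-off is that a genuine run of unfindable books looks identical to a lockout, so the works in that streak get retried in the next wave either way.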

I ran it in waves. Phase 3a: 424 covers before the rate limit at work #817. Phase 3b: 68 more. Phase 3c: zero — still locked out from the previous run.

After both sources and multiple phases:

3,898 works have covers (70.5%) — up from about 60% before enrichment
1,634 works remain coverless — but 92.7% of those have zero tags, zero genres, and zero user engagement
Only 2 coverless works have real users who care: Walden and Wayfinder
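
For anyone checking the math, the percentage follows from the two counts above. A quick Python check using only those numbers:

```python
with_covers = 3_898
coverless = 1_634

total = with_covers + coverless  # exact size behind "about 5,500"
coverage = with_covers / total

print(f"{total} works, {coverage:.1%} with covers")
# prints "5532 works, 70.5% with covers"
```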

The pipeline is done. The remaining coverless works are foreign-language editions, academic papers, and books nobody in the system has ever touched. Chasing them would mean burning more rate-limited API calls for books that don't matter to anyone's library.

The final architecture is a 4-tier fallback chain. For each coverless work, try in order: OL multi-edition scan, Google Books by ISBN, Google Books by title+author search, OL by ISBN on a different CDN path. First one that returns a real image above 15KB wins.
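
The shape of that chain, as a rough Python sketch. The tier functions are stand-ins passed in by the caller, the names are hypothetical, and the 15,000-byte floor is the same assumed reading of "15KB". A tier returning `None` means it found nothing.

```python
from typing import Callable, List, Optional, Tuple

MIN_COVER_BYTES = 15_000  # assumption: "15KB" taken as 15,000 bytes

# A tier is a label plus a function that returns image bytes or None.
Tier = Tuple[str, Callable[[dict], Optional[bytes]]]


def first_real_cover(work: dict, tiers: List[Tier]) -> Optional[Tuple[str, bytes]]:
    """Try each tier in order; return (tier name, image) for the first real cover."""
    for name, fetch in tiers:
        data = fetch(work)
        if data is not None and len(data) >= MIN_COVER_BYTES:
            return name, data
        # Anything smaller is treated as a placeholder or thumbnail: fall through.
    return None
```

Passing the tiers in as a list keeps the order in one place: OL multi-edition scan, Google Books by ISBN, Google Books by title+author, then OL by ISBN.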

It's not elegant. It's a waterfall of API calls with rate-limit awareness and placeholder detection. But it works, and 70% of 5,500 works have real, high-quality cover images without a single publisher deal.

Sometimes the unglamorous infrastructure is the whole product.

That's two posts deep into the data layer now — editions vs works, and now covers. Next time I want to get into something more interesting: what happens when you stop searching for books by genre and start searching by the dimensions that actually mattered to you. Turns out, similarity is more subjective than any algorithm wants to admit.

I'm building InkTree — a reading companion that replaces star ratings with five dimensions. It's in beta, and I'm writing about the weird stuff I find along the way.