acabal · 2026-04-20T20:10:34 1776715834

I've always told people, Kindles are ereaders seemingly designed by people who hate books.

The renderer is atrocious and is holding back the entire industry, much like IE6's crappy renderer and monopoly on users held the entire web back a decade. Browsers (and thus ebooks, which are just HTML/CSS) can now do pretty decent typography, but Amazon inexplicably refuses to get on board with epub.

Their file formats are equally garbage. Mobi, a format that has hardly changed since around 2005, was still in active use until just recently. Their other proprietary formats have confusing feature sets and are opaque to create. The official tool to create Amazon ebooks only runs on Windows![1]

Kindles still can't natively read epubs, but since they accept epubs via email, their customers get confused and email me about it. (Epubs sent via email are quietly converted to Amazon's proprietary format, meaning all bets are off on the result. Good luck, publisher!)

I always tell people, buy literally any other ereader.

[1] Calibre can also create them, but its implementation is reverse-engineered, not official.

acabal · 2026-03-21T21:50:29 1774129829

The reading ease algorithm we use is the Flesch-Kincaid algorithm, which works pretty well for regular prose books but clearly fails very badly on avant-garde prose like Ulysses or As I Lay Dying.
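
For anyone curious why it breaks down, here is a minimal sketch, assuming the standard Flesch reading-ease formula (206.835 - 1.015 x words per sentence - 84.6 x syllables per word). The syllable counter and sentence splitter below are crude stand-ins I made up for illustration, not whatever the production tooling does.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch reading-ease score; higher means easier to read."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


short_score = flesch_reading_ease("The cat sat on the mat.")
# One 200-word "sentence" with no full stops, like a stream-of-consciousness passage.
run_on_score = flesch_reading_ease(" ".join(["word"] * 200) + ".")
```

The score depends only on sentence length and syllable counts, so a passage that goes pages without a full stop gets a huge words-per-sentence term and the score collapses, regardless of how simple the individual words are.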

acabal · 2026-01-23T05:31:49 1769146309

XML lost because 1) the existence of attributes means a document cannot be automatically mapped to a basic language data structure like an array of strings, and 2) namespaces are an unmitigated hell to work with. Even just declaring a default namespace and doing nothing else immediately makes your day 10x harder.

These items make XML deeply tedious and annoying to ingest and manipulate. Plus, some major XML libraries, like lxml in Python, are extremely unintuitive in their implementation of DOM structures and manipulation. If ingesting and manipulating your markup language feels like an endless trudge through a fiery wasteland then don't be surprised when a simpler, more ergonomic alternative wins, even if its feature set is strictly inferior. And that's exactly what happened.

I say this having spent the last 10 years struggling with lxml specifically, and my entire 25-year career dealing with XML in some shape or form. I still routinely throw up my hands in frustration when having to use Python tooling to do what feels like it should be even the most basic XML task.

Though xpath is nice.
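
To make the namespace complaint concrete, here is a small sketch using Python's standard-library `xml.etree.ElementTree`. The two documents are made-up examples; the only difference between them is a default namespace declaration on the root.

```python
import xml.etree.ElementTree as ET

# Two otherwise identical documents; the second declares a default namespace.
plain = ET.fromstring(
    "<package><metadata><title>Emma</title></metadata></package>"
)
namespaced = ET.fromstring(
    '<package xmlns="http://www.idpf.org/2007/opf">'
    "<metadata><title>Emma</title></metadata></package>"
)

# The obvious path works without a namespace...
plain_title = plain.find("metadata/title").text

# ...and silently matches nothing once a default namespace is declared.
namespaced_miss = namespaced.find("metadata/title")

# Every step of the path now needs a prefix mapped to the namespace URI.
ns = {"opf": "http://www.idpf.org/2007/opf"}
namespaced_title = namespaced.find("opf:metadata/opf:title", ns).text

# Tag names are also rewritten to include the URI.
root_tag = namespaced.tag
```

Here `plain_title` and `namespaced_title` are both `Emma`, `namespaced_miss` is `None`, and `root_tag` is `{http://www.idpf.org/2007/opf}package`. Nothing about the content changed, yet every query had to.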

masklinn · 2026-01-23T07:59:48 1769155188

> Plus, some major XML libraries, like lxml in Python, are extremely unintuitive in their implementation of DOM structures and manipulation.

Lxml, or more specifically its inspiration ElementTree, is specifically not a (W3C) DOM or dom-style API. It was designed for what it called “data-style” XML documents where elements would hold either text or sub-elements but not both, which is why mixed-content interactions are a chore (lxml augments the API by adding more traversal axes but elementtree does not even have that, it’s a literal tree of elements). effbot.org used to have a page explaining its simplified infoset before Fredrik passed and registration lapsed; it can be accessed through archive.org.

That means lxml is, by design, not the right tool to interact with mixed-content documents. But of course the issue is there isn’t really a right tool for that, as to my knowledge nobody has bothered building a fast DOM-style library for Python.

If you approach lxml as what ElementTree was designed as, it’s very intuitive: an element is a sequence of sub-elements, with a mapping of attributes. It’s a very straightforward model and works great for data documents, as well as fits great within the language. But of course that breaks down for mixed content documents as your text nodes get relegated to `tail` attributes (and ElementTree straight up discards comments and PIs, though lxml reverted that).
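
A quick sketch of the `tail` behaviour with the standard-library `xml.etree.ElementTree` (the sample paragraph is just an illustration):

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Call me <i>Ishmael</i>. Some years ago</p>")
i = p[0]

# There are no text nodes: text before the first child lives on the parent,
# and text after a child lives on that child's `tail`.
parent_text = p.text   # "Call me "
child_text = i.text    # "Ishmael"
child_tail = i.tail    # ". Some years ago"

# The element's children are just the sub-elements.
children = [child.tag for child in p]  # ["i"]

# Recovering the full text means walking text and tails together.
full_text = "".join(p.itertext())

# The chore: removing the <i> element takes its tail along with it.
p.remove(i)
after_remove = ET.tostring(p, encoding="unicode")
```

After the removal, `after_remove` is `<p>Call me </p>`: the trailing sentence is gone because it belonged to the `<i>` element's `tail`, so unwrapping or deleting inline elements means moving tails around by hand.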

matkoniecz · 2026-01-23T06:17:21 1769149041

> even if its feature set is strictly inferior

and often having less bizarre and overly complex features is a feature by itself

small_scombrus · 2026-01-23T07:51:15 1769154675

Base JSON not supporting comments is a sometimes-annoying 'feature': without comments, no one can use comment tags to bolt extra functionality onto their JSON files, so you don't end up with a million custom JSON+ formats.

acabal · 2026-01-07T19:53:21 1767815601

Taking their sponsors page at face value and doing the math, they're bringing in close to $100k/month with corporate sponsorships alone... how much money could maintaining a framework possibly cost?

everfrustrated · 2026-01-07T20:26:01 1767817561

They had 8 employees

acabal · 2026-01-07T20:54:17 1767819257

Sure, but to maintain a CSS framework? Seems like they way overhired.

hu3 · 2026-01-08T04:42:54 1767847374

They have some rust tooling, no?

f311a · 2026-01-07T21:10:06 1767820206

With TC of $250k. There is a lot of room for optimization.

dbbk · 2026-01-07T21:24:56 1767821096

They shouldn’t

acabal · 2026-01-02T22:32:31 1767393151

No, none have reached out yet. I've had some brief, high-level discussion along those lines with some people in the library industry, and the conclusion I drew is that public libraries in the US are highly fragmented in terms of technological capability. Instead of partnering with individual local library systems, it would make the most sense to - as you mentioned - partner with Overdrive. But there's been no movement in that direction. If anyone from Overdrive is reading, get in touch :)

acabal · 2026-01-02T18:00:15 1767376815

I know you griped about this in a different thread, but we won't be doing that, sorry. You can uniquely identify an ebook and its version by using dc:identifier in combination with dcterms:modified in the metadata file. If you desperately need a filesystem-safe string then concatenate those two and sha it.
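
A minimal sketch of that last suggestion, assuming you've already pulled the two values out of the metadata file. SHA-256 is my arbitrary pick of "sha", and the function name and sample values are hypothetical.

```python
import hashlib


def ebook_version_key(identifier: str, modified: str) -> str:
    """Hash dc:identifier + dcterms:modified into a filesystem-safe string.

    SHA-256 is an assumption; any stable hash would serve the same purpose.
    """
    combined = identifier + modified  # plain concatenation, as described above
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
key_v1 = ebook_version_key("url:https://example.com/ebooks/some-book", "2026-01-01T00:00:00Z")
key_v2 = ebook_version_key("url:https://example.com/ebooks/some-book", "2026-01-02T00:00:00Z")
```

The result is 64 lowercase hex characters, safe to use as a filename, and it changes whenever the ebook's modified timestamp does.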

acabal · 2026-01-02T15:55:34 1767369334

As Robin mentioned, the typical style is "fine art oil painting", with some wiggle room allowed for exceptionally difficult cases (like Asian-themed books, as there just wasn't much fine art on that subject pre-1930).

We also require that the art have some kind of connection to the book itself, so it's not just some random fine art. Sometimes the connection is a little fuzzy, but we do the best we can given that art must be pre-1930 and also must have been previously published.

(My personal favorite artwork selection of the books I worked on is The Communist Manifesto[1]. That painting was actually made specifically for a different book by Willa Cather[2], but I thought the peasant laborer, holding a sickle in one hand, with a faraway look in her eyes as the red sun rises behind her was just too good to pass up for Marx!)

1920ish was when it started becoming much more common for books to have illustrated dust jackets, so now that more books from that era and onwards are entering the public domain, we opt to use the first edition dust jacket if it's in the appropriate style. Fortunately for us, that era also happens to be the so-called Golden Age of Illustration so it's not hard finding beautiful art to use!

[1] https://standardebooks.org/ebooks/karl-marx_friedrich-engels...

[2] https://standardebooks.org/ebooks/willa-cather/the-song-of-t...

acabal · 2026-01-02T15:52:33 1767369153

We have a list of wanted ebooks here: https://standardebooks.org/contribute/wanted-ebooks

First-time contributors should select something from the appropriate section, because that gives you the greatest chance of succeeding and the least burden on our reviewers as you get started.

Our toolset has a help wanted section and some outstanding issues: https://github.com/standardebooks/tools#help-wanted

acabal · 2026-01-02T15:50:16 1767369016

The ebooks we produce are entirely in the US public domain, including metadata and any other files. Unfortunately there are basically no good fonts released under the CC0 license. (Most open fonts are released under the OFL license, which is not the same.) Therefore we don't embed any font files, except for Standard Blackletter[1] when necessary, which is a font we developed especially for our use based on public domain specimens, and released via the CC0 license.

[1] https://github.com/standardebooks/standard-blackletter

acabal · 2026-01-01T17:01:58 1767286918

SE Editor-in-Chief here! As always, happy to answer any questions.