The public is not learning from it. A person or corporation is creating a derivative work of it. Training a model is deriving a function from the training data. It is not "a human learning something by reading it".
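To make "deriving a function from the training data" concrete, here is a deliberately toy sketch (ordinary least squares on four made-up points, nothing like a real model's training loop): what gets kept is a derived parameter, not the data itself.

```python
# Toy "training data": (x, y) pairs sampled from some underlying relationship.
# All values here are made up for illustration.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

def train(pairs):
    """Derive the slope w of y = w * x by least squares.

    The output is a single parameter distilled from all the pairs;
    none of the individual data points survive in it.
    """
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

w = train(data)
model = lambda x: w * x  # the function derived from the data

print(round(w, 2))  # → 1.99
```

Whether deriving parameters this way legally constitutes making a derivative work is exactly what the thread is arguing about; the sketch only shows what "deriving a function" means mechanically.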
It's no more of a stretch than saying that re-encoding a PNG as a JPEG produces a derivative work, even though the process is lossy and the resulting bits look nothing alike.
You think that a model that's capable of being prodded into producing an infringing output, in addition to all the other non-infringing outputs it could produce, is no different from a compression algorithm?
If I "process data" by doing a word count of a book, and then I publish the number of words in that book (not the words themselves! Just a word count!), I haven't created a derivative work.
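For concreteness, that kind of "processing" looks like this (a trivial sketch using a public-domain opening line; the published output is a single number, not any of the expression):

```python
# Counting words in a text: the derived output is a statistic,
# not the words themselves.
book = "It was the best of times, it was the worst of times."

def word_count(text):
    """Return the number of whitespace-separated words in the text."""
    return len(text.split())

print(word_count(book))  # → 12
```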