To give a concrete example: in my view, it should be allowed to train a model on any source code so that the model learns that the code is bad, insecure, slow, or otherwise undesirable. In other words, it should be allowed to train on anything as long as the model does NOT reproduce that training data verbatim.
What copyrightable elements of the original work persist in the model if it is incapable of outputting them? I can derive a SHA-1 hash from a copyrighted image, and yet it would be absurd to call that hash a derivative work.
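To make the hash analogy concrete, here is a minimal Python sketch using the standard library's hashlib. The byte strings are hypothetical stand-ins for image data, not real files, and the helper name sha1_digest is my own. Whatever the input size, the digest is 160 bits (40 hex characters), so it cannot carry the pixels of the original:

```python
import hashlib


def sha1_digest(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical stand-ins for two copyrighted images of very different sizes.
small_image = bytes(range(256)) * 4       # 1 KiB of made-up "pixel" data
large_image = bytes(range(256)) * 4096    # 1 MiB of made-up "pixel" data

small_digest = sha1_digest(small_image)
large_digest = sha1_digest(large_image)

# Both digests have the same fixed length regardless of input size.
print(len(small_image), "bytes ->", small_digest)
print(len(large_image), "bytes ->", large_digest)
```

A 1 MiB input and a 1 KiB input both collapse to 20 bytes, so vastly more inputs exist than digests, and the digest alone does not determine which input produced it. Whether a trained model is more like the hash or more like a compressed copy is exactly the question of whether it can output the original.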