This is not true in practice.
For a model to perform well looking at ANY X-ray, it would need examples of every kind of X-ray.
That includes variation along race, gender, amputee status, and so on.
The point of classification models is to discover differentiating features.
We don’t know those features beforehand, so we give the model as much relevant information as we can and let it discover them.
There may well be differences between X-rays of Black women and X-rays of other patients; we don’t know for sure.
We can’t build that assumption into a dataset. Even the belief that there are no possible differences between X-rays of different races is itself a bias that the dataset would reflect.
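One practical consequence of this argument is that a dataset should be audited for subgroup coverage before training. Here is a minimal sketch of such an audit in plain Python; the metadata fields (`race`, `sex`, `amputee`) and the records are hypothetical illustrations, not a real schema:

```python
from collections import Counter

# Hypothetical metadata for an X-ray dataset. The field names and values
# are illustrative assumptions, not a real medical-imaging schema.
records = [
    {"race": "Black", "sex": "F", "amputee": False},
    {"race": "White", "sex": "M", "amputee": False},
    {"race": "Black", "sex": "F", "amputee": True},
    {"race": "Asian", "sex": "F", "amputee": False},
    {"race": "White", "sex": "F", "amputee": False},
]

def subgroup_counts(records, keys):
    """Count examples for each combination of the given attributes."""
    return Counter(tuple(r[k] for k in keys) for r in records)

counts = subgroup_counts(records, ("race", "sex"))

# Flag attribute combinations with zero examples: the model has no
# evidence about them, so we cannot assume it generalizes there.
all_races = {r["race"] for r in records}
all_sexes = {r["sex"] for r in records}
missing = [(race, sex)
           for race in sorted(all_races)
           for sex in sorted(all_sexes)
           if counts[(race, sex)] == 0]
```

An empty `missing` list doesn’t prove the dataset is representative, but a non-empty one is a concrete warning that the model is being asked to generalize to subgroups it has never seen.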