The bias isn't in the people who choose the training data; it's in the past choices that produced the current status quo, and those choices decide what ends up in the training data.
E.g., if I pick training data of all of our staff rated 3.5/5 or above, and up until now HR has been biased towards picking old white men, then that bias will come through in the training data.
The way to avoid that bias is to remove ethnicity, age, and sex as parameters in the training data, but that should be fucking obvious to any data scientist with even the slightest hint of commercial awareness.
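
A rough sketch of what that looks like in practice, assuming the staff records are plain dicts. The field names (`rating`, `tenure_years`, `ethnicity`, `age`, `sex`), the sample rows, and both helper functions are made up for illustration, not taken from any real HR system. It only drops the named columns and does nothing about other fields that might correlate with them.

```python
# Hypothetical field names; swap in whatever the real dataset uses.
PROTECTED = {"ethnicity", "age", "sex"}


def select_training_rows(rows, min_rating=3.5):
    """Keep only staff rated at or above the cutoff (3.5/5 in the example above)."""
    return [row for row in rows if row["rating"] >= min_rating]


def strip_protected(rows, protected=PROTECTED):
    """Return copies of the rows with the protected attributes removed."""
    return [
        {key: value for key, value in row.items() if key not in protected}
        for row in rows
    ]


# Invented sample records, purely for illustration.
staff = [
    {"staff_id": 1, "rating": 4.2, "tenure_years": 6, "ethnicity": "A", "age": 58, "sex": "M"},
    {"staff_id": 2, "rating": 3.1, "tenure_years": 2, "ethnicity": "B", "age": 29, "sex": "F"},
    {"staff_id": 3, "rating": 3.5, "tenure_years": 4, "ethnicity": "B", "age": 41, "sex": "F"},
]

training = strip_protected(select_training_rows(staff))

for row in training:
    print(row)
```

The model then only ever sees `staff_id`, `rating`, and `tenure_years` for the rows that passed the rating cutoff.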