You are reading a single comment by @NurseHolliday and its replies.
  • The bias isn't in the people who choose the training data; it's in the past decisions that produced the current status quo and thus determine what the training data looks like.

    E.g., if I pick as training data all of our staff rated 3.5/5 or above, and up until now HR have been biased towards picking old, white men, then that bias will come through in the training data.

    How you avoid that bias is removing ethnicity, age, and sex as parameters in the training data, but that should be fucking obvious to any data scientist with even the slightest hint of commercial awareness. (A rough sketch of that step is below.)
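
    A minimal sketch of that first step, assuming a pandas/scikit-learn setup; the file name and column names here are hypothetical:

        import pandas as pd
        from sklearn.linear_model import LogisticRegression

        # Hypothetical historical staff data; column names are illustrative.
        df = pd.read_csv("staff_reviews.csv")

        PROTECTED = ["ethnicity", "age", "sex"]

        # Drop the protected attributes so the model never sees them directly.
        X = pd.get_dummies(df.drop(columns=PROTECTED + ["rating"]))
        y = df["rating"] >= 3.5  # label: "good hire" per the review threshold

        model = LogisticRegression(max_iter=1000).fit(X, y)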

  • How you avoid that bias is removing ethnicity, age, and sex as parameters in the training data

    That would definitely be step 1, but it's worth mentioning that it's no guarantee you've solved the problem. There's still ample opportunity for hidden biases to sneak in; one quick audit for that is sketched below.
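
    A sketch of such an audit, assuming the same hypothetical dataset as above: try to predict the removed attribute from the features you kept. If a model recovers it well above the base rate, the information is still encoded via proxies.

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        df = pd.read_csv("staff_reviews.csv")  # same hypothetical file as above
        X = pd.get_dummies(df.drop(columns=["ethnicity", "age", "sex", "rating"]))

        # If "sex" can be predicted from the features we kept, the information
        # was never really removed; it leaks through proxy variables.
        scores = cross_val_score(LogisticRegression(max_iter=1000), X, df["sex"], cv=5)
        print(f"sex recoverable with accuracy {scores.mean():.2f}")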

  • How you avoid that bias is removing ethnicity, age, and sex as parameters in the training data, but that should be fucking obvious to any data scientist with even the slightest hint of commercial awareness.

    But with neural nets and the like you can't necessarily do this easily, because you don't directly control what the computer factors into its decision. Removing age as a parameter is fine, but the model might still be biased towards, say, long CVs, because all your current employees are old and have held a lot of different jobs. Or, if all of your employees went to Eton, it might pick up on that word in the application text and weight Etonian applicants higher.

    Or it might pick up on language differences between male/female/white/BAME applicants, etc. (a toy illustration of the Eton example follows).
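
    A toy illustration of that text-proxy effect; the four applications and labels are made up purely to show the mechanism:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression

        # Fabricated applications: every historically "good" one mentions Eton.
        texts = [
            "Studied at Eton, then ten years in finance",
            "Eton graduate with a long consulting career",
            "State school, five years in retail management",
            "Comprehensive school, strong sales record",
        ]
        labels = [1, 1, 0, 0]  # 1 = rated 3.5/5 or above historically

        vec = CountVectorizer()
        X = vec.fit_transform(texts)
        clf = LogisticRegression().fit(X, labels)

        # The token "eton" gets a large positive weight: the model has learned
        # the school name as a proxy, not anything about merit.
        weights = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))
        print(sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3])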

  • The bias isn't in the people who choose the training data,

    It isn't always, and it isn't only, but it is often a factor.
