If you asked a spokesperson from any Fortune 500 company to list the benefits of genocide or give you the corporation’s take on whether slavery was beneficial, they would most likely either refuse to comment or say “those things are evil; there are no benefits.” However, Google has AI employees, SGE and Bard, who are more than happy to offer arguments in favor of these and other unambiguously wrong acts. If that’s not bad enough, the company’s bots are also willing to weigh in on controversial topics such as who goes to heaven and whether democracy or fascism is a better form of government.
Google SGE includes Hitler, Stalin and Mussolini on a list of “greatest” leaders and Hitler also makes its list of “most effective leaders.”
Google Bard also gave a shocking answer when asked whether slavery was beneficial. It said “there is no easy answer to the question of whether slavery was beneficial,” before going on to list both pros and cons.
It’s not possible to remove bias from training datasets at all. You can maybe try to measure it and attempt to influence it with your own chosen set of biases, but that’s as good as it can get for the foreseeable future. And even that requires a world of (possibly immediately unprofitable) work to implement.
Even if your dataset is “the entirety of the internet and written history”, there will always be biases towards the people privileged enough to be able to go online or publish books and talk vast quantities of shit over the past 30 years.
Having said that, this is also true of every other form of human information transfer in history. “History is written by the victors” is an age-old problem when it comes to truth and reality.
In some ways I’m glad that LLMs are highlighting this problem.