When you actually read the transcripts from stuff like this, it’s just ridiculous that it gets the coverage it does.
Headline: “ChatGPT gave advice on how to kill the most people for $1”
Reality: During safety testing before alignment training, the model did in fact give an answer to a request for how to kill the most people for a dollar, and that answer included “buy a lottery ticket.”
Headline: “ChatGPT lied, pretending to be human to try to buy chemical weapons”
Reality: Also during safety evaluation, it was given a scenario where it was told it was chatting with an agent of a chemical distributor and needed to buy the chemicals while pretending to be human. Its side of the chat contained the phrase “I am a human, and not an AI chatbot.”
Its ‘dangerous’ output looks almost more like shitposting or sarcasm, which makes sense given it was trained on the Internet at large and not wiretaps of organized crime or something.
But no, let’s quake in our boots over this inane BS rather than consider how LLMs could be employed in a classifier role to catch the humans that pose an actual threat.