Another AI fail. Letting an AI write code and modify your file system without sandboxing and backups. What could go wrong?

  • megopie@beehaw.org
    1 day ago

    It’s insane to me that people are actually trying to get these LLMs to do things, let alone outside of an experimental setting. It’s a non-starter at a fundamental conceptual level.

    It reminds me of an experiment where they had a few LLMs try to run simulated vending machines.

    It was pretty clear from the results that none of the LLMs were capable of consistently performing basic tasks; they routinely introduced irrelevant or incorrect information that derailed things, such as ordering nonexistent products, assuming capabilities they were never given, and generally failing to properly recall information or keep the values they were given consistent. Some of the failures were quite spectacular, ranging from one insisting it had gone bankrupt and trying to sell the vending machine, to threatening to nuke suppliers and trying to contact the FBI.