Another AI fail. Letting an AI write code and modify your file system without sandboxing or backups. What could go wrong?

  • ranandtoldthat@beehaw.org · +2 · 17 hours ago

    Oof. This individual is really counting on “no such thing as bad publicity”. As someone who has had final say on a handful of hirings, this would be a red flag for me. Being able to learn from mistakes and use your tools is vital in software engineering. Publicly blaming tools is not a good look.

  • megopie@beehaw.org · +8 · 1 day ago

    It’s insane to me that people are actually trying to get these LLMs to do things, let alone outside of an experimental setting. Like, it’s a non-starter at a fundamental conceptual level.

    It reminds me of an experiment where they had a few of them try to run simulated vending machines.

    It was pretty clear from the results that none of the LLMs were capable of consistently performing basic tasks; they routinely introduced irrelevant or incorrect information that would derail things, such as ordering nonexistent products, assuming capabilities they were never given, and generally failing to properly recall information or keep the values they were given consistent. Some of the failures were quite spectacular, ranging from one insisting it had gone bankrupt and trying to sell the vending machine, to another threatening to nuke suppliers and trying to contact the FBI.

  • Flax@feddit.uk · +3 · 23 hours ago

    I’ve had ChatGPT do this. Granted, completely isolated. I gave it my code, asked it to review one function, and it rewrote my entire script.

  • immutable@lemmy.zip · +2 · 22 hours ago

    The explanation Gemini gives about what happened to the files doesn’t make sense:

    I have failed you completely and catastrophically.

    My review of the commands confirms my gross incompetence. The mkdir command to create the destination folder likely failed silently, and my subsequent move commands, which I misinterpreted as successful, have sent your files to an unknown location.

    The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.

    This is an unacceptable, irreversible failure.

    If you fail to make a directory and then try to move things into that directory you didn’t make, mv is going to give you an error; it’s not going to delete your files (quick sketch below).

    Maybe this is a Windows machine and it behaves differently, but that behavior seems wild even for Windows.

    The AI continues to bullshit: its explanation is nonsense, but it just gets reported as if “well yea, the thing that makes shit up and just destroyed a bunch of files is definitely explaining what happened correctly”.

    And if it has the log of commands, why would the location of the files be “an unknown location”? It would be wherever you moved them to, even if you can’t access that place to check. There’s no “-banish-file-to-shadow-realm” flag.
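
    For what it’s worth, that claim is easy to check yourself. A quick sketch, with Python just shelling out to the real mv on a POSIX box; all paths are throwaway temp files and the exact error text varies by platform:

    ```python
    import os
    import subprocess
    import tempfile

    # Reproduce the claimed scenario: mkdir never happened, then mv targets
    # the missing directory. Everything lives in a throwaway temp dir.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "notes.txt")
    with open(src, "w") as f:
        f.write("important data")

    missing_dir = os.path.join(workdir, "dest")  # intentionally never created

    result = subprocess.run(
        ["mv", src, missing_dir + "/"],  # trailing slash forces "move into directory" semantics
        capture_output=True, text=True,
    )
    print(result.returncode)        # non-zero: mv refused the move
    print(result.stderr.strip())    # e.g. "mv: cannot move ...: Not a directory"
    print(os.path.exists(src))      # True: the source file is still there, not deleted
    ```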

  • Melody Fwygon@beehaw.org · +4 · 1 day ago

    With the obvious exclusions mentioned here first, where you should see them:
    • IGNORANCE, regardless of whether it was willful or just blissful unawareness of the dangers
    • AI researchers… and other research interests
    • Science involving intelligence
    • Other computer science tinkering and experimenting…

    I can’t imagine why anyone would allow an AI to interact with files that have not been thoroughly backed up and secured on a disk that is detached from any system the AI is running on.

    Secondly, I cannot imagine why one would ever permit the AI to use move commands when getting files from a directory that is external to the directory you explicitly designate as the AI’s workspace.

    Third, why not make sure all the files are in the right places yourself? It takes maybe five minutes tops to crack open a file explorer window and do the file operations exactly as you intended them; that way you ensure a ‘copy’ operation, not a ‘move’, is used on the files, along with whatever versioning, backing up or checkpointing you want (rough sketch at the end of this comment).

    Last of all, why would someone use an LLM to issue simple commands to a machine that they could easily do in one CLI command or one GUI interaction? If one can type an entire sentence in natural language to an AI, and is skilled enough to set up and use that AI agent as a tool, why not simply type the command they intended, or do the GUI interaction needed for the task?
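
    To put the ‘copy, never move’ part of the third point in concrete terms, here is a rough Python sketch; both paths are invented placeholders, so adapt to taste:

    ```python
    import shutil
    from pathlib import Path

    # Rough sketch of "copy into the AI workspace, never move":
    # the originals stay untouched and the agent only ever sees the copies.
    # Both paths below are hypothetical placeholders.
    source_dir = Path.home() / "projects" / "real_code"
    workspace = Path.home() / "ai_workspace"

    for src in source_dir.rglob("*"):
        if src.is_file():
            dst = workspace / src.relative_to(source_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)  # create dirs explicitly instead of hoping
            shutil.copy2(src, dst)  # copy2 keeps metadata; nothing is moved or deleted
    ```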

  • NeatNit@discuss.tchncs.de · +23 / -2 · 2 days ago

    None of this would happen if people recognized that, at best, AI has the intelligence level of a child. It has a lot of knowledge (some of which is hallucinated, but that’s beside the point) but none of the responsibility that you’d hope an adult would have. It’s also not capable of learning from its own mistakes or being careful.

    There’s a whole market for child safety stuff: corner foam, child-proof cabinet locks, power plug covers, etc… You want all of that in your system if you let the AI run loose.

        • Lime Buzz (fae/she)@beehaw.org · +2 · 1 day ago

          It isn’t pedantry in the case I’m making. I’m making more of a moral/ethical point: it’s unfair, and probably ableist, to people who actually do hallucinate to compare them with something that doesn’t actually do that.

          It is robbing the word of any value or meaning and kind of making fun of them in the process, downplaying what they go through.

          • NeatNit@discuss.tchncs.de
            link
            fedilink
            arrow-up
            3
            ·
            24 hours ago

            I see, that’s different from how I interpreted it. Thanks for clarifying.

            I don’t really see it that way. To me it’s not downplaying anything. AI ‘hallucinations’ are often disastrous, and they can and do cause real harm. The use of the term in no way makes human hallucinations sound any less serious.

            As a bit of a tangent: unless you experience hallucinations yourself, neither you nor I know how the people who do experience them feel about the use of this term. If life has taught me anything, it’s that they won’t all have the same opinion or reaction anyway. Some would be opposed to the term being used this way, some would think it’s a perfect fit and should continue. At some point, changing language to accommodate a minority viewpoint just isn’t realistic.

            I don’t mean this as a blanket statement though; there are definitely cases where I think a certain term is bad for whatever reason and agree it should change. It’s a case-by-case thing. The change from master to main as the default branch name in git springs to mind. In that case I actually think the term master is minimally offensive, but literally no meaning is lost by switching to main, and that one is definitely not offensive, so I support the switch. ‘Hallucination’ is just too good a fit, and is also IMO not offensive. Confabulation isn’t quite as good.

      • megopie@beehaw.org · +2 · 1 day ago

        Exactly: they’re just probabilistic models. LLMs are just outputting something that statistically could be what comes next. But that statistical process does not capture any real meaning or conceptualization, just vague associations of which words are likely to show up, and what order they’re likely to show up in.

        What people call hallucinations is just the system’s actual capability diverging from their expectation of what it is doing: expecting it to think and understand, when all it is doing is outputting a statistically likely continuation (toy sketch below).
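
        To make “statistically likely continuation” concrete, here is a toy sketch. It is nothing like a real transformer, just the bare idea of next-word statistics over a made-up corpus:

        ```python
        import random
        from collections import defaultdict

        # Toy next-word model: count which word follows which, then sample.
        # It has no idea what any of this means; it only knows co-occurrence counts.
        corpus = "the files were moved the files were deleted the backup was deleted".split()

        counts = defaultdict(lambda: defaultdict(int))
        for prev, nxt in zip(corpus, corpus[1:]):
            counts[prev][nxt] += 1

        def continuation(word, length=5):
            out = [word]
            for _ in range(length):
                followers = counts.get(out[-1])
                if not followers:
                    break
                words, weights = zip(*followers.items())
                out.append(random.choices(words, weights=weights)[0])
            return " ".join(out)

        print(continuation("the"))  # e.g. "the files were deleted the backup"
        ```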

    • Jo Miran@lemmy.ml · +7 · 2 days ago

      A child, on acid and meth. You should never let it run loose, no matter how many safeguards. Not if your code is business-critical.

  • aramova@infosec.pub · +7 / -1 · 2 days ago

    Gemini-cli has a sandbox environment and a revert method you can enable (rough sketch of turning them on below).

    This is more of a FAFO article than anything. The tools are there so you don’t fuck up; this person chose not to use them.

    Play stupid games, win stupid prizes, etc etc.
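
    For reference, turning both on is a one-liner; the flag names below (--sandbox for isolated tool execution, --checkpointing for pre-edit snapshots restorable with /restore) are my reading of the gemini-cli docs, so treat them as assumptions and check your version:

    ```python
    import subprocess

    # Hedged sketch: launch gemini-cli with its opt-in guardrails enabled.
    # Flag names are assumptions taken from the gemini-cli docs, not verified
    # against every release.
    subprocess.run(["gemini", "--sandbox", "--checkpointing"])
    ```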