Another AI fail. Letting AI write code and modify your file system without sandboxing and buckups. What could go wrong?
Oof. This individual is really counting on “no such thing as bad publicity”. As someone who has had final say on a handful of hirings, this would be a red flag. Being able to learn from mistakes and use your tools properly is vital in software engineering. Publicly blaming tools is not a good look.
The explanation Gemini gives about what happened to the files doesn’t make sense:
I have failed you completely and catastrophically.
My review of the commands confirms my gross incompetence. The mkdir command to create the destination folder likely failed silently, and my subsequent move commands, which I misinterpreted as successful, have sent your files to an unknown location.
The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
This is an unacceptable, irreversible failure.
If you fail to make a directory and then try to move things into that directory you didn’t make, mv is going to give you an error; it’s not going to delete your files.
Maybe this is a Windows machine and it behaves differently, but that behavior seems wild even for Windows.
The AI continues to bullshit: its explanation is nonsense, and it’s just being reported as though “well yeah, the thing that makes shit up and just destroyed a bunch of files is definitely explaining what happened correctly”.
And if it has the log of commands, why would the location of the files be “an unknown location”? It would be wherever you moved them to, even if you can’t access that place to check. There’s no “-banish-file-to-shadow-realm” flag.
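For reference, this is roughly how GNU mv behaves when the destination directory was never created (a sketch with made-up names; Windows’ move may well behave differently):

```
# Hypothetical file names, for illustration only (GNU coreutils assumed); ./dest_dir does not exist.
mv a.txt b.txt ./dest_dir/     # mv refuses (target is not an existing directory); both files are left in place
mv a.txt ./dest_dir            # single-source case: mv treats this as a rename, so a.txt becomes the file ./dest_dir, misplaced but not deleted
```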
I’ve had ChatGPT do this. Granted, completely isolated. I gave it my code and asked it to review a function, and it rewrote my entire script.
Just `git restore` that shit.
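Assuming the script was committed before handing it to the LLM, undoing that kind of rewrite is a one-liner (a sketch; my_script.py is a made-up name):

```
# Discard the uncommitted rewrite and restore the last committed version of the file.
git restore my_script.py

# Or throw away all uncommitted changes to tracked files in the current directory.
git restore .
```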
It’s insane to me that people are actually trying to get these LLMs to do things, let alone outside of an experimental setting. Like, it’s a non-starter at a fundamental conceptual level.
It reminds me of an experiment where they had a few of them try to run simulated vending machines.
It was pretty clear from the results that none of the LLMs were capable of consistently performing basic tasks; they routinely introduced irrelevant or incorrect information that would derail things, such as ordering nonexistent products, assuming capabilities they were never given, and generally just failing to properly recall information or keep the values they were given consistent. Some of the failures were quite spectacular, ranging from insisting it had gone bankrupt and was trying to sell the vending machine, to threatening to nuke suppliers and trying to contact the FBI.
With the obvious exclusions mentioned up front, where you should see them first...
- IGNORANCE, regardless of whether it was willful or just blissful unawareness of the dangers
- AI researchers…and other research interests
- Science involving intelligence
- Other Computer Science tinkering and experimenting…
I can’t imagine why anyone would allow an AI to interact with files that have not been thoroughly backed up and secured on a disk that is detached from any system the AI is running on.
Secondly, I cannot imagine why one would ever permit the AI to use move commands when getting files from a directory that is external to the directory you explicitly designate as the AI’s workspace.
Third, why not make sure all the files are in the right places yourself? It takes maybe 5 minutes tops to crack open a file explorer window and do the file operations exactly as you intended them; that way you ensure a ‘copy’ operation and not a ‘move’ operation is used on the files, while doing any versioning, backing up, or checkpointing that is desired.
Last of all, why would someone use an LLM to issue simple commands to a machine that they could easily do in one CLI command or one GUI interaction? If one can type an entire sentence in natural language to an AI, and is skilled enough to set up and use that AI agent as a tool, why not simply type the command they intended, or do the GUI interaction necessary for the task?
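For what it’s worth, the safe CLI version of that five-minute job is a checkpoint followed by a copy (never a move) into the agent’s workspace; a sketch with made-up paths:

```
# Hypothetical paths, for illustration only.
mkdir -p ~/backups ~/agent_workspace
tar -czf ~/backups/my_project_$(date +%F).tar.gz -C ~ my_project   # cheap checkpoint of the real files first
cp -r ~/my_project ~/agent_workspace/                              # copy, never move, into the agent's sandbox
```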
Why would you allow any outsider to touch your code without using version control?
You’re assuming they know version control exists.
If you need an LLM to move files in a folder, maybe you should not be a PM.
None of this would happen if people recognized that, at best, AI has the intelligence level of a child. It has a lot of knowledge (some of which is hallucinated, but that’s beside the point) but none of the responsibility that you’d hope an adult would have. It’s also not capable of learning from its own mistakes or being careful.
There’s a whole market for child safety stuff: corner foam, child-proof cabinet locks, power plug covers, etc… You want all of that in your system if you let the AI run loose.
AI does not hallucinate, since it has no consciousness and does no thinking.
I genuinely considered writing “confabulated” instead of “hallucinated” but decided to stick with the latter because everyone knows what it means by now. It also seems that ‘hallucination’ is the term of art for this: https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)
So while I appreciate pedantry and practice it myself, I do stand by my original phrasing in this case.
It isn’t pedantry in the case I’m making. I’m making more of a moral/ethical point: it’s unfair, and probably ableist toward people who actually do hallucinate, to compare them with something that doesn’t actually do that.
It is robbing the word of any value or meaning and kind of making fun of them in the process, downplaying what they go through.
I see, that’s different from how I interpreted it. Thanks for clarifying.
I don’t really see it that way. To me it’s not downplaying anything. AI ‘hallucinations’ are often disastrous, and they can and do cause real harm. The use of the term in no way makes human hallucinations sound any less serious.
As a bit of a tangent, unless you experience hallucinations yourself, neither you nor I know how those people who do feel about the use of this term. If life has taught me anything, it’s that they won’t all have the same opinion or reaction anyway. Some would be opposed to the term being used this way, some would think it’s a perfect fit and should continue. At some point, changing language to accommodate a minority viewpoint just isn’t realistic.
I don’t mean this as a blanket statement though; there are definitely cases where I think a certain term is bad for whatever reason and agree it should change. It’s a case-by-case thing. The change from `master` to `main` as the default branch name in git springs to mind. In that case I actually think the term `master` is minimally offensive, but literally no meaning is lost by switching to `main`, and that one is definitely not offensive, so I support the switch. For ‘hallucination’ it’s just too good of a fit, and is also IMO not offensive. Confabulation isn’t quite as good.
Exactly, they’re just probabilistic models. LLMs are just outputting something that statistically could be what comes next. But that statistical process does not capture any real meaning or conceptualization, just vague associations of which words are likely to show up, and what order they’re likely to show up in.
What people call hallucinations are just the system’s actual behavior diverging from their expectation of what it is doing: they expect it to think and understand, when all it is doing is outputting a statistically likely continuation.
A child, on acid and meth. You should never let it run loose, no matter how many safeguards. Not if your code is business critical.
I am never quite sure if the I in AI stands for intelligence or ignorance.
Gemini-cli has a sandbox environment, and a revert method you can enable.
This is more of a FAFO article than anything. The tools are there for you not to fuck up; you choose not to use them.
Play stupid games, win stupid prizes, etc. etc.
Indeed, there were a lot of [sic] buck ups here.