• NobodyElse@sh.itjust.works
    3 days ago

    Performing procedural tasks using a statistical model of our language will never be reliable. There’s a reason we use logical, prescriptive syntax when we want deterministic outcomes.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      3 days ago

      I expect what we will see are tools where the human manages the high-level implementation, and the agents are used to implement specific pieces of functionality that can be easily tested and verified. I can see something along the lines of a scene graph where you focus on the flow of the code and farm out the implementation details of each step to a tool. As the article notes, these tools can already achieve over 90% accuracy in these scenarios.
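
A minimal sketch of that workflow: the human specifies each step as a signature plus tests, and an agent-produced implementation is accepted only if it passes verification. `agent_implement` here is a hypothetical stand-in for a code-generation tool (hard-coded for illustration), not a real API.

```python
def agent_implement(spec: str) -> str:
    # Hypothetical agent call; a real tool would generate code from the spec.
    # Hard-coded response so the sketch is self-contained.
    return "def slugify(s):\n    return '-'.join(s.lower().split())"

def verify(source: str, tests) -> bool:
    # Execute the candidate implementation and check it against the
    # human-written test cases before accepting it.
    namespace = {}
    exec(source, namespace)
    fn = namespace["slugify"]
    return all(fn(arg) == expected for arg, expected in tests)

# Human-authored spec: the flow and the acceptance criteria.
tests = [("Hello World", "hello-world"), ("  A  B ", "a-b")]
candidate = agent_implement("slugify: lowercase, words joined by '-'")
accepted = verify(candidate, tests)  # only passing code enters the codebase
```

The point of the sketch is that the agent's output is never trusted directly; it is gated by tests the human controls, which is what makes the "farm out each step" model workable.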

      • NobodyElse@sh.itjust.works
        3 days ago

        I agree that this could be helpful for finally reaching the natural-language programming paradigm that people have long hoped for. But there’s going to have to be something capable of logically implementing the low-level code, and that has to be more than just a statistical model of how people write code, trained on a big collection of random repositories. (I’m open to being proved wrong about that!)

        The 90% accuracy could just arise from the fact that the benchmarks test trivial or commonly solved tasks, so exact solutions to them already exist in the training set. Anything novel will fall outside the training set, and outside the model.