But doesn’t the LLM sometimes churn out tedious garbage that you have to fix, thus not actually saving time?
That’s where the rate of success becomes important. LLMs mostly produce decent code when applied to common cases like the examples I gave above. My experience is that the vast majority of the time it’s as good as what you’d write, occasionally needing minor tweaks. However, there’s nothing forcing you to use the code they produce either. If the LLM stumbles, you can always fall back to writing the code by hand, which leaves you no worse off than you would’ve been otherwise. It’s all about learning how the tool works and when to use it.
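To make that intuition concrete, here’s a rough back-of-the-envelope model (my own framing, nothing measured): say reviewing a generated snippet costs T_r, writing it by hand costs T_w, and the LLM produces usable code with probability p.

```latex
% Expected time per task, trying the LLM first and falling back to
% writing by hand when it fails:
E[T] = p\,T_r + (1-p)\,(T_r + T_w) = T_r + (1-p)\,T_w
% This beats writing everything by hand (cost T_w) exactly when:
T_r + (1-p)\,T_w < T_w \;\Longleftrightarrow\; p > \frac{T_r}{T_w}
% e.g. T_w = 30 min, T_r = 5 min gives a break-even success rate of p > 1/6.
```

So when review is much cheaper than writing, even a modest success rate clears the bar, and the downside on a failed attempt is bounded by the review cost.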
You have to check it every single time, though, erasing any time savings. You’re saving effort, maybe, but not time.
You’re absolutely saving time: checking that the code works is far less time-consuming than writing it, especially for stuff like UIs or service endpoints. I literally work with this stuff on a daily basis, and I would never go back. There’s also another aspect to it, which is that I personally find it makes my workflow more enjoyable. It lets me focus on things I actually want to work on, while automating a lot of boilerplate that I had to write by hand previously. Even if it weren’t saving me much time, there’s a quality-of-life improvement here.
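For a sense of what I mean by boilerplate, here’s a minimal sketch of a typical service endpoint, assuming Flask; the route and the in-memory USERS table are made up for illustration. Code like this is tedious to type but trivial to eyeball and test, which is where the time savings come from.

```python
# A typical boilerplate endpoint: slow to write by hand, quick to verify.
# Flask is assumed here; USERS stands in for whatever storage you use.
from flask import Flask, abort, jsonify

app = Flask(__name__)

USERS = {1: {"id": 1, "name": "Alice"}, 2: {"id": 2, "name": "Bob"}}

@app.get("/users/<int:user_id>")
def get_user(user_id: int):
    # Look up the user, returning 404 on a miss.
    user = USERS.get(user_id)
    if user is None:
        abort(404, description="user not found")
    return jsonify(user)

if __name__ == "__main__":
    app.run(debug=True)
```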
METR measured the speed of 16 developers working on complex software projects, both with and without AI assistance. After finishing their tasks, the developers estimated that access to AI had accelerated their work by 20% on average. In fact, the measurements showed that AI had slowed them down by about 20%.
Yes, I’ve seen this as well. First of all, 16 devs is a tiny sample; a far bigger study would be needed to get any meaningful results here. Second, it really depends on how experienced people are at using these tools. It took me a while to identify patterns that actually work repeatably and to develop intuition for the cases where the model is most likely to produce good results.