- cross-posted to:
- programming@programming.dev
“No Duh,” say senior developers everywhere.
The article explains that vibe code is often close to functional, but not quite, requiring developers to go in and find where the problems are, resulting in a net slowdown of development rather than a productivity gain.
It turns every prototyping exercise into a debugging exercise. Even talented coders often suck ass at debugging.
AI coding is the stupidest thing I’ve seen since someone decided it was a good idea to measure code by the number of lines written.
More code is better, obviously! Why else would a website to see a restaurant menu be 80 MB? It’s all that good, excellent code.
It did solve my impostor syndrome, though. Turns out a bunch of people I took to be my betters were faking it all along.
Oh wow. No shit. Anyway!
I would say absolutely, in the general sense most people, and the salesmen, frame them in.
When I was invited to assist with the GDC development, I got a chance to partner with a few AI developers and see the development process firsthand, try my hand at it myself, and get my hands on a few low-parameter models for my own personal use. It’s really interesting just how capable some models are in their specific use cases. However, even high-parameter models easily become useless at the drop of a hat.
I found the best case, one that’s rarely done mind you, is to integrate the model into a program that can call a known database. With a model properly trained to format output in natural language and to use a given database for context calls and concrete information, the qualitative performance leaps ahead by bounds. Problem is, that requires so much customization it pretty much ends up being something a capable hobbyist would do; it’s just not economically sound for a business to adopt.
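A minimal sketch of the pattern I mean, with everything hypothetical: the model interface, the LOOKUP protocol, and the database are stand-ins I made up, not any real product’s API:

```go
package main

import (
	"fmt"
	"strings"
)

// Model stands in for any local LLM; a real version would wrap llama.cpp,
// an HTTP endpoint, or similar.
type Model interface {
	Complete(prompt string) string
}

// fakeModel pretends to be a model trained to emit "LOOKUP:<key>" whenever
// it needs concrete data instead of guessing.
type fakeModel struct{}

func (fakeModel) Complete(prompt string) string {
	if i := strings.Index(prompt, "FACT:"); i >= 0 {
		return "The part you asked about is " + prompt[i+5:] + "."
	}
	return "LOOKUP:part-4711"
}

// answer runs the loop: when the model asks for a lookup, resolve it
// against the known database and re-prompt with the concrete fact.
func answer(m Model, db map[string]string, question string) string {
	out := m.Complete(question)
	if key, ok := strings.CutPrefix(out, "LOOKUP:"); ok {
		return m.Complete(question + "\nFACT:" + db[key])
	}
	return out
}

func main() {
	db := map[string]string{"part-4711": "in stock (37 units)"}
	fmt.Println(answer(fakeModel{}, db, "Is part 4711 in stock?"))
}
```

The point is that the concrete fact comes from the database, not the model’s weights; the model only does the natural-language formatting.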
I have been vibe coding a whole game in JavaScript to try it out. So far I have gotten a pretty OK game out of it. It’s just a simple match-three bubble-pop type of thing, so nothing crazy, but I made a design and I am trying to implement it using mostly vibe coding.
That being said, the code is awful. So many bad choices and so much spaghetti code. It also took longer than if I had written it myself.
So now I have a game that’s kind of hard to modify, haha. I may try to set up some unit tests and have it refactor using those.
Wait, are you blaming AI for this, or yourself?
Blaming? I mean it wrote pretty much all of the code. I definitely wouldn’t tell people I wrote it that way haha.
Sounds like vibecoders will have to relearn the lessons of the past 40 years of software engineering.
As with every profession, every generation… only this time on their own, because every company forgot what employee training is and expects everyone to be born with 5 years of experience.
Imagine if we did “vibe city infrastructure”. Just throw up a fucking suspension bridge and we’ll hire some temps to come in later to find the bad welds and missing cables.
shocked_pikachu_face.jpg
I’d much rather write my own bugs to have to waste hours fixing, thanks.
LLMs work great to ask about tons of documentation and learn more about high-level concepts. It’s a good search engine.
The code they produce has basically always disappointed me.
I sometimes get up to five lines of viable code. Then, on occasion, what should have been a one-liner tries to vomit all over my codebase. The best feature of an AI-enabled IDE is the button to decline the mess that was just inflicted.
In the past week I had two cases I thought would be “vibe coding” fodder: blazingly obvious, just tedious. One time it just totally screwed up and I had to scrap it all. The other generated about four functions in one go and was salvageable, though still off in weird ways. One of those was functional, just nonsensical. It had a function to check whether a certain condition was present or not, but instead of returning a boolean, it took a pointer to a string and set the string to “” to indicate false… So damn bizarre, hard to follow, and needlessly more lines of code, which is another theme of LLM-generated code.
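For the curious, here’s a rough Go sketch of the shape it produced next to what it should have been (the function names and the condition are mine, made up for illustration):

```go
package main

import "fmt"

// Roughly what the LLM generated: the caller has to know that *out == ""
// is the secret handshake for "false".
func checkCondition(input string, out *string) {
	if len(input) > 0 && input[0] == '#' {
		*out = "present"
	} else {
		*out = "" // empty string doubles as false
	}
}

// What it should have been: a boolean that says what it means.
func hasCondition(input string) bool {
	return len(input) > 0 && input[0] == '#'
}

func main() {
	var result string
	checkCondition("# heading", &result)
	fmt.Println(result != "") // true, via a string and a convention

	fmt.Println(hasCondition("# heading")) // true, directly
}
```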
On proprietary products, they are awful. So many hallucinations that waste hours. A manager used one on a code review of mine and only admitted it after I spent the afternoon chasing it.
Those happen so often that I’ve stopped calling them hallucinations (that’s anthropomorphising and over-selling what LLMs do, imho). They are statistical prediction machines, and either they hit the practical limits of predicting useful output, or we just call it broken.
I think the next 10 years are going to be all about learning what LLMs are actually good for, and what they are fundamentally limited at no matter how much GPU RAM we throw at them.
~~Hallucinations~~ Bullshit
Not even proprietary, just niche things. In other words anything that’s rarely used in open source code, because there’s nothing to train the models on.
These types of articles always fail to mention how well trained the developers were on techniques and tools. In my experience that makes a big difference.
My employer mandates we use AI and provides us with any model, IDE, or service we ask for. But where it falls short is providing training or direction on ways to use it. Most developers seem to just prompt for results and get a terrible experience.
I, on the other hand, provide a lot of context through documents and various MCP tooling. I talk about the existing patterns in the codebase and provide other repositories as examples; then we come up with an implementation plan and execute on it with a task log to stay on track. I spend very little time fixing bad code because I spent the setup time nailing down context.
So if a developer is just prompting “Do XYZ”, it’s no wonder they’re spending more time untangling a random mess.
Another aspect is that everyone seems to always be working under the gun and they just don’t have the time to figure out all the best practices and techniques on their own.
I think this should be considered when we hear things like this.
I have 3 questions, and I’m coming from a heavily AI-skeptic position, but am open:
- Do you believe that providing all that context, describing the existing patterns, creating an implementation plan, etc., allows the AI to write better code, and write it faster, than if you just did it yourself? To me, this just seems like you have to re-write your technical documentation in prose each time you want to do something. You are saying this is better than “Do XYZ”, but how much twiddling of your existing codebase do you need to do before an AI can understand the business context of it? I don’t currently do development on an existing codebase, but every time I try to get these tools to do something fairly simple from scratch, they just flail. Maybe I’m just not spending the hours to build my AI-parsable functional spec. Every time I’ve tried this, asking something as simple as (and paraphrased for brevity) “write an Asteroids clone using JavaScript and HTML 5 Canvas” results in a full failure, even with multiple retries chasing errors. I wrote something like that a few years ago to learn JavaScript and it took me a day-ish to get something that mostly worked.
- Speaking of that context: are you running your models locally, or do you have some cloud service? If you give your entire codebase to a 3rd party as context, how much of your company’s secret sauce have you disclosed? I’d imagine most sane companies are doing something to keep their models local, but we see regular news articles about how ChatGPT is training on user input and leaking sensitive data if you ask it nicely, and I can’t imagine all the pro-AI CEOs are aware of the risks here.
- How much pen-testing time are you spending on this code: error handling, edge cases, race conditions, data sanitization? An experienced dev understands these things innately, having fixed these kinds of issues in the past, and knows the anti-patterns and how to avoid them. In all seriousness, I think this is going to be the thing that actually kills AI vibe coding, but it won’t be fast enough. There will be tons of new exploits in what used to be solidly safe places. Your new web front-end? It has a really simple SQL injection attack. Your phone app? You can tell it your username is admin’joe@google.com and it’ll let you order stuff for free since you’re an admin.
I see a place for AI-generated code, for instant functions that do something blending simple and complex. “Hey claude, write a function to take a string and split it at the end of every sentence containing an uppercase A”. I had to write weird functions like that constantly as a sysadmin, and transforming data seems like a thing an AI could help me accelerate. I just don’t see that working on a larger scale, though, or trusting an AI enough to allow it to integrate a new function like that into an existing codebase.
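That kind of ask is small enough to sketch by hand, too. Here’s a naive Go version under my own reading of the spec (sentences end at ‘.’, ‘!’, or ‘?’; the names and test string are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// splitAfterSentencesWithA cuts the text after every sentence containing
// an uppercase 'A'. Sentences naively end at '.', '!' or '?'.
func splitAfterSentencesWithA(text string) []string {
	var chunks []string
	var current, sentence strings.Builder

	flush := func() {
		current.WriteString(sentence.String())
		if strings.ContainsRune(sentence.String(), 'A') {
			chunks = append(chunks, current.String())
			current.Reset()
		}
		sentence.Reset()
	}

	for _, r := range text {
		sentence.WriteRune(r)
		if r == '.' || r == '!' || r == '?' {
			flush()
		}
	}
	flush() // trailing text without a terminator
	if current.Len() > 0 {
		chunks = append(chunks, current.String())
	}
	return chunks
}

func main() {
	text := "Alice logged in. Bob left. All good? Done."
	for i, chunk := range splitAfterSentencesWithA(text) {
		fmt.Printf("%d: %q\n", i, chunk)
	}
}
```

On that input it yields three chunks, cutting after the two sentences that contain an uppercase A.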
Thank you for reading my comment. I’m on the train headed to work and I’ll try to answer completely. I love talking about this stuff.
> Do you believe that providing all that context, describing the existing patterns, creating an implementation plan, etc., allows the AI to write better code, and write it faster, than if you just did it yourself?
For my work, absolutely. My work is a lot of tickets that were set up from multiple stories and multiple epics. It would be like asking me if I’m really framing a house faster with a nail gun and compressor. If I were just hanging up a picture or two in the hallway, it’s probably faster to use a hammer than to set up the compressor and nail gun, plus cleanup.
However, a lot of that documentation already exists by the time it gets to me. All of the Design Documents and Product Requirement Documents have already been formed, discussed, and approved by our architecture team and team leads. Imagine if you already had this documentation for the Asteroids game; how much better do you think your LLM would do? Maybe this is the benefit of using LLMs for development at an established company. Btw, a lot of those documents were also created with the assistance of AI by the Product team, Architects, and Principal/Staff/Lead engineers anyway.
> how much twiddling of your existing codebase do you need to do before an AI can understand the business context of it?
With the help of our existing documents and codebase(s), I feel I don’t have any issues with the model knowing what we’re doing. I do have to set up my own context for how I want it to be done. To me this is like explaining to a Junior Engineer what I need them to help me with. If you’re familiar with “Know when to Direct, when to Delegate, or when to Develop”, I would say it lands in between Direct and Delegate. I have markdown files with my rules and guidelines and provide those as context. I use Augment Code, which is pretty good with codebase context.
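For a rough idea, here’s a trimmed-down, entirely hypothetical version of what one of those rules files can look like (the project details are invented for illustration, not from my actual codebase):

```markdown
# AI-RULES.md (hypothetical example)

## Context
- Service: order-api (Go, repository pattern under internal/store)
- A design doc and example repos are linked per ticket; read them first.

## Rules
- Plan first: propose an implementation plan and wait for approval.
- Match the existing error-handling and logging patterns; no new
  dependencies without asking.
- Every new function gets a unit test.

## Workflow
1. Restate the task and list the files you expect to touch.
2. Execute the approved plan one task at a time, updating the task log.
```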
> write an Asteroids clone using JavaScript and HTML 5 Canvas
I would try “Let’s plan out the steps needed to write an Asteroids game using JavaScript and HTML 5. Identify and explain each step of the development plan. The game must build with no errors, be playable, and pass all tests. Do not write any code until our plan is approved.” Then once it comes back with the initial steps, I would guide it further if needed. Finally I would approve the plan and tell it to execute while tracking its steps (Augment Code uses a task log).
> Are you running your models locally, or do you have some cloud service? If you give your entire codebase to a 3rd party as context, how much of your company’s secret sauce have you disclosed?
We are required to use the frontier models that my employer has contracts with and are forbidden from using local models. In our enterprise contracts we have negotiated for no training on our data. I imagine we pay for that. I’m not involved in that level of interaction on the accounts.
> How much pen-testing time are you spending on this code: error handling, edge cases, race conditions, data sanitization? An experienced dev understands these things innately, having fixed these kinds of issues in the past, and knows the anti-patterns and how to avoid them. In all seriousness, I think this is going to be the thing that actually kills AI vibe coding, but it won’t be fast enough. There will be tons of new exploits in what used to be solidly safe places.
We have other teams that handle a lot of these tasks, and those teams are also using AI tools to get the job done. In addition, we have static analysis tools on our repo, like CodeRabbit and another one I can’t remember the name of that looks specifically for security concerns. They comment on the PR directly, and our merge is blocked until the findings are handled. Code coverage for testing has to be at 85% or it blocks the merge, and we have a full QA department of Analysts and SDETs. On top of that, we still require human approvals (2 devs + a Senior or above). All of the people involved are still using AI tools to help them at each step.
I hope that answers your questions and gives you some insight into how I’ve found success in my experience with it. I will say that on my personal projects I don’t go this far with process and I don’t experience the same AI output that I do at work.
Thanks for your reply, and I can still see how it might work.
I’m curious if you have any resources that do some end-to-end examples. This is where I struggle. If I have an atomic piece of code I need, I can maybe get it started with an LLM and finish it by hand, but anything larger seems to just always fail. So far the best video I found to try a start-to-finish demo was this: https://www.youtube.com/watch?v=8AWEPx5cHWQ
He spends plenty of time describing the tools and how to use them, but when we get to the actual work, we spend 20 minutes telling the LLM that it’s doing stuff wrong. There’s eventually a prototype, but to get there he had to alternate between “I still can’t jump” and “here’s the new error.” He eventually modified the code himself, so even getting a “Mario clone” running required an actual developer, and the final result was underwhelming at best.
For me, a ‘game’ is this tiny product that could be a viable unit. It doesn’t need to talk to other services, it just needs to react to user input. I want to see a speed-run of someone using LLMs to make a game that is playable. It doesn’t need to be “fun”, but the video above only got to the ‘player can jump and gets game over if hitting enemy’ stage. How much extra effort would it take to make the background not flat blue? Is there a win condition? How to refactor this so that the level is not hard-coded? Multiple enemy types? Shoot a fireball that bounces? Power Ups? And does doing any of those break jump functionality again? How much time do I have to spend telling the LLM that the fireball still goes through the floor and doesn’t kill an enemy when it hits them?
I could imagine that if the LLM was handed a well described design document and technical spec that it could do better, but I have yet to see that demonstrated. Given what it produces for people publishing tutorials online, I would never let it handle anything business critical.
The video is an hour long, and spends about 20 minutes in the middle actually working on the project. I probably couldn’t do better, but I’ve mostly forgotten my JavaScript and HTML canvas. If kaboom.js were my focus, though, I imagine I could knock out what he did in well under 20 minutes and have a better-architected design that handled the above questions.
Luckily, I’ve not yet been mandated to embed AI into my pseudo-developer role, but they are asking.
I miss the days when machine learning was fun. Poking together useless RNN models with a small dataset to make a digital Trump that talked about banging his daughter, and endless nipples flowing into America. Exploring the latent space between concepts.
I always need to laugh when I read “Agentic AI”
I’ve found success using more powerful LLMs to help me create applications using the Rust programming language. If you use a weak LLM and ask it to do something very difficult you’ll get bad results. You still need to have a fundamental understanding of good coding practices. Using an LLM to code doesn’t replace the decision making.
Based on my experience with Claude Sonnet and GPT-4/5… it’s a little useful but generally annoying, and it fails more often than it works.
I do think moderate use still comes out ahead, as it saves a bunch of typing when it does work, but I still get annoyed at the blatantly stupid suggestions I keep having to decline.
I remember GPT 4 being useless and constantly giving wrong information. Now with newer models they’ve become significantly more useful, especially when prompted to be extremely careful and to always double check to ensure the best response.
Even though this shit was apparent from day fucking 1, at least the Tech Billionaires were able to cause mass layoffs, destroy an entire generation of new programmers’ careers, introduce an endless amount of tech debt and security vulnerabilities, all while grifting investors/businesses and making billions off of all of it.
Sad excuses for sacks of shit, all of them.
Look on the bright side, in a couple of years they will come crawling back to us, desperate for new things to be built so their profit machines keep profiting.
Current ML techniques literally cannot replace developers for anything but the most rudimentary of tasks.
I wish we had true apprenticeships out there for development and other tech roles.
LLMs/“vibe coding” are probably a little more useful than the average intern, with some tasks bumping up to an early-career hire (what would historically have been a Junior Engineer, before title inflation/stagnation).
As in: it can generate code that might do what you want. But you need (actual) senior engineers to review the code thoroughly. And… how do people get the experience they need to do that?
Which basically results in turning everyone into a manager. Except your reports aren’t humans and you don’t get more pay. Instead your reports are vscode plugins. Which… sounds like absolute hell but I can get why the (wannabe) management class loves that.
Nowhere close to any junior, ime. Grads learn very quickly. An intern’s only job is to understand. Code-academy career switchers understand requirements and will ask questions. Subservient AI does fuck-all of any of those things.
They are more akin to yet another Rapid Application Development wave, imo. Go see how the previous iterations have done. Lots are still with us (Rails ftw!). I’ll bet most will outlive LLMs.
Even that description is vastly overselling its usefulness. Every time someone says it’s like a junior dev I just sigh, because literally the only reason I like junior devs is that they turn into not-junior devs. Never once has assigning something to a junior dev made my job easier. The entire goal is to train them to the point where they make PRs that I don’t have to walk them through reworking.