Vibe Coding Experiment Failures
Posted by Al Sweigart in misc
Over the past week I've been experimenting with vibe coding: asking LLMs such as ChatGPT, Claude, and Gemini to write entire apps as if I had absolutely no programming ability at all. LLMs can easily solve programming challenges or interview questions, but I wanted to see how far current LLMs can go when asked to make complete apps, and what kinds of failure patterns emerge. In the role of a non-programmer, I could only fix bugs by describing them to the LLM. For simplicity, I chose small apps written in Python that use only the standard library and the tkinter package for the GUI. This blog post details the failures: the kinds of apps that AI just isn't capable of making.
I'll update this blog post as I find new examples.
I don't care about polished or beautiful user interfaces (they are restricted to using tkinter, after all). I want to know whether the generated app actually works without significant errors. For these experiments I'm using ChatGPT 5, Gemini 2.5 Pro, and Claude Sonnet 4.
I've included the source for some of the programs that the LLMs produced. If you do manage to get a working version of any of these ideas, I'd love to hear about the results: [email protected].
Failure Patterns in LLM-Generated Apps
LLMs tended to fail to create software with these qualities:
- Slightly unusual. Any app that hasn't already been implemented hundreds of times, unlike Tetris, stopwatches, or to-do lists.
- Spatial or visual. LLMs generate text, but code that deals with coordinates or drawing tended to fall apart.
- Similar but not identical to common apps. When asked to create pinball, they would create Pong. When asked to draw the amorphous blobs of a lava lamp, they drew perfect circles. LLMs would regress to common but inaccurate examples, sometimes even in spite of specific instructions not to.
List of Failed Vibe Coding Experiments
- African Countries Geography Quiz
- Pinball Game
- Circular Maze Generator
- Interactive Chinese Abacus
- Combination Lock Simulator
- Family Tree Diagram Editor
- Lava Lamp Simulator
- Snow Globe Simulator
African Countries Geography Quiz
Prompt: "Create a Python program using tkinter for the GUI and only uses the packages in the Python standard library. The app displays a simple map of Africa that doesn’t have country borders. It’s just the outline of Africa. It prompts the user with the name of a random African country, then user must click on the map. The app shows the outline of the country on the map for a few seconds, and then prompts the user with another random country name. To keep it simple, there is no score or points. The app doesn’t acknowledge if the user clicked within the borders of the country or not. (They can see this for themselves when the country outline appears.) To keep it simple, there is no stopping point. The geography quiz continues until the user closes the app’s window."
Result: LLMs keep forgetting that Madagascar is part of Africa unless you remind them. Also, LLMs seem to think that Africa is shaped like a potato.
I was hoping for something similar to this geography quiz that shows Africa without country borders, then asks the user to click on a given country.
One of the LLMs produced something approximating Africa, and it also drew Madagascar (after I reminded it that Madagascar is part of Africa). The shapes of the countries were not... accurate. I instructed the LLMs to do web searches for SVG files of maps of Africa. They said they did, and then drew another potato.
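For reference, the coordinate math itself is small once real border data is available. The sketch below assumes hypothetical country outlines stored as lists of (longitude, latitude) pairs taken from a real border dataset, and maps them to canvas pixels with a simple equirectangular projection. `AFRICA_BOUNDS`, `project()`, and `outline_to_canvas()` are illustrative names, and the bounds are rough assumptions rather than checked figures. The flattened coordinate list is the form tkinter's `Canvas.create_polygon()` and `Canvas.create_line()` accept, and something like `canvas.after(3000, canvas.delete, item_id)` could remove the outline after a few seconds.

```python
# Hypothetical rough bounding box (west, east, south, north) in degrees,
# meant to cover mainland Africa and Madagascar; check it against the real data.
AFRICA_BOUNDS = (-18.0, 52.0, -35.0, 38.0)


def project(lon, lat, bounds, width, height):
    """Map (lon, lat) in degrees to canvas pixels with an equirectangular projection."""
    west, east, south, north = bounds
    x = (lon - west) / (east - west) * width
    y = (north - lat) / (north - south) * height  # canvas y grows downward
    return x, y


def outline_to_canvas(points, bounds, width, height):
    """Flatten [(lon, lat), ...] into the [x0, y0, x1, y1, ...] list tkinter canvases take."""
    coords = []
    for lon, lat in points:
        coords.extend(project(lon, lat, bounds, width, height))
    return coords


# Usage with two made-up corner points, just to show the mapping:
print(outline_to_canvas([(-18.0, 38.0), (52.0, -35.0)], AFRICA_BOUNDS, 700, 730))
# → [0.0, 0.0, 700.0, 730.0]
```

Keeping the geography in real longitude/latitude and projecting at draw time means the map's accuracy depends only on the data, not on hand-placed pixel coordinates.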
Pinball Game
Prompt: "Create a simple pinball game in Python using tkinter or turtle and only the standard library."
Result: Invariably the LLMs wanted to give me a Pong game with a ball that moved at a constant speed. I had to add details telling them not to make Pong, to have two flippers, to have realistic gravity, and to make the sides funnel the ball toward the flippers. The LLMs didn't implement collision detection and left the pinball field blank. I wasn't expecting Space Cadet Pinball, but no amount of re-prompting produced anything remotely acceptable. I also tried versions that use Pygame, in case tkinter was too limiting.
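Gravity and collision detection, two of the pieces that kept going missing, fit in a few lines. This is a minimal sketch, not code from any of the generated programs: `step_ball()` and `collide_segment()` are hypothetical names, and the gravity and restitution values are arbitrary. It assumes walls, funnel sides, and flippers are all modeled as line segments. Each frame, the ball moves under gravity, is tested against every segment, is pushed out of any segment it overlaps, and has its velocity reflected about that segment's normal.

```python
import math


def step_ball(x, y, vx, vy, dt, gravity=900.0):
    """Semi-implicit Euler step: apply gravity to the velocity, then move the ball."""
    vy += gravity * dt  # canvas y grows downward, so gravity points toward +y
    return x + vx * dt, y + vy * dt, vx, vy


def collide_segment(x, y, vx, vy, radius, ax, ay, bx, by, restitution=0.8):
    """If the ball overlaps segment AB, push it out and reflect its velocity."""
    abx, aby = bx - ax, by - ay
    length2 = abx * abx + aby * aby
    # Closest point on AB to the ball's center, clamped to the segment's ends.
    t = 0.0 if length2 == 0 else ((x - ax) * abx + (y - ay) * aby) / length2
    t = max(0.0, min(1.0, t))
    px, py = ax + t * abx, ay + t * aby
    dx, dy = x - px, y - py
    dist = math.hypot(dx, dy)
    if dist >= radius or dist == 0:
        return x, y, vx, vy  # no overlap, or the degenerate case of a center exactly on the wall
    nx, ny = dx / dist, dy / dist              # unit normal from the wall toward the ball
    x, y = px + nx * radius, py + ny * radius  # move the ball so it just touches the wall
    vn = vx * nx + vy * ny                     # velocity component along the normal
    if vn < 0:                                 # only bounce if moving into the wall
        vx -= (1 + restitution) * vn * nx
        vy -= (1 + restitution) * vn * ny
    return x, y, vx, vy


# A ball falling onto a floor segment from (0, 100) to (200, 100):
ball = (50.0, 95.0, 0.0, 300.0)
print(collide_segment(*ball, 10.0, 0.0, 100.0, 200.0, 100.0))
# → (50.0, 90.0, 0.0, -240.0)
```

Treating every wall as a segment keeps one collision routine for the whole table; a moving flipper would additionally need its own velocity added at the contact point, which this sketch leaves out.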
In one case, the left flipper was incorrectly positioned but flipped the correct way, while the right flipper was correctly positioned but flipped the wrong way.
fail_pinball_1.py (uses Pygame)
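The mismatched flippers described above are the kind of bug that a single mirrored formula avoids. In this hedged sketch (the function name and angles are hypothetical), both flippers share one angle convention, and only the horizontal direction is flipped for the right-hand flipper, so activating either one moves its tip upward toward the center of the table.

```python
import math

REST_ANGLE = 30.0     # hypothetical: tips droop 30 degrees below horizontal at rest
ACTIVE_ANGLE = -30.0  # hypothetical: tips rise 30 degrees above horizontal when flipped


def flipper_tip(pivot_x, pivot_y, length, angle_deg, side):
    """Return the (x, y) tip of a flipper that points from its pivot toward the table's center.

    Positive angles tilt the tip downward, because canvas y grows downward.
    side is "left" (pivot on the left, tip points right) or "right" (the mirror image).
    """
    direction = 1 if side == "left" else -1
    a = math.radians(angle_deg)
    return pivot_x + direction * length * math.cos(a), pivot_y + length * math.sin(a)


# A 400-pixel-wide table with pivots placed symmetrically around x = 200:
for angle in (REST_ANGLE, ACTIVE_ANGLE):
    left = flipper_tip(120, 500, 70, angle, "left")
    right = flipper_tip(280, 500, 70, angle, "right")
    print(f"angle {angle:+.0f}: left tip ({left[0]:.1f}, {left[1]:.1f}), "
          f"right tip ({right[0]:.1f}, {right[1]:.1f})")
```

Because the same angle drives both flippers, a bug in the angle math would show up on both sides at once instead of producing one correct flipper and one backward one.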
Circular Maze Generator
Prompt: "Write a Python program that only uses tkinter and the standard library to generate a picture of a circle-shaped maze. The walls and boundaries of the maze should not be straight lines or rectangles. The player starts in the center and must reach the exit at the top. The player uses the keyboard arrow keys to move, and must not be able to walk through walls."
Result: LLMs can easily generate rectangular maze programs. However, the circular mazes they made were crude facsimiles: the walls were sprinkled randomly, with unreachable areas and obvious, almost straight-line paths. Sometimes the LLM would write a program that, for all its code, simply displayed a blank window. And the keyboard controls were completely broken and unfixable.
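For comparison, the usual way to guarantee a maze with no unreachable areas is to carve a spanning tree through a grid of cells, and that works on a polar grid just as it does on a rectangular one. The sketch below is one way to do it, not what the LLMs produced. It assumes hypothetical `(ring, sector)` cells with the same number of sectors in every ring (a simplification; many circular mazes add sectors to the outer rings), carves passages with an iterative depth-first search, and reuses the same passage set for movement, so the player can only step between cells joined by a passage. Drawing the arcs and the exit in the outermost ring is left out.

```python
import random
from collections import deque


def polar_neighbors(cell, rings, sectors):
    """Cells adjacent to (ring, sector): two in the same ring, plus the inner and outer ones."""
    r, s = cell
    result = [(r, (s - 1) % sectors), (r, (s + 1) % sectors)]
    if r > 0:
        result.append((r - 1, s))
    if r < rings - 1:
        result.append((r + 1, s))
    return result


def generate_polar_maze(rings, sectors, seed=None):
    """Carve a spanning tree with an iterative depth-first search; return the open passages."""
    rng = random.Random(seed)
    start = (0, 0)  # ring 0 is the innermost ring, where the player starts
    visited = {start}
    stack = [start]
    passages = set()
    while stack:
        cell = stack[-1]
        unvisited = [n for n in polar_neighbors(cell, rings, sectors) if n not in visited]
        if not unvisited:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(unvisited)
        passages.add(frozenset((cell, nxt)))
        visited.add(nxt)
        stack.append(nxt)
    return passages


def can_move(passages, here, there):
    """The player may only step between cells joined by a carved passage."""
    return frozenset((here, there)) in passages


def reachable_cells(passages, start, rings, sectors):
    """Breadth-first search over passages; in a spanning-tree maze this reaches every cell."""
    seen = {start}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for n in polar_neighbors(cell, rings, sectors):
            if n not in seen and can_move(passages, cell, n):
                seen.add(n)
                queue.append(n)
    return seen


maze = generate_polar_maze(5, 12, seed=1)
print(len(reachable_cells(maze, (0, 0), 5, 12)))  # → 60
```

An arrow-key handler would translate a key press into a neighboring cell and only update the player's position when `can_move()` returns `True`.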
Interactive Chinese Abacus
Prompt: "Create a Python program that uses only tkinter and the standard library to create an interactive abacus. The user can click on beads to slide them around. There should be both "heaven" and "earth" beads. The number represented by the abacus's configuration should be displayed at the bottom of the window."
Result: The LLMs made programs that displayed an abacus, but the sliding behavior of the beads was invariably broken. The wrong beads would slide when clicked, and they couldn't slide back to their original positions. The displayed number would be completely off, and sometimes negative.
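The bookkeeping the generated abacuses got wrong is small. A Chinese abacus (suanpan) rod usually has two heaven beads worth 5 each and five earth beads worth 1 each, and a rod's value is the sum of the beads pushed against the beam. The hedged sketch below stores each rod as two counts, so the displayed number can never go negative; `Rod`, `click_bead()`, and `abacus_value()` are hypothetical names. Beads are numbered from 0 (nearest the beam) outward: clicking a bead at the beam slides it and every bead beyond it away, and clicking a bead away from the beam slides it and every bead between it and the beam toward the beam.

```python
from dataclasses import dataclass


@dataclass
class Rod:
    heaven: int = 0  # heaven beads (worth 5 each) pushed down against the beam, 0 to 2
    earth: int = 0   # earth beads (worth 1 each) pushed up against the beam, 0 to 5

    def value(self):
        return 5 * self.heaven + self.earth


def click_bead(active, index):
    """Return how many beads touch the beam after bead `index` is clicked.

    Beads 0..active-1 (counting outward from the beam) currently touch the beam.
    """
    if index < active:
        return index      # bead was at the beam: it and the beads beyond it slide away
    return index + 1      # bead was away: it and the beads between it and the beam slide in


def abacus_value(rods):
    """Read the rods left to right as decimal digits (a suanpan rod can exceed 9)."""
    total = 0
    for rod in rods:
        total = total * 10 + rod.value()
    return total


rods = [Rod(), Rod()]
rods[0].earth = click_bead(rods[0].earth, 2)    # push three earth beads up: 3
rods[1].heaven = click_bead(rods[1].heaven, 0)  # push one heaven bead down: 5
rods[1].earth = click_bead(rods[1].earth, 1)    # push two earth beads up: 5 + 2
print(abacus_value(rods))  # → 37
```

With the bead state reduced to counts, drawing becomes a pure function of the model: bead `i` is drawn at the beam exactly when `i` is less than the rod's count.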
Combination Lock Simulator
Prompt: "Create a Python program that uses only tkinter and the standard library. The program is a combination lock simulator. The program displays a combination lock and lets the user drag the mouse to spin the dial. The combination is displayed at the top. If the user has correctly entered the combination, when they click the latch the latch will pop open. The window should display instructions to the user on how to use the app to open the lock."
Result: While the programs would display a rudimentary combination lock and dial with numbers, turning the dial was haphazard. The numbers on the dial always stayed upright as the dial turned, though I hadn't required that they rotate with it. Entering the combination was impossible: even when I turned the dial to the correct numbers, clicking the latch button had no effect.
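Dragging a dial comes down to turning mouse positions into angles and accumulating the change between drag events. This sketch assumes a hypothetical 40-number dial and a deliberately simplified rule: a number is entered each time the user reverses direction, and the last number is whatever sits under the pointer when the latch is clicked. That is not how a real padlock's mechanism works. `mouse_angle()`, `angle_delta()`, and `Dial` are illustrative names; in a tkinter app they could be fed from a `<B1-Motion>` event's `event.x` and `event.y`.

```python
import math

DIAL_NUMBERS = 40  # assumption: a 40-number dial


def mouse_angle(cx, cy, x, y):
    """Mouse angle around the dial center (cx, cy), in degrees clockwise from 12 o'clock.

    Canvas y grows downward, so atan2(dx, -dy) measures clockwise from straight up.
    """
    return math.degrees(math.atan2(x - cx, cy - y)) % 360


def angle_delta(previous, current):
    """Smallest signed change between two angles, in (-180, 180], so crossing 0 works."""
    d = (current - previous) % 360
    return d - 360 if d > 180 else d


class Dial:
    def __init__(self):
        self.rotation = 0.0  # total clockwise rotation in degrees
        self.direction = 0   # +1 clockwise, -1 counterclockwise, 0 before the first turn
        self.entered = []    # numbers under the pointer when the user reversed direction

    def number(self):
        """Hypothetical convention: each clockwise step of 360/40 degrees advances the reading by one."""
        step = 360 / DIAL_NUMBERS
        return round(self.rotation / step) % DIAL_NUMBERS

    def turn(self, degrees):
        new_direction = (degrees > 0) - (degrees < 0)
        if new_direction and self.direction and new_direction != self.direction:
            self.entered.append(self.number())  # a reversal enters the current number
        if new_direction:
            self.direction = new_direction
        self.rotation += degrees

    def latch_opens(self, combination):
        """Simplified rule: the last two reversals plus the current number must match."""
        return (self.entered + [self.number()])[-3:] == list(combination)


# In a <B1-Motion> handler (not run here) one might do:
#     now = mouse_angle(cx, cy, event.x, event.y)
#     dial.turn(angle_delta(last_angle, now))
#     last_angle = now
dial = Dial()
for degrees in (90, -45, 135):  # clockwise to 10, counterclockwise to 5, clockwise to 20
    dial.turn(degrees)
print(dial.latch_opens([10, 5, 20]))  # → True
```

Accumulating small signed deltas, rather than setting the dial directly to the mouse angle, is what lets the simulator tell clockwise from counterclockwise and count full turns.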
Family Tree Diagram Editor
Prompt: "Create a Python program that uses only tkinter and the standard library. Create a family tree diagram app. The program uses squares with text names to represent people. The program begins with a single square. You can click on squares to edit the name, add a spouse, add a child, add a parent, or delete the person. The diagram automatically redraws itself as people are added or deleted. Use the standard lines of a family tree diagram."
Result: Completely broken. The window would display the starting person's square, and some of the generated apps let me edit the name. But adding any relation would either do nothing or break the existing diagram entirely, sometimes making it invisible. None of the apps were ever able to draw a second square.
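The step the generated apps never got past was going from a data model to a second square on the screen. Here is a minimal sketch under stated assumptions: each person gets an integer id, relationships are stored in both directions, and the layout puts spouses in the same row, parents one row up, and children one row down. `FamilyTree` and its methods are hypothetical, and drawing the squares and the family-tree lines from the returned coordinates (for example with `Canvas.create_rectangle()` and `Canvas.create_line()`) is left out.

```python
from collections import deque
from itertools import count


class FamilyTree:
    def __init__(self, first_name):
        self._next_id = count()
        self.names = {}     # person id -> name
        self.spouses = {}   # person id -> set of spouse ids
        self.children = {}  # person id -> set of child ids
        self.parents = {}   # person id -> set of parent ids
        self.root = self.add_person(first_name)

    def add_person(self, name):
        pid = next(self._next_id)
        self.names[pid] = name
        for table in (self.spouses, self.children, self.parents):
            table[pid] = set()
        return pid

    def add_spouse(self, pid, name):
        new = self.add_person(name)
        self.spouses[pid].add(new)
        self.spouses[new].add(pid)
        return new

    def add_child(self, pid, name):
        # Simplification: the child is linked to the clicked person only, not their spouse.
        new = self.add_person(name)
        self.children[pid].add(new)
        self.parents[new].add(pid)
        return new

    def add_parent(self, pid, name):
        new = self.add_person(name)
        self.parents[pid].add(new)
        self.children[new].add(pid)
        return new

    def delete(self, pid):
        del self.names[pid]
        for table in (self.spouses, self.children, self.parents):
            del table[pid]
            for related in table.values():
                related.discard(pid)  # remove every link that pointed at the deleted person

    def layout(self, box=80, gap=40):
        """Return {person id: (x, y)} for each square's top-left corner, one row per generation."""
        generation = {}
        for start in self.names:  # loop so people cut off by a deletion still get placed
            if start in generation:
                continue
            generation[start] = 0
            queue = deque([start])
            while queue:
                pid = queue.popleft()
                g = generation[pid]
                links = ([(s, g) for s in self.spouses[pid]]
                         + [(p, g - 1) for p in self.parents[pid]]
                         + [(c, g + 1) for c in self.children[pid]])
                for other, other_gen in links:
                    if other not in generation:
                        generation[other] = other_gen
                        queue.append(other)
        top = min(generation.values(), default=0)
        rows = {}
        for pid in sorted(generation):
            rows.setdefault(generation[pid], []).append(pid)
        positions = {}
        for g, members in rows.items():
            for column, pid in enumerate(members):
                positions[pid] = (column * (box + gap), (g - top) * (box + gap))
        return positions


tree = FamilyTree("Alice")
tree.add_spouse(tree.root, "Bob")
tree.add_child(tree.root, "Carol")
print(tree.layout())
```

Recomputing the whole layout from the model after every edit, instead of patching individual canvas items, is what makes "the diagram automatically redraws itself" straightforward: clear the canvas and draw from `layout()` again.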
Lava Lamp Simulator
Prompt: "Create a Python program that only uses tkinter and the standard library. The program should be a lava lamp simulator that shows blobs gently floating around the window. Do not use circles, ellipses, or polygons for the blobs. Use bezier curves for the blob shapes. The blobs should slowly combine and separate like they do in a lava lamp."
Result: These looked nothing like lava lamps. The programs would display shapes that moved, but that's all. The blobs would bluntly merge as they came close, with one blob disappearing and the other immediately growing. The blobs never separated, and some apps had small blobs spontaneously appear out of thin air. The blobs also tended to vibrate like nervous Chihuahuas.
One of the LLMs just drew blob outlines.
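The standard trick for blobs that merge and split smoothly is a metaball field: each blob contributes r²/d² at every point (r is the blob's radius, d the distance to its center), and the blob surface is wherever the summed field equals 1. Two blobs then fuse into one outline as they approach and pull apart as they drift away, with neither blob disappearing. This is a hedged sketch rather than a full simulator. `field()` and `outline()` are hypothetical names, and marching rays outward from each center is a simple way to find boundary points that only works for shapes visible from that center. The points could be drawn as a closed curve with `Canvas.create_line(..., smooth=True)`, which Tk documents as rendering a set of quadratic splines, and moving blobs with small, smoothly varying velocities rather than fresh random jitter each frame would avoid the nervous-Chihuahua effect.

```python
import math


def field(x, y, blobs):
    """Metaball field: the sum of r^2 / d^2 over blobs given as (cx, cy, r) tuples."""
    total = 0.0
    for cx, cy, r in blobs:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        total += r * r / max(d2, 1e-9)  # avoid dividing by zero at a blob's center
    return total


def outline(blob, blobs, points=24, step=0.5, max_dist=500.0):
    """March rays outward from one blob's center until the combined field drops below 1."""
    cx, cy, _ = blob
    result = []
    for i in range(points):
        angle = 2 * math.pi * i / points
        dx, dy = math.cos(angle), math.sin(angle)
        d = step
        while d < max_dist and field(cx + dx * d, cy + dy * d, blobs) >= 1.0:
            d += step
        result.append((cx + dx * d, cy + dy * d))
    return result


# Two blobs 60 pixels apart share one surface; their midpoint is inside it.
blobs = [(100.0, 100.0, 30.0), (160.0, 100.0, 30.0)]
print(field(130.0, 100.0, blobs) > 1.0)  # → True
```

For two blobs of equal radius r, the field at their midpoint is 8r²/D² for centers D apart, so they read as one blob while D < 2√2·r and split apart beyond that.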
Snow Globe Simulator
Prompt: "Create a Python program that uses only tkinter and the standard library. The program is a "snow globe" with a blue background and white "snow" in it. When the window is moved around and "shook", the snow should also move around in it the way it would in a snow globe. Shaking the window faster causes the snow to move faster. The snow flakes should swirl inside the window rather than move like bouncing ping pong balls."
Result: Claude had some limited success; moving the window would "shake" the snowflakes, but they behaved more like ping pong balls in a cardboard box than snowflakes in a snow globe. The speed and vigor of my shaking made no difference; slow and fast shaking produced the same effect.
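One plausible approach is to read the window's screen position every frame, treat the change as the globe's velocity, and push the snow the opposite way in proportion to it, so faster shaking means faster snow. Drag then slows the flakes, a crude swirl term turns moving flakes around the window's center, and the walls absorb motion instead of bouncing flakes back. This is a minimal sketch: `Flake`, `step()`, and all of the constants are hypothetical and would need tuning by eye. In tkinter, the window's position is available from `root.winfo_rootx()` and `root.winfo_rooty()`, which could be read in a loop scheduled with `root.after()`.

```python
import math
from dataclasses import dataclass


@dataclass
class Flake:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(flakes, window_dx, window_dy, dt, width, height,
         coupling=1.0, swirl=0.8, drag=1.5, gravity=20.0):
    """Advance every flake by dt seconds after the window moved (window_dx, window_dy) pixels."""
    cx, cy = width / 2, height / 2
    damping = math.exp(-drag * dt)
    for f in flakes:
        # Inertia: the window moved and the snow lagged, so relative to the window
        # the snow is pushed the opposite way in proportion to the window's speed.
        f.vx -= coupling * window_dx / dt
        f.vy -= coupling * window_dy / dt
        # Crude swirl: nudge moving flakes sideways around the globe's center.
        rx, ry = f.x - cx, f.y - cy
        r = math.hypot(rx, ry) or 1.0
        speed = math.hypot(f.vx, f.vy)
        f.vx += swirl * speed * (-ry / r) * dt
        f.vy += swirl * speed * (rx / r) * dt
        # Water drag slows the flakes, and gravity makes them settle slowly.
        f.vx *= damping
        f.vy = f.vy * damping + gravity * dt
        f.x += f.vx * dt
        f.y += f.vy * dt
        # Walls absorb motion instead of bouncing flakes back like ping pong balls.
        if not 0.0 <= f.x <= width:
            f.x = min(max(f.x, 0.0), width)
            f.vx = 0.0
        if not 0.0 <= f.y <= height:
            f.y = min(max(f.y, 0.0), height)
            f.vy = 0.0


flakes = [Flake(100.0, 100.0), Flake(300.0, 250.0)]
# For example, the window jumped 12 pixels right and 4 pixels up since the last ~16 ms frame:
step(flakes, 12, -4, 0.016, 400, 400)
```

Because the push is proportional to how far the window moved per frame, a vigorous shake produces a proportionally stronger swirl, which is the behavior the generated programs lacked.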