Wes Winder fired his engineering team, claiming AI could replace them—only to start hiring again soon after. AI boosts productivity but can’t replace the collaboration and expertise essential to software development. His failure is a cautionary tale against overhyping AI’s capabilities.
Can AI replace an entire software engineering team? In December 2024, Canadian developer Wes Winder thought so. He made headlines by firing his entire engineering staff and replacing them with "o1". With his AI-based one-man team, he claimed to be 100 times more productive. But just a few weeks later, he was back on LinkedIn, searching for new developers.

The tech community reacted with a mix of ridicule and schadenfreude. Perhaps the backlash was so emotional because many fear that AI will replace their jobs. On the other hand, there is a huge amount of hype around ‘AI’, and many engineers have been preaching for years that these systems are just statistical models. They are right. Still, AI is very impressive, and there is a lot of real value behind this hyped market. Just not in the way Winder thought.
What AI Does Well
AI tools like GitHub Copilot, Cursor, and the JetBrains AI Assistant are without question transforming the way developers work. Instead of replacing developers, LLMs empower teams — including product owners and tech leads — to tackle larger challenges faster.
Creating Context
Often people are put off when they encounter “hallucinations” that are plausible but entirely made up. We have to acknowledge that these aren’t flaws in the AI per se. They are simply how these models function, and they usually point to weak grounding: thin context, missing code details, or a training cutoff the prompt does not account for. So in short: provide solid context. By including all necessary models, types, and documentation, you ground the AI in your project's reality. Not everything has to be spelled out; a judicious amount of code examples suffices most of the time. For example, if your task is to integrate a new UI component, supply portions of the domain model, one similar UI component, and the related view models. This usually gives the AI enough information to generate a solution that fits your codebase.
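A context-rich prompt for such a task might look something like this (the file names here are hypothetical, chosen only to illustrate the pattern):

```
/file types/order.ts
/file components/OrderCard.tsx
/file viewmodels/OrderViewModel.ts
create an OrderHistory component, using OrderCard as a blueprint
```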
Use the Right Model
The next important consideration is model selection. Not all models are created equal. General-purpose models like GPT-4o or Claude 3.5 Sonnet are useful for tactical tasks: they quickly generate even large amounts of code, write tests, or extract hard-coded strings from your code for i18n. Your prompt would look something like:
```
/file types/recipe.ts
/file core/services/RecipeService.ts
/file app/dashboard/page.tsx
create a new page that displays the current user's recipes
```
Here, the relevant types and services are included, along with an existing page that can serve as a blueprint for the new one. The prompt itself involves no serious prompt engineering; given the context, the model will likely generate good results nonetheless.
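To make this concrete, here is a sketch of the kind of page the model might generate from that prompt. The Recipe fields and the service method name are assumptions derived from the file names, not real project code:

```tsx
// A sketch of a possible result, assuming a Next.js app router project.
// Recipe fields and RecipeService.getRecipesForCurrentUser are assumptions.
import { Recipe } from "@/types/recipe";
import { RecipeService } from "@/core/services/RecipeService";

export default async function RecipesPage() {
  // Server component: fetch the current user's recipes on the server.
  const recipes: Recipe[] = await RecipeService.getRecipesForCurrentUser();

  return (
    <main>
      <h1>My Recipes</h1>
      <ul>
        {recipes.map((recipe) => (
          <li key={recipe.id}>{recipe.title}</li>
        ))}
      </ul>
    </main>
  );
}
```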
This approach works well for quite a while, but eventually you encounter one particularly nasty bug or need to tackle a larger refactoring. When Claude or GPT-4o hit a wall on a problem because they lack what feels like strategic skill, it is time for a reasoning model like o3. These models have a smaller maximum token output, so they cannot generate whole files, but they excel at reasoning through a problem and succinctly outlining what needs to be done.
When you throw a prompt like this at a reasoning model:
```
/file types/recipe.ts
/file core/services/RecipeService.ts
/file app/recipe/page.tsx
/file hooks/useRecipes.ts
Outline a refactoring to separate concerns between UI, domain, and services
```
it will discuss your code from an architectural perspective and describe step by step how to refactor it. This plan can then be implemented by a general-purpose model: switch the conversation to GPT-4o at this point, simply say "perform this refactoring", and the model will likely generate the refactored classes with a quality it could not have achieved on its own.
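What such a plan typically produces is a thin UI-facing layer that delegates to the service. Here is a minimal sketch of what the refactored hook might look like; the method name carries over from the sketch above and is illustrative, not from a real codebase:

```tsx
// hooks/useRecipes.ts: after the refactoring, the hook only adapts
// domain data for the UI; all data access lives in RecipeService.
import { useEffect, useState } from "react";
import { Recipe } from "@/types/recipe";
import { RecipeService } from "@/core/services/RecipeService";

export function useRecipes() {
  const [recipes, setRecipes] = useState<Recipe[]>([]);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    // The service owns fetching and business rules; the hook owns UI state.
    RecipeService.getRecipesForCurrentUser()
      .then(setRecipes)
      .finally(() => setLoading(false));
  }, []);

  return { recipes, loading };
}
```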
Reasoning models are also great for debugging and code reviews; in short, whenever you need to discuss the code with a decent level of reasoning.
Building a Prompt Library
When building context, IDEs can help tremendously. Cursor, GitHub Copilot, and Tabnine all do a great job here, but I particularly like the approach Zed takes. It offers slash commands such as:
- `/file`: Include files or entire folders in the prompt
- `/fetch`: Retrieve content, such as markdown documentation from GitHub, via a GET request
- `/prompt`: Insert prompts from your prompt library
Imagine you need to enforce internationalization standards. Instead of explaining to the LLM how that should be done and building a detailed context every time, you can simply type:
```
/prompt i18n
/file app/shopping-cart/
```
The team would store all necessary examples, guidelines, and current translation files in the `i18n` prompt, along with the instruction to apply them to all files that follow. The model could then extract every non-translated string from the shopping-cart folder and even create the actual translations.
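The result of such a run might look like this sketch. It assumes react-i18next and an illustrative key name; the actual conventions would come from your i18n setup and the guidelines stored in the prompt:

```tsx
// A hypothetical result of running the i18n prompt over a component in
// app/shopping-cart/. The t() call follows react-i18next conventions;
// the component and key names are illustrative.
import { useTranslation } from "react-i18next";

export function CheckoutButton({ onCheckout }: { onCheckout: () => void }) {
  const { t } = useTranslation();
  // Before: <button onClick={onCheckout}>Proceed to checkout</button>
  return <button onClick={onCheckout}>{t("shoppingCart.checkout")}</button>;
}
```

The model would also be expected to add a matching `"shoppingCart.checkout": "Proceed to checkout"` entry to the translation files it was given.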
Building such a prompt library streamlines the development process and makes quality standards transparent to the whole team. It can even serve as an educational resource for junior developers. As the number of requirements on a codebase grows, it reduces everyone's mental load.
Where AI Falls Short
When Winder fired his whole team, he tried to be the heroic 100x engineer. There are indeed developers who produce more lines of code than their peers, but the real productivity gains come from leadership skills and from using technology as a multiplier. Productivity in software engineering is about making better decisions, simplifying complexity, and enabling the entire team to move faster.
AI can act as a lubricant, reducing friction and churn within a team; perhaps that is where those “100x” gains can be found. But for any non-trivial project, this cannot be achieved by a single person. LLMs enhance the human capability to produce code, just as a smartphone with internet access does, or, in days of old, the manual did.
Poor Long-Term Planning and Strategy
While reasoning models like o3 can outline a refactoring plan, they still lack the ability to continuously monitor progress. They perform best when the task at hand is well-defined and not messy. The human ‘intuition’ to prioritize tasks or anticipate obstacles based on gut feeling might give us an edge over these systems for some decades still.
Lack of True Understanding
When you ask ChatGPT about its shortcomings, it will tell you that it lacks true understanding: that, due to its nature as a statistical parrot, it can only generate convincing and often functional code, with a limited grasp of the underlying patterns and no genuine comprehension. This may be true, yet Ned Batchelder recently made the case that, from a functional perspective, it does not matter how a system produces its results. An aircraft does not fly like a bird. Still, it flies by all definitions of the word.
Limited Debugging and Code Review Skills
The lack of understanding shines through most clearly when it comes to debugging. In my personal experience, even with the correct context, LLMs rarely find the root cause of a bug and instead propose nonsensical fixes that rely on guesswork. Perhaps the training cutoff is to blame, or a few billion more parameters could help, but for now, humans remain an order of magnitude stronger in this area.
GitHub is, of course, pushing its code review functionality, and it looks promising. Even GitHub highlights, though, that it will not replace human code reviews: it is a clever linter that can catch obvious oversights.
Context Limitations
LLMs simply fail when it comes to proprietary or niche libraries they have not been trained on. And since they are stateless, they cannot just learn the documentation unless you are willing to invest in fine-tuning a model. The best advice right now — until the major IDEs improve their semantic search capabilities — is to manually write a well-documented API that can then be added to the context.
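As a sketch of what that can look like, consider a thin facade over a fictional proprietary billing SDK (the `acme-billing` package and its calls are made up for illustration). Thorough doc comments give the model the grounding its training data lacks once the file is added to the context:

```ts
// billing/createInvoice.ts: a hand-documented facade over a fictional
// proprietary SDK. "acme-billing" is a made-up package for illustration.

/**
 * Creates an invoice for a customer.
 *
 * @param customerId - Acme customer identifier, e.g. "cus_42".
 * @param amountCents - Amount in the smallest currency unit; must be positive.
 * @returns The invoice id assigned by the billing backend.
 */
export async function createInvoice(
  customerId: string,
  amountCents: number
): Promise<string> {
  if (amountCents <= 0) {
    throw new Error("amountCents must be positive");
  }
  // Only this facade needs to know the proprietary SDK's quirks; the rest
  // of the codebase (and the LLM) works against this documented signature.
  const { invoices } = await import("acme-billing");
  return invoices.create({ customer: customerId, amount: amountCents });
}
```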
Lessons for the Industry
Despite the head-spinning pace of AI development, the core principles of successful engineering remain unchanged: long-term thinking, human collaboration, and expertise. The winning teams will integrate AI intelligently and practice using these tools daily.
Anyone promising 100x productivity gains overnight is selling a fantasy. Success isn’t about writing code faster—it’s about writing the right code, in the right way, for the right reasons. AI can help with that, but in the end, it is just another tool in our toolbox, and we need to get good at using it.
The question isn’t whether AI will replace developers—it’s which developers will use AI most effectively.