Here’s the short version, before the detail: one of the world’s leading AI labs now writes more than 80% of its own code with AI, and in the same report where it says so, it names the one part of the job AI still can’t do. It’s not the typing. It’s the judgment: deciding what to build, telling whether the result is actually right, and knowing when to stop. If you’re learning to code or trying to break in, aim your years at that half. It’s the half whose price is going up, and (this is the part nobody says clearly) it’s learnable.
This is for you if you’re deciding whether it’s even worth starting, or you’re early in, switching in, or job-hunting in software and tired of headlines that just say “AI writes the code now” and leave you there. It’s not really written for the senior engineer who already has the job locked. The question here is narrower and more useful than “is coding dead”: which part of coding is still worth getting good at — and the people automating their own jobs just answered it, in their own numbers.
What Anthropic actually reported
On June 4, 2026, Anthropic’s institute published a piece called “When AI builds itself”. The headline idea is that AI is starting to do the work of building AI. The receipts are what matter:
- More than 80% of the code Anthropic merges into its own product is now written by Claude — up from “low single digits” before its Claude Code tool launched in February 2025. (“Merging” code means adding finished, working code into the real product, not a scratch experiment.) Leadership’s informal estimate, in a footnote, is 90%-plus once you count scripts and throwaway code (Anthropic, June 4 2026).
- A typical Anthropic engineer merged 8× more code per day in early 2026 than in 2024. Same people, eight times the shipped output.
- The length of task an AI can finish on its own keeps doubling — Anthropic puts the recent pace at roughly every four months, down from about seven. The trend line comes from METR, an independent research nonprofit, not from Anthropic’s marketing — and METR’s January 2026 numbers back it up (a ~6.5-month doubling over the long run, accelerating toward ~3 months lately). In plain terms: Claude Opus 3 in March 2024 could handle coding tasks that take a human about four minutes; two years later, a newer model was handling tasks that take about twelve hours.
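If you want to sanity-check the doubling claim yourself, the two endpoints quoted above (about four minutes in March 2024, about twelve hours two years later) imply an average doubling time. The sketch below is only that arithmetic. Treating "two years" as exactly 24 months is an assumption, and the helper name `implied_doubling_months` is made up for illustration; none of this is code or methodology from Anthropic or METR.

```python
import math


def implied_doubling_months(start_minutes: float, end_minutes: float, elapsed_months: float) -> float:
    """Average doubling time implied by two points on an exponential trend."""
    if start_minutes <= 0 or end_minutes <= start_minutes or elapsed_months <= 0:
        raise ValueError("need 0 < start < end and a positive elapsed time")
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


# Endpoints quoted in the text; "two years" is taken as 24 months (an assumption).
start = 4          # minutes of human work, March 2024
end = 12 * 60      # twelve hours, expressed in minutes
months = implied_doubling_months(start, end, elapsed_months=24)
print(f"{math.log2(end / start):.1f} doublings, one every {months:.1f} months")
```

That comes out to roughly 7.5 doublings, or one about every 3.2 months, which lands between the four-month and three-month figures quoted above. Two endpoints are a crude fit, so read it as a consistency check, not a measurement.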
You can watch the same curve on a public scoreboard. SWE-bench is a standard test where the AI has to fix real bugs in real open-source projects. When it launched in 2023, the best model solved 1.96% of the problems. By August 2024, GPT-4o was at 33.2% on SWE-bench Verified, the cleaned-up, human-validated subset. In 2026, the leading models from several labs sit above 80%, so high that researchers now call the test nearly “too easy.” (Fair caveat: some of those test problems have leaked into training data, so treat 80% as “saturating the benchmark,” not “as good as a human engineer.” Keep that caveat; it’s the whole point below.)
None of this is hype. It’s the clearest signal yet that one specific activity — producing working code on request — is getting cheap fast.
The part almost everyone misreads
“AI writes the code now” sounds like the end of the story. Read the actual report and it’s the opposite. In the same piece, Anthropic is blunt about what its AI still can’t do:
“An area of human comparative advantage, for now, is research taste and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end.”
Sit with that. The lab with AI writing 80% of its code says the thing keeping humans in the loop is taste and judgment — picking the problem, trusting (or doubting) the result, calling the dead end. It even spells out the split: the doing — “writing the code, running the experiment, producing the result” — “now costs almost nothing in human time.” So the humans there have shifted to a different question: which of these is even worth doing, and did it actually work?
That’s the tell. “Coding” was never one skill. It’s two:
| The half getting cheap | The half getting valuable |
|---|---|
| Typing code on request | Deciding what to build |
| Translating a clear spec into syntax | Writing the spec in the first place |
| Producing a plausible solution | Telling whether the solution is correct |
| Doing the task | Knowing which task is worth doing |
| Speed of output | Judgment about output |
The left column is what’s being automated — at the frontier, on real production code, right now. The right column is what Anthropic just told you it’s still hiring humans for. They’re not opposites of each other; they’re two different jobs that happened to share a title.
And the rest of the field is quietly confirming it. In Stack Overflow’s 2025 developer survey, 84% of developers use or plan to use AI tools, but only 33% trust the accuracy of what those tools give back, 46% actively distrust it, and 66% say their top frustration is “AI solutions that are almost right, but not quite.” Google’s 2025 DORA report found that nearly all developers now use AI, and the time they save on writing code gets spent right back on checking it. The bottleneck moved. It used to be writing the code. Now it’s trusting it.
What “judgment” actually means — concretely
“Judgment” sounds like the kind of word that’s true and useless. So here’s what it actually is, as things you can practice. When the typing is free, value lives in five moves:
- Choosing the problem. Out of everything you could build, knowing which one is worth building. This is product sense, user sense, business sense — the stuff that decides whether the code should exist at all.
- Specifying it. Turning a vague want (“make onboarding less annoying”) into something precise enough that a machine — or a junior — can execute it without guessing wrong. A clear spec is now a deliverable, not a formality.
- Verifying the result. Reading code you didn’t write and telling whether it’s correct, not just whether it runs. The model often produces something that looks right and is subtly wrong (the “almost right, but not quite” problem that 66% of developers named above); catching that gap is the skill.
- Knowing when to stop. Recognizing a dead end three steps in instead of thirty. Anthropic listed this one by name — “when an approach is a dead end” — because it’s expensive and AI is still bad at it.
- Steering. Breaking a big goal into pieces, handing them off, and directing the thing doing the typing. Increasingly the job is less “write the function” and more “decide what functions there should be, then check them.”
Notice none of these require you to type faster than a model. They require you to think more clearly than the prompt. That’s a different muscle, and you build it by working through real problems end to end — not by memorizing syntax.
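To make "correct, not just whether it runs" concrete, here is a small sketch of the verifying move. Both functions are hypothetical and written for this example: `median_draft` stands in for a plausible first draft that runs without complaint, and `median_reviewed` is what it might look like after someone wrote down the awkward cases first.

```python
def median_draft(values: list[float]) -> float:
    """Hypothetical first draft: plausible, runs fine, wrong on some inputs."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_reviewed(values: list[float]) -> float:
    """Version after review: handles empty and even-length inputs explicitly."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Cases a reviewer might write down before trusting either version.
cases = [([3, 1, 2], 2), ([5], 5), ([1, 2, 3, 4], 2.5), ([4, 1], 2.5)]

for data, expected in cases:
    draft_ok = median_draft(data) == expected
    reviewed_ok = median_reviewed(data) == expected
    print(data, "draft ok:", draft_ok, "| reviewed ok:", reviewed_ok)
```

The draft passes the odd-length cases and quietly fails the even-length ones. Nothing crashes, which is exactly why it takes a person choosing the right cases to notice.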
So what do you actually do — if you’re learning or job-hunting
Be honest about the floor first, because pretending it’s fine helps no one. The entry rung really has thinned. A Stanford study using payroll data on millions of workers (November 2025) found that employment for workers aged 22–25 in the most AI-exposed jobs has fallen about 13% since late 2022, with software developers among the hardest hit, while older workers in the same jobs held steady or grew. Software-developer job postings on Indeed sat roughly 30% below their early-2020 level by mid-2025.
Now the honest asterisk: not all of that is AI. Indeed’s own analysts note that much of the drop started before ChatGPT existed — it’s also higher interest rates and the post-2021 hiring hangover. But the part that is AI lands hardest on exactly the work that’s mostly typing-on-request — the junior task of turning a clear ticket into code. That’s the half getting cheap, and it was the traditional on-ramp. Which means the on-ramp changed; it didn’t close.
So aim differently:
- Build things end to end, not exercises. Three finished projects you can explain — why you built it, what you’d change — beat thirty tutorials. The value isn’t the code; it’s that you made the decisions.
- Practice reviewing, not just writing. Take AI-generated code and find what’s wrong with it. The ability to catch the “almost right but not quite” is now a hireable skill in its own right.
- Get good at the spec. Practice turning a fuzzy idea into a precise one. If you can write a brief so clear that the machine builds the right thing, you’re doing the expensive half.
- Use the AI loudly, then audit it. Don’t hide that you used it — show you directed and checked it. “I had it draft three approaches, here’s why I killed two” is the answer that gets you hired.
The junior who can steer the machine and catch its mistakes isn’t worth less in this market. They’re worth more than the one who can only out-type it — because out-typing it is the thing that’s now free.
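One way to practice the spec habit from the list above is to write the precise version as a table of examples before any implementation exists, then judge whatever gets built against it. The sketch below assumes a made-up request ("usernames should be sensible"); the three rules, the `SPEC_EXAMPLES` table, and `is_valid_username` are all illustrative assumptions, not a standard.

```python
import re

# Fuzzy request: "usernames should be sensible."
# Precise spec (all rules are illustrative assumptions):
#   1. 3 to 20 characters long
#   2. only lowercase letters, digits, and underscores
#   3. must start with a letter
SPEC_EXAMPLES = {
    "ada": True,
    "ada_lovelace_1815": True,
    "ab": False,              # rule 1: too short
    "a" * 21: False,          # rule 1: too long
    "Ada": False,             # rule 2: uppercase
    "ada lovelace": False,    # rule 2: space
    "1ada": False,            # rule 3: starts with a digit
    "_ada": False,            # rule 3: starts with an underscore
}

USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{2,19}")


def is_valid_username(name: str) -> bool:
    """Candidate implementation to be judged against SPEC_EXAMPLES."""
    return USERNAME_PATTERN.fullmatch(name) is not None


# Collect every example where the candidate disagrees with the spec.
failures = [name for name, ok in SPEC_EXAMPLES.items() if is_valid_username(name) != ok]
print("spec satisfied" if not failures else f"spec violated by: {failures}")
```

The table is the expensive part. Once it exists, it doesn't much matter who or what typed the function; the examples decide whether it is accepted.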
The part that’s still science fiction — for now
One honest boundary, so you can ignore the scariest headlines. Anthropic’s actual warning is about recursive self-improvement — AI getting good enough to design and build its own successor with little human help. That’s the far scenario, and the company is upfront that it isn’t here: the thing standing in the way is precisely the taste and judgment AI hasn’t matched. The report lays out a nearer future too — AI does most of the doing while humans keep setting the direction and checking the work. That nearer one isn’t a forecast. It’s a description of how Anthropic already works today.
Don’t plan your career around the science-fiction scenario; you can’t, and it may not arrive. Plan it around the one that’s already true: one person with good judgment now directs the output of what used to take a team. That person is more valuable every month, not less. The job is to become that person — not to win a typing race against a machine that already typed 80% of a frontier lab’s code this year.
The skill that’s leaving is doing the task. The skill that’s staying is knowing which task is worth doing, and whether the machine actually did it.
If you take one action this week, take this one: pick a small thing you wish existed, have an AI help you build it, and then spend most of your time figuring out where it’s wrong. That second part — the doubting, the checking, the deciding — is the job that’s still hiring. Get good at the half the lab kept for itself.
Sources
- [1] Marina Favaro & Jack Clark, “When AI builds itself,” The Anthropic Institute, June 4 2026. Source for the 80%+ merged-code figure (from “low single digits” before Claude Code’s Feb 2025 launch), the 8×/day output figure, the 4-minute → 12-hour task progression, the “research taste and judgment” quote, and the recursive-self-improvement framing. (All internal, self-reported metrics.)
- [2] METR, “Measuring AI Ability to Complete Long Tasks,” Time Horizon 1.1, Jan 29 2026; original paper arXiv:2503.14499, Mar 2025. Independent source for the task-length doubling trend.
- [3] SWE-bench: original benchmark paper (1.96%), arXiv:2310.06770, 2023; OpenAI “SWE-bench Verified” (GPT-4o, 33.2%), Aug 13 2024; official leaderboard (~80%+), 2026.
- [4] Brynjolfsson, Chandar & Chen, “Canaries in the Coal Mine,” Stanford Digital Economy Lab, Nov 2025. Early-career employment decline in AI-exposed work.
- [5] Indeed Hiring Lab, “Software Development Postings Remain in the Doldrums,” Feb & Jul 2025. Software job-postings data and the pre-ChatGPT caveat.
- [6] Stack Overflow 2025 Developer Survey. 84% AI use, 33% trust accuracy, 46% distrust accuracy, 66% “almost right, but not quite.”
- [7] Google Cloud / DORA, “2025 State of AI-Assisted Software Development,” Sept 2025. Verification as the new bottleneck.