The Augmented Work.
Article № 30 · AI & your career

Anthropic writes over 80% of its code with AI. It just told you which half to learn.

Anthropic now writes most of its own software with AI, and its own report quietly names the part of the job that's still yours. If you're deciding whether it's worth learning to code, or trying to land a first job in software, here's the half to aim at.

Issue June 2026
Read time 7 minutes
Filed under AI & your career · Learning to code · Software jobs
Length 1,850 words
In brief

Here’s the short version, before the detail: one of the most advanced AI labs in the world now writes more than 80% of its own code with AI, and in the same report where it says so, it names the part of the job where humans still hold the advantage. It’s not the typing. It’s the judgment: deciding what to build, telling whether the result is actually right, and knowing when to stop. If you’re learning to code or trying to break in, aim your years at that half. It’s the half whose price is going up, and (this is the part nobody says clearly) it’s learnable.

This is for you if you’re deciding whether it’s even worth starting, or you’re early in, switching in, or job-hunting in software and tired of headlines that just say “AI writes the code now” and leave you there. It’s not really written for the senior engineer who already has the job locked. The question here is narrower and more useful than “is coding dead”: which part of coding is still worth getting good at — and the people automating their own jobs just answered it, in their own numbers.

What Anthropic actually reported

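On June 4, 2026, Anthropic’s institute published a piece called “When AI builds itself”. The headline idea is that AI is starting to do the work of building AI. The receipts are what matter: by the company’s own internal, self-reported numbers, AI now accounts for more than 80% of its merged code, up from “low single digits” before Claude Code launched in February 2025.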

You can watch the same curve on a public scoreboard. SWE-bench is a standard test where the AI has to fix real bugs in real open-source projects. When it launched in 2023, the best model solved 1.96% of the problems. By August 2024, GPT-4o was at 33.2% on the cleaned-up version. In 2026, the leading models from several labs sit above 80% — so high that researchers now call the test nearly “too easy.” (Fair caveat: some of those test problems have leaked into training data, so treat 80% as “saturating the benchmark,” not “as good as a human engineer.” Keep that caveat — it’s the whole point below.)

None of this is hype. It’s the clearest signal yet that one specific activity — producing working code on request — is getting cheap fast.

The part almost everyone misreads

“AI writes the code now” sounds like the end of the story. Read the actual report and it’s the opposite. In the same piece, Anthropic is blunt about what its AI still can’t do:

An area of human comparative advantage, for now, is research taste and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end.

Sit with that. The lab with AI writing 80% of its code says the thing keeping humans in the loop is taste and judgment — picking the problem, trusting (or doubting) the result, calling the dead end. It even spells out the split: the doing — “writing the code, running the experiment, producing the result” — “now costs almost nothing in human time.” So the humans there have shifted to a different question: which of these is even worth doing, and did it actually work?

That’s the tell. “Coding” was never one skill. It’s two:

The half getting cheap | The half getting valuable
Typing code on request | Deciding what to build
Translating a clear spec into syntax | Writing the spec in the first place
Producing a plausible solution | Telling whether the solution is correct
Doing the task | Knowing which task is worth doing
Speed of output | Judgment about output

The left column is what’s being automated — at the frontier, on real production code, right now. The right column is what Anthropic just told you it’s still hiring humans for. They’re not opposites of each other; they’re two different jobs that happened to share a title.

And the rest of the field is quietly confirming it. In Stack Overflow’s 2025 developer survey, 84% of developers use or plan to use AI tools, but only 33% trust the accuracy of what those tools give back, 46% actively distrust it, and 66% say their top frustration is “AI solutions that are almost right, but not quite.” Google’s 2025 DORA report found that nearly all developers now use AI, and the time they save on writing code gets spent right back on checking it. The bottleneck moved. It used to be writing the code. Now it’s trusting it.

Figure 01. Two jobs that happened to share a title. The left column is being automated on real production code; the right column, judgment, is the half rising in value.

What “judgment” actually means — concretely

“Judgment” sounds like the kind of word that’s true and useless. So here’s what it actually is, as things you can practice. When the typing is free, value lives in five moves:

  1. Choosing the problem. Out of everything you could build, knowing which one is worth building. This is product sense, user sense, business sense — the stuff that decides whether the code should exist at all.
  2. Specifying it. Turning a vague want (“make onboarding less annoying”) into something precise enough that a machine — or a junior — can execute it without guessing wrong. A clear spec is now a deliverable, not a formality.
  3. Verifying the result. Reading code you didn’t write and telling whether it’s correct, not just whether it runs. The model often produces something that looks right and is subtly wrong; that is the “almost right, but not quite” problem 66% of developers name as their top frustration. Catching that gap is the skill.
  4. Knowing when to stop. Recognizing a dead end three steps in instead of thirty. Anthropic listed this one by name — “when an approach is a dead end” — because it’s expensive and AI is still bad at it.
  5. Steering. Breaking a big goal into pieces, handing them off, and directing the thing doing the typing. Increasingly the job is less “write the function” and more “decide what functions there should be, then check them.”

Notice none of these require you to type faster than a model. They require you to think more clearly than the prompt. That’s a different muscle, and you build it by working through real problems end to end — not by memorizing syntax.
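To make move 3 concrete, here is a minimal sketch of what “looks right but is subtly wrong” can look like, and one way to catch it. Everything in it is illustrative: `median_plausible` stands in for a hypothetical generated answer, `median_checked` is one possible fix, and `find_disagreements` is a helper invented for this example. The check compares each candidate against Python’s standard `statistics.median` as an independent reference.

```python
import statistics


def median_plausible(values):
    """Hypothetical 'almost right' answer: runs fine, correct for odd-length input only."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_checked(values):
    """One possible fix: average the two middle values when the length is even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def find_disagreements(candidate, reference, cases):
    """Return every input where the candidate and the trusted reference differ."""
    return [case for case in cases if candidate(case) != reference(case)]


# Deliberately mix odd-length and even-length inputs; the bug only shows on even ones.
CASES = [[3, 1, 2], [4, 1, 3, 2], [5], [10, 20]]

print(find_disagreements(median_plausible, statistics.median, CASES))
# prints [[4, 1, 3, 2], [10, 20]]
print(find_disagreements(median_checked, statistics.median, CASES))
# prints []
```

The first version passes a casual glance and a casual test. The judgment is in choosing cases that could expose it: here, the even-length lists that someone only checking `[3, 1, 2]` would never try.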

So what do you actually do — if you’re learning or job-hunting

Be honest about the floor first, because pretending it’s fine helps no one. The entry rung really has thinned. A Stanford study using payroll data on millions of workers (November 2025) found that workers aged 22–25 in the most AI-exposed jobs saw employment fall about 13% since late 2022, with software developers among the hardest hit — while older workers in the same jobs held steady or grew. Software-developer job postings sat roughly 30% below their early-2020 level by mid-2025.

Now the honest asterisk: not all of that is AI. Indeed’s own analysts note that much of the drop started before ChatGPT existed — it’s also higher interest rates and the post-2021 hiring hangover. But the part that is AI lands hardest on exactly the work that’s mostly typing-on-request — the junior task of turning a clear ticket into code. That’s the half getting cheap, and it was the traditional on-ramp. Which means the on-ramp changed; it didn’t close.

So aim differently:

The junior who can steer the machine and catch its mistakes isn’t worth less in this market. They’re worth more than the one who can only out-type it — because out-typing it is the thing that’s now free.

The part that’s still science fiction — for now

One honest boundary, so you can ignore the scariest headlines. Anthropic’s actual warning is about recursive self-improvement — AI getting good enough to design and build its own successor with little human help. That’s the far scenario, and the company is upfront that it isn’t here: the thing standing in the way is precisely the taste and judgment AI hasn’t matched. The report lays out a nearer future too — AI does most of the doing while humans keep setting the direction and checking the work. That nearer one isn’t a forecast. It’s a description of how Anthropic already works today.

Don’t plan your career around the science-fiction scenario; you can’t, and it may not arrive. Plan it around the one that’s already true: one person with good judgment now directs the output of what used to take a team. That person is more valuable every month, not less. The job is to become that person — not to win a typing race against a machine that already typed 80% of a frontier lab’s code this year.

The skill that’s leaving is doing the task. The skill that’s staying is knowing which task is worth doing, and whether the machine actually did it.

If you take one action this week, take this one: pick a small thing you wish existed, have an AI help you build it, and then spend most of your time figuring out where it’s wrong. That second part — the doubting, the checking, the deciding — is the job that’s still hiring. Get good at the half the lab kept for itself.
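One way to structure that exercise, sketched under assumptions: write down what “right” means before you read the generated code. The example below invents a small task (splitting a bill in cents between people) purely for illustration. `check_split` is a hypothetical spec written as checks, `naive_split` stands in for a plausible generated answer, and `fair_split` is one version that satisfies the spec.

```python
def check_split(split_fn, total_cents, people):
    """Return a list of spec violations for one input (an empty list means it passes)."""
    shares = split_fn(total_cents, people)
    problems = []
    if len(shares) != people:
        problems.append("wrong number of shares")
    if sum(shares) != total_cents:
        problems.append("shares do not add up to the total")
    if shares and max(shares) - min(shares) > 1:
        problems.append("shares differ by more than one cent")
    return problems


def naive_split(total_cents, people):
    # Hypothetical generated answer: plausible, but silently drops the remainder.
    return [total_cents // people] * people


def fair_split(total_cents, people):
    # Hand the leftover cents out one at a time so nothing is lost.
    base, extra = divmod(total_cents, people)
    return [base + 1] * extra + [base] * (people - extra)


print(check_split(naive_split, 900, 3))   # prints []
print(check_split(naive_split, 1000, 3))  # prints ['shares do not add up to the total']
print(check_split(fair_split, 1000, 3))   # prints []
```

The naive version passes on the friendly input and loses a cent on the awkward one. Deciding that the cents must add up, and thinking to try a total that does not divide evenly, is the part no one typed for you.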

Sources

  1. Marina Favaro & Jack Clark, “When AI builds itself,” The Anthropic Institute, June 4 2026. Source for the 80%+ merged-code figure (from “low single digits” before Claude Code’s Feb 2025 launch), the 8×/day output figure, the 4-minute → 12-hour task progression, the “research taste and judgment” quote, and the recursive-self-improvement framing. (All internal, self-reported metrics.)
  2. METR, “Measuring AI Ability to Complete Long Tasks,” Time Horizon 1.1, Jan 29 2026; original paper arXiv:2503.14499, Mar 2025. Independent source for the task-length doubling trend.
  3.
  4. Brynjolfsson, Chandar & Chen, “Canaries in the Coal Mine,” Stanford Digital Economy Lab, Nov 2025. Early-career employment decline in AI-exposed work.
  5. Indeed Hiring Lab, “Software Development Postings Remain in the Doldrums,” Feb & Jul 2025. Software job-postings data and the pre-ChatGPT caveat.
  6. Stack Overflow 2025 Developer Survey. 84% AI use, 46% distrust accuracy, 66% “almost right, but not quite.”
  7. Google Cloud / DORA, “2025 State of AI-Assisted Software Development,” Sept 2025. Verification as the new bottleneck.