Did AI make developers faster or slower?

A 2025 METR randomized controlled trial (16 experienced open-source developers, 246 real tasks from repositories they knew well) found that when AI was allowed, the developers took 19% longer to finish. The viral “19% faster” claim is the exact reverse of the finding. METR scoped the result narrowly to expert developers on mature codebases using early-2025 tools, and explicitly did not claim AI slows everyone down.

Does AI really write 90% of code?

No. The “90%” traces to a March 2025 prediction by Anthropic CEO Dario Amodei that AI would write 90% of code within three to six months. That was a forecast, not a measurement, and the window lapsed around September 2025 with no industry-wide 90% in sight. The measured figures are smaller and hedged: Google reported that more than a quarter of new code is generated by AI, then reviewed and accepted by engineers, and Microsoft's Satya Nadella put it at “maybe” 20–30% of the code in some projects.

Did entry-level tech hiring really collapse 73%?

The “73.4% collapse” has no traceable primary source — it’s attributed to unnamed reports with no dataset or working link, so it grades an automatic F. The sourced decline is smaller: Stanford’s ADP-payroll study found early-career workers in the most AI-exposed jobs down roughly 13–16% through 2025, and the New York Fed pegs recent CS-grad unemployment at 6.1% — high for the field, but far from 73%.

The AI-and-Careers Stats Everyone Repeats, Graded by Evidence

The scariest AI-and-careers stat in your feed is almost always the worst-sourced one — and the most-repeated one is usually a real study that got flipped backwards on the way to you.

That’s the finding, up front. When you trace the canonical viral numbers back to where they actually came from — the earnings call, the controlled trial, the working paper — a pattern falls out that’s more useful than any single debunk. A round, alarming number (“AI writes 90% of code,” “entry-level hiring collapsed 73%”) almost never has a primary source behind it. Meanwhile the real, well-sourced findings keep reaching you with their direction reversed, their denominator dropped, or their date quietly expired. So below is each big stat traced to its primary source and graded A through F — and, more usefully, the four questions that produced those grades, so you can run them on the next stat before you repost it.

This is for you if you’ve ever paused over a doom-thread number with your thumb over the share button — half-believing it, half-suspecting it’s junk, unsure how to tell in the ten seconds you’re willing to spend. If you just want the figure that confirms what you already feel, this will annoy you: the grading goes both ways. Some scary stats are real. Some comforting ones are folklore too.

Grade the source before you grade the claim

Every grade below comes from the same four questions, asked in order. Publishing them first is the point — a rubric you invent after seeing the answer is just cherry-picking with extra steps.

1. Is there a primary source? Can you click through to the actual paper, transcript, dataset, or earnings call, not a news story citing a blog citing a screenshot? No traceable primary source is an automatic F, however confident the number sounds.
2. What’s the denominator? Half of what? New code or all code? One company or the industry? Developers, or the general population? A number with no denominator isn’t a measurement; it’s a vibe wearing a percent sign.
3. Is it a measurement or a prediction, and does it measure what the repeater claims? “AI will write 90% of code” is a forecast, not a fact. “A quarter of code is AI-generated” can mean a human accepted an autocomplete suggestion, not that the machine wrote it alone. A self-reported, unaudited count weighs less than a controlled trial.
4. Is the direction preserved? This is the one nobody checks, and the one that does the most damage when it fails. Did “AI slowed developers down” reach you as “AI sped developers up”? A finding can pass every other test and still arrive meaning the opposite of what it found.

Clear all four and the claim earns an A. Miss the denominator or pass off a prediction as a measurement and you’re in C–D territory. A reversed direction is an F however good the source, because the claim now asserts the opposite of its evidence. No primary source at all is also an F: you’re not sharing a finding, you’re forwarding a rumor with a decimal point on it. Call the whole rubric the evidence grade, and keep it loaded.
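The four questions behave like an ordered decision procedure, so they can be written down as one. This is a minimal sketch under stated assumptions: `Claim` and `evidence_grade` are hypothetical names, the two automatic-F checks mirror how the report card treats a missing source and a flipped direction, and mapping one miss to C and two misses to D is an illustrative choice that happens to reproduce the report card, not a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A viral stat plus yes/no answers to the four rubric questions (hypothetical model)."""
    text: str
    has_primary_source: bool   # Q1: can you reach the paper, transcript, dataset, or call?
    denominator_intact: bool   # Q2: same base (population, scope) as the original?
    is_measurement: bool       # Q3: a measurement of what's claimed, not a forecast?
    direction_preserved: bool  # Q4: does it point the same way as the original finding?


def evidence_grade(claim: Claim) -> str:
    """Grade a claim by asking the four questions in order."""
    # Q1 first: no traceable primary source is an automatic F.
    if not claim.has_primary_source:
        return "F"
    # A flipped direction asserts the opposite of the evidence, so it also fails outright.
    if not claim.direction_preserved:
        return "F"
    # Count misses on Q2 and Q3. Assumed mapping: 0 -> A, 1 -> C, 2 -> D.
    misses = [claim.denominator_intact, claim.is_measurement].count(False)
    return {0: "A", 1: "C", 2: "D"}[misses]


REPORT_CARD = [
    Claim("AI writes 90% of code now", has_primary_source=True,
          denominator_intact=False, is_measurement=False, direction_preserved=True),
    Claim("AI made developers 19% faster", has_primary_source=True,
          denominator_intact=False, is_measurement=True, direction_preserved=False),
    # Without a source, Q2-Q4 can't be answered; False here is only a placeholder.
    Claim("Entry-level tech hiring collapsed 73%", has_primary_source=False,
          denominator_intact=False, is_measurement=False, direction_preserved=False),
    Claim("70% of developers have imposter syndrome", has_primary_source=True,
          denominator_intact=False, is_measurement=False, direction_preserved=True),
]

for claim in REPORT_CARD:
    print(f"{evidence_grade(claim)}  {claim.text}")
```

The order is the point: the source check runs first, so the unsourced “73%” fails before any of its other answers are read.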

Figure 01. The evidence grade: four questions, asked in order. Clear all four and the claim earns an A; no traceable primary source is an automatic F.

The report card

Here’s the canonical batch, graded. The autopsies follow, in descending order of how often they’re reposted.

The stat, as you’ve seen it	Grade	What actually happened
“AI writes 90% of code now”	D	A vendor’s 3–6-month prediction, recirculated as present-tense fact; the window lapsed
“AI made developers 19% faster”	F	The study found the opposite — 19% slower; the direction got flipped
“Entry-level tech hiring collapsed 73%”	F	No traceable primary source; the real, sourced decline is far smaller
“AI scores only 23% on hard coding tasks”	A*	True and well-sourced — in September 2025; stale by mid-2026
“70% of developers have imposter syndrome”	D	A general-population, lifetime estimate relabeled as a developer rate

* An A at publication, expired since.

“AI writes 90% of the code now” — grade: D

Three executives, three different numbers, one shared problem. Sundar Pichai told Alphabet’s Q3 2024 earnings call that “more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers” (Alphabet, October 2024). Satya Nadella said “maybe 20 to 30 percent of the code… in some of our projects” is AI-written (LlamaCon, via The Register, April 2025). Dario Amodei predicted AI would be “writing 90 percent of the code” within three to six months (Council on Foreign Relations, March 2025; reality-checked by Redwood Research).

Run the four questions. Primary source? Yes for all three: these are real, on-record quotes, and that part grades an A. The denominator is where it breaks. Pichai said new code, not the codebase, with the output reviewed and accepted by engineers; that’s assistance, not autonomy. Nadella said some projects, gave a range, and hedged it with “maybe.” And Amodei’s “90%” was never a measurement at all: it was a forecast from the CEO of a company that sells coding models, and the three-to-six-month window came and went around September 2025 with no industry-wide 90% in sight. By the time this reaches you as “AI writes most of the code now,” Google’s human-reviewed suggestions, Microsoft’s some-projects estimate, and a vendor’s lapsed prediction have fused into one autonomous-robot-programmer fact. Authentic quotes, D-grade claim. The gap between “AI suggested it” and “AI wrote it” is the whole subject of why English is not the new programming language.

“AI made developers 19% faster” — grade: F

This is the one almost everyone gets backwards, and it’s worth slowing down on, because the study underneath is excellent. METR ran a randomized controlled trial in early 2025: 16 experienced open-source developers, 246 real tasks from large repositories they’d worked in for years, each task randomly assigned to allow or forbid AI. The result — when AI was allowed, the developers took 19% longer. AI slowed them down (METR, July 2025).

The detail that makes it stick: before the trial, those same developers predicted AI would make them 24% faster; afterward, having actually been slowed, they still believed it had sped them up by about 20%. Perception and reality pointed in opposite directions, nearly 40 percentage points apart.
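The “nearly 40 points” is plain subtraction once all three numbers sit on one scale. A minimal sketch, assuming as a simplification that “N% faster” means completion time fell by N%, so each number becomes a signed change in completion time (the constant and function names are illustrative):

```python
# Signed change in task completion time, in percent (negative = faster).
# Simplification (assumption): "N% faster" is treated as completion time falling by N%.
PREDICTED = -24.0  # before the trial: developers expected to be 24% faster
BELIEVED = -20.0   # after the trial: they believed AI had sped them up ~20%
MEASURED = 19.0    # measured: tasks with AI allowed took 19% longer


def perception_gap(perceived: float, measured: float) -> float:
    """Percentage-point distance between a perceived effect and the measured one."""
    return measured - perceived


print(perception_gap(BELIEVED, MEASURED))   # 39.0, the "nearly 40 points"
print(perception_gap(PREDICTED, MEASURED))  # 43.0, the pre-trial forecast missed by more
```

The design choice that matters is the shared sign convention: once “faster” and “longer” live on one axis, a reversed direction can’t hide behind a matching “19.”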

So when “19%” reaches you as “AI makes developers 19% faster,” that’s not a rounding error — it’s the exact reverse of the finding, which is why the claim grades an F while the study grades an A. Two distortions ride along. The direction flip is one. The other is an over-generalization METR explicitly disowns: it scoped the result to expert developers on mature codebases they know deeply, using early-2025 tools, and stated plainly that it is not claiming AI slows down most developers everywhere. The honest version is narrow, strange, and true. The viral version is wide, comfortable, and backwards. Same tool, opposite outcomes depending on who’s holding it and where — which is exactly why directing the tool well is becoming the job.

“Entry-level tech hiring collapsed 73%” — grade: F

Here the folklore and the finding point the same way — the entry door really is narrower — but the viral figure is invented and the real one is smaller and more honest.

The “73.4% year-over-year collapse” that gets screenshotted into every junior-dev doom thread traces to an aggregator citing unspecified “Ravio” reports with no report ID, no dataset, and no working link. It fails question one at the door: no primary source, automatic F. Don’t repost it.

The sourced version is alarming enough without the inflation. Stanford’s “Canaries in the Coal Mine?” working paper, built on ADP payroll records (the strongest dataset in this whole set), found that early-career workers aged 22–25 in the most AI-exposed jobs saw a ~13% relative employment decline through July 2025, revised up to ~16% by October; software developers aged 22–25 specifically were down about 20% from their late-2022 peak (Stanford Digital Economy Lab, 2025). SignalFire, a VC firm, separately reports that new grads fell to 7% of Big-Tech hires, down 25% versus 2023 and over 50% versus 2019 (SignalFire, May 2025). For scale, the New York Fed pegs recent computer-science-grad unemployment at 6.1%: high for the field, but a world away from 73% (NY Fed, 2025).
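Question two in miniature: “down 25%” can mean a relative drop or a percentage-point drop, and the two imply very different starting points. A back-of-envelope sketch, assuming SignalFire’s “down 25% versus 2023” is a relative decline in the new-grad share; the implied 2023 share it prints is derived arithmetic, not a figure taken from the report, and `implied_prior_share` is a hypothetical helper:

```python
def implied_prior_share(current_share: float, relative_drop: float) -> float:
    """Back out an earlier share from today's share and a relative decline.

    current = prior * (1 - relative_drop), so prior = current / (1 - relative_drop).
    """
    return current_share / (1.0 - relative_drop)


current = 7.0  # new grads as % of Big-Tech hires (SignalFire, May 2025)
prior = implied_prior_share(current, relative_drop=0.25)

print(f"implied 2023 share: {prior:.1f}%")                      # 9.3%
print(f"change in points:   {current - prior:+.1f} pp")         # -2.3 pp
print(f"relative change:    {(current - prior) / prior:+.0%}")  # -25%
```

If the 25% were instead percentage points, the 2023 share would have had to be 32%, a very different story from roughly 9%. Same headline, different denominator.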

Two caveats the viral version drops. The headline number moved: 13% became 16% across drafts, so always cite which version. And the authors put a question mark in their own title on purpose: 2022–24’s rate-driven tech layoffs are a real confounder, and AI is the leading suspect here, not a convicted one. Grade: B for the careful version, F for the “73%.” What a narrowing entry door means for how you actually build a career is the subject of why you don’t have imposter syndrome and whether to specialize early.

“AI scores only 23% on hard coding tasks” — grade: A, with an expiry date

Sometimes a number is real, well-sourced, and still curdles into folklore — because it went stale. On SWE-bench Pro, a contamination-resistant benchmark of long, multi-file engineering tasks, the best frontier model (GPT-5) resolved 23.3% of the public set when Scale AI published the paper in September 2025 — against 70%+ on the easier SWE-bench Verified (Scale AI, September 2025). That gap was real and the source is solid: it’s clean evidence that models which look brilliant on tidy benchmarks fall apart on the messy, ambiguous work that fills an actual sprint.

But it was a snapshot. Live SWE-bench Pro leaderboards in mid-2026 reportedly put top public-set scores in the ~70% range (Scale, live leaderboard), a fast-moving figure worth treating as context, not a fixed fact. So a 2026 post citing “AI only scores 23%” passes all four questions and still fails the one the rubric doesn’t ask: when? It’s true the way last winter’s forecast is true. Grade it A for September 2025, and check the calendar before you repost it. The durable lesson isn’t the number; it’s that even an A-grade stat needs a date stamped on it.
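Stamping a date on a stat can be made mechanical. A minimal sketch; the `DatedStat` type and the 180-day shelf life are hypothetical choices for illustration, not a standard, and the `as_of` date is the paper’s September 21 2025 submission date from the sources below:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatedStat:
    """A number plus the date it was true (hypothetical record type)."""
    claim: str
    value: str
    as_of: date  # when the number was measured, not when you saw it
    source: str

    def is_stale(self, today: date, shelf_life_days: int) -> bool:
        """True once the stat is older than the chosen shelf life."""
        return (today - self.as_of).days > shelf_life_days


swe_bench_pro = DatedStat(
    claim="best model's score on SWE-bench Pro (public set)",
    value="23.3%",
    as_of=date(2025, 9, 21),
    source="Scale AI, arXiv:2509.16941",
)

# Assumed shelf life: fast-moving benchmark numbers get roughly six months.
print(swe_bench_pro.is_stale(date(2025, 10, 1), shelf_life_days=180))  # False
print(swe_bench_pro.is_stale(date(2026, 6, 1), shelf_life_days=180))   # True
```

The useful habit is the `as_of` field, not the threshold: a number stored without the date it was true can’t be checked at all.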

“70% of developers have imposter syndrome” — grade: D

You’ve seen it in every onboarding deck and every “you belong here” talk. Traced back, the 70% isn’t about developers and isn’t about a syndrome. It comes from a 2011 paper (Sakulku & Alexander) estimating that nearly 70% of people will have at least one impostor episode at some point in their lives — general population, lifetime, a single passing moment of self-doubt (Sakulku & Alexander, 2011).

Two substitutions turn that into the version you’ve seen: “people” quietly becomes “developers,” and “one episode in a lifetime” becomes “have imposter syndrome,” implying a chronic, current condition. Developer-specific surveys land lower. It fails question two (wrong denominator) and question three (a lifetime episode is not a standing diagnosis). Grade: D. The fuller autopsy — and why mislabeling a real, fixable skill gap as a “syndrome” actively works against you — is in the full piece on imposter syndrome.

The pattern: the error always runs one direction

Step back and the grades sort themselves. The findings that survive (METR’s slowdown, Stanford’s payroll decline, the benchmark gap) all have a primary source and a denominator. The claims that fail (“90%,” “73%,” “70%”) are predictions wearing a fact’s clothes, real numbers relabeled onto the wrong population, or numbers with no source at all.

But the sharper pattern is in how the real ones decay: never randomly. The direction flips toward the more shareable reading (slower becomes faster, because faster is the sellable story). The denominator drops in the scarier direction (new code becomes all code; the general population becomes developers). The date falls off the moment the number stops being current. Folklore is just the most-shareable mutation of a real finding outliving the boring, accurate version of it.

The scariest number in your feed is the one most worth distrusting. It didn’t get loud by being well-sourced — it got loud by being frightening, and frightening travels faster than true.

Grade it before you share it

Next time a number stops your scroll — especially one that confirms the thing you’re already afraid of — give it the same ten seconds you were about to spend amplifying it. Can you reach a primary source? Half of what? Measurement or prediction? Same direction it started in? Four questions. If it can’t answer them, you’re not informing your network — you’re laundering a rumor into it, where someone sharper will eventually trace it and trust you a little less.

Keep the evidence grade loaded for the next doom thread. The calm, sourced reply travels further than the dunk — and it’s the same instinct that makes you check an agent’s confident “done” before you ship it.

Distrust the number that wants to scare you — especially when it’s right about the direction and lying about the size.

Sources

1. “A quarter of new code at Google is AI-generated, reviewed and accepted by engineers.” Sundar Pichai, Alphabet Q3 2024 earnings remarks (official Google blog), October 29 2024.
2. “Maybe 20–30% of code in some projects” (Nadella) and “maybe half the development in a year” (Zuckerberg). LlamaCon fireside, reported verbatim by The Register, ~April 30 2025.
3. “AI writing 90% of code in 3–6 months.” Dario Amodei, Council on Foreign Relations, March 10 2025; a prediction (the window lapsed), reality-checked by Redwood Research / LessWrong.
4. METR RCT: AI increased completion time by 19% for experienced devs; they wrongly believed they were ~20% faster. METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” July 10 2025 (also arXiv:2507.09089).
5. Early-career employment decline (~13% through July 2025, revised ~16% by October; software devs 22–25 down ~20% from late-2022 peak). Brynjolfsson, Chandar & Chen, “Canaries in the Coal Mine?”, Stanford Digital Economy Lab working paper, Aug 2025 draft / Nov 13 2025 revision.
6. New grads = 7% of Big-Tech hires, down 25% vs 2023 and >50% vs 2019. SignalFire State of Tech Talent Report 2025 (a VC firm; AI cited as one of several drivers), May 20 2025.
7. Recent CS-grad unemployment 6.1% (computer engineering 7.5%). Federal Reserve Bank of New York, Labor Market for Recent College Graduates (ACS 2023 vintage).
8. “73.4% entry-level collapse”: UNVERIFIABLE, do not use. Attributed by an aggregator to “Ravio” reports with no traceable primary source.
9. SWE-bench Pro: best model (GPT-5) 23.3% on the public set, with Claude Opus 4.1 next at 23.1%; commercial subset ≤17.8%; vs 70%+ on SWE-bench Verified. Scale AI, “SWE-Bench Pro,” arXiv:2509.16941, submitted September 21 2025.
10. SWE-bench Pro live leaderboard (mid-2026 top public-set scores ~70%+). Scale public leaderboard; time-sensitive and constantly moving, so context, not a fixed fact.
11. “70%” originates as a general-population, lifetime, at-least-one-episode estimate (relabeled as a developer rate). Sakulku & Alexander, “The Impostor Phenomenon,” International Journal of Behavioral Science, 2011 (construct coined by Clance & Imes, 1978).

A sourcing note: every grade above rests on a primary source dated in the citation. Two figures are deliberately hedged, not asserted — the “73.4% collapse” is flagged as unverifiable and should not be repeated, and the mid-2026 SWE-bench Pro leaderboard range is live, fast-moving context used only to show that the September 2025 “~23%” snapshot has expired. The headline employment number (13% → 16%) moved across paper versions; cite the version and date, and treat AI as the leading hypothesis for the decline, not proven causation.