The Augmented Work.
Article № 35 · AI & Work

You’re not paid to read every line anymore. Pretending you still are is the risk.

The craft of engineering is quietly moving from writing code to vouching for code you didn’t write. The skill that replaces “I read every line” isn’t reading harder. It’s building the evaluators, harnesses, and proofs that let you stand behind output you can’t fully inspect.

Issue June 2026
Read time 14 minutes
Filed under AI & Work · Engineering · Careers
Length 3,700 words

Two senior engineers approve a pull request on the same Tuesday afternoon.

The first one does it the way she was taught. She opens the diff, reads it top to bottom, follows the logic into two files she didn’t touch, runs it in her head, and clicks approve only once she can explain every line to your face. It takes her forty minutes. She’s proud of those forty minutes: they’re the whole reason she’s trusted.

The second one opens a diff that is 1,900 lines long, generated by an agent overnight, touching eleven files she has never read. Reading it the first engineer’s way would take her most of a day, and tomorrow there will be three more like it. So she doesn’t read it that way. She checks that the change is covered by a test suite she wrote and trusts. She runs it against a set of evaluation cases she built for exactly this kind of change. She reads the parts a machine can’t be trusted on — the boundaries, the failure paths, the place where this code touches money — and she reads the test results everywhere else. Then she clicks approve.

Both engineers signed their name to that code. Both are now accountable for it in exactly the same way: if it breaks production at 3 a.m., it’s their pager, their name in the incident review, their judgment on the line. The norm of code review has never been ambiguous about this — when you approve a change, you’re signing off on every line as if you had written it yourself (Maxim Gorin, “Code Review as a Team Process”).

Here’s the part nobody says out loud: only one of those two engineers is still doing the job that the next five years will pay for. And it’s not the one who read every line.

If you’re an engineer — or you’re learning to become one, or you’re a few years in and quietly worried the thing you’re good at is about to stop mattering — this is for you. Not because coding is “dead.” Because the center of the craft is moving, and the person who keeps practicing the old skill harder will lose to the person who notices what the new one actually is.

The belief that’s quietly become a liability: that reading every line is the job

Ask any good engineer what makes them good, and somewhere in the answer is a version of I understand the code I ship. Not “I trust the tests.” Not “the CI was green.” I read it, I understood it, I would have written it the same way, and that’s why you can trust me with it.

This belief is not vanity. It’s the discipline that separates an engineer from someone who copies a snippet off the internet and prays. Reading the code is how you catch the off-by-one, the unhandled null, the subtle thing the author didn’t see. For thirty years, “I read every line” was the proof of competence. It was the thing you could honestly say that the junior couldn’t.

And it scaled, because the supply of code was bounded by how fast humans could type it. A senior engineer could plausibly read everything that mattered, because everything that mattered was being written by a small number of people typing at human speed.

That bound is gone.

At Google, 75% of all new code is now generated by AI and then approved by an engineer — up from 50% a few months earlier, and from “well over 30%” a year before that (Sundar Pichai, Google Cloud Next 2026, via Fast Company; blog.google). Satya Nadella put Microsoft’s share at 20–30% over a year ago, and said internal repos were already higher (CNBC, April 2025). Google’s own framing of what changed is the tell: engineers are “increasingly taking on review and oversight roles rather than writing code directly” (Google Cloud Next 2026 coverage, DevOps.com).

Sit with the arithmetic for a second. If three-quarters of the code crossing your desk was produced by a machine in a fraction of the time it would take you to read it carefully, then “read every line the way I was taught” is no longer a discipline. It’s a bottleneck — and increasingly, a fiction. Nobody is reading every line of three-quarters of Google’s code the way our first engineer read her diff. They can’t. The hours don’t exist.
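
To make that arithmetic concrete, here is a back-of-envelope sketch. The 1,900-line diff and the three more per day come from the opening anecdote; the careful-reading rate of 300 lines per hour and the six-hour review budget are hypothetical numbers chosen only for illustration, not measurements.

```python
# Back-of-envelope review arithmetic. All rates are illustrative assumptions.
LINES_PER_DIFF = 1_900        # from the anecdote: one agent-generated diff
DIFFS_PER_DAY = 3             # from the anecdote: "three more like it"
CAREFUL_LINES_PER_HOUR = 300  # hypothetical careful-reading rate
REVIEW_HOURS_PER_DAY = 6      # hypothetical daily review budget


def hours_to_read(lines: int, lines_per_hour: int) -> float:
    """Hours needed to read `lines` at a constant reading rate."""
    return lines / lines_per_hour


def readable_fraction(lines_per_day: int, lines_per_hour: int, hours: float) -> float:
    """Fraction of the daily inflow one reviewer can read carefully (capped at 1)."""
    capacity = lines_per_hour * hours
    return min(1.0, capacity / lines_per_day)


one_diff = hours_to_read(LINES_PER_DIFF, CAREFUL_LINES_PER_HOUR)
daily_inflow = LINES_PER_DIFF * DIFFS_PER_DAY
fraction = readable_fraction(daily_inflow, CAREFUL_LINES_PER_HOUR, REVIEW_HOURS_PER_DAY)

print(f"one diff: {one_diff:.1f} h")
print(f"daily inflow: {daily_inflow} lines")
print(f"readable carefully: {fraction:.0%}")
```

Under these assumed numbers, a single diff consumes the whole review budget and roughly a third of the daily inflow gets a careful read. The exact fraction depends entirely on the assumed rates; the point is that reading capacity is fixed while inflow is not.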

So the belief quietly inverted on us. The thing that used to be the proof of diligence — I personally read and understood all of it — has become the thing you can no longer honestly claim. And claiming it anyway, clicking approve on 1,900 machine-written lines while telling yourself and your team that you reviewed them the old way, isn’t diligence. It’s the liability. You’ve taken on the accountability without doing the thing that used to back it.

The crack: careful reading was already failing, before the volume even hit

You could argue this is temporary — that the volume is high now, but reading-every-line is still the gold standard, and we just need better tools to help us read faster.

Except the careful-reading model was already cracking on its own merits, independent of volume. Look at what happens when engineers do try to verify AI output the old way, line by line.

The 2025 Stack Overflow Developer Survey asked 49,000 developers across 177 countries (Stack Overflow, July 2025). Use of AI tools is near-universal: 84% use them or plan to. But trust in what those tools produce went down, hard — 46% now say they don’t trust the accuracy of AI output, up from 31% the year before. More developers actively distrust it than trust it.

And the reason is precise, not vague. The number-one frustration, named by 45% of developers, is AI code that is “almost right, but not quite” — and 66% say they’re spending more time fixing that almost-right code than they’d spend without it (Stack Overflow Blog, 2025 survey results; VentureBeat).

“Almost right, but not quite” is the most dangerous failure mode there is for a line-by-line reader. A wrong answer is easy — you catch it and throw it out. An almost-right answer is plausible. It reads cleanly. It passes your eye precisely because the machine is good at producing code that looks like what a competent human would write. The bug isn’t in the line you’re reading; it’s in the assumption three files away that the confident, fluent code quietly made.

This is what Andrej Karpathy means when he says the human is now the bottleneck. The AI generates fast; the slow, hard, error-prone step is a human deciding whether the output is correct. His advice — “keep AI on a leash,” work in small verifiable chunks, narrow the scope so a human can actually check it — is not a productivity tip. It’s an admission that unaided human verification doesn’t scale, and that the answer is to change the shape of the task so verification becomes possible, not to read faster (Andrej Karpathy, Y Combinator AI Startup School, June 2025, via TechTimes).

So the crack isn’t only that there’s too much code to read. It’s that reading was never going to be enough against output this fluent. The machine is trained to produce code that looks like what a competent human would write, which is exactly the code most likely to survive a human’s read. Beating it with a more careful read is a losing race.

The turn: the job was never “read every line.” It was “be able to vouch for the result.”

Here’s the uncomfortable reframe.

“Read every line” was always a proxy. It was the cheapest available way to do the actual job, which was: be able to stand behind the result. To be the person whose name on the approval means something — who can say, and mean it, “I understand the consequences of shipping this,” and be right often enough that the team relies on it.

Reading every line was how you earned that confidence when you, a human, were the only available checker and the code came at human speed. The reading was never the point. The standing behind it was the point. We just fused the two so tightly, for so long, that we mistook the method for the job.

That fusion is now coming apart — and noticing it is the whole game. The volume broke the proxy. The almost-right failure mode broke the proxy. But the underlying job didn’t change at all. You still have to be able to vouch for what ships. What changed is that reading is no longer the tool that gets you there. Something else is.

The new craft — the actual new job description — is building the things that let you stand behind output you didn’t write and can’t fully read: the test suites, the evaluation sets, the harnesses that constrain what the machine is allowed to do, and at the high end, the proofs. The skill is no longer “I read it.” It’s “I built the thing that checks it, I understand where that thing is trustworthy and where it isn’t, and that’s why you can rely on my approval.”

If that sounds like a smaller job than writing code, look more carefully. It’s a harder one.

The proof that this is the real craft, not a coping mechanism

It would be easy to dismiss “I trust my tests” as the lazy engineer’s excuse. So here’s the case that building verification is the genuine senior skill — escalating from the everyday to the frontier.

Start with evals, because the people building the models say this out loud. When Anthropic writes about shipping AI agents, their framing is blunt: the qualities that make agents useful — autonomy, flexibility — “also make them harder to evaluate,” and manual human review alone can’t assess them. The mechanism they point to instead is evaluations: automated checks that “make problems and behavioral changes visible before they affect users,” giving you confidence in behavior you cannot fully inspect by hand (Anthropic, “Demystifying evals for AI agents,” January 2026). Read that again with your own work in mind. The recommended way to trust a system whose output you can’t fully read is not to read harder. It’s to build the evaluator that reads for you, across thousands of cases, at a scale your eyes never could.

Writing a good eval is not a lesser skill than writing the code. It’s arguably the deeper one. To build an evaluation set for a piece of behavior, you have to know what correct actually means — every edge case, every quiet assumption, every way “almost right” hides. The engineer who can specify that has understood the problem more completely than the one who can merely produce a plausible solution to it. That’s why this isn’t coping. The eval is the modern form of the same judgment that used to live in the careful read — just externalized into something that scales and runs every time, instead of living in one tired human’s head for forty minutes once.
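
As a rough illustration of what "specifying correct" can look like, here is a minimal eval-set sketch. The price-parsing task, the case list, and both candidate functions are hypothetical stand-ins invented for this example; real eval sets for agent output are far larger and usually score more than exact equality.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    raw: str       # input handed to the function under evaluation
    expected: int  # what "correct" means for this input, in cents


# Hypothetical eval set: each case pins down one quiet assumption.
CASES = [
    EvalCase("plain dollars", "12", 1200),
    EvalCase("dollars and cents", "12.34", 1234),
    EvalCase("single decimal digit", "12.5", 1250),
    EvalCase("leading currency sign", "$7.05", 705),
    EvalCase("surrounding whitespace", "  3.00 ", 300),
    EvalCase("binary float trap", "4.35", 435),
]


def parse_exact(raw: str) -> int:
    """Candidate A: string arithmetic only, no floats."""
    text = raw.strip().lstrip("$")
    dollars, _, cents = text.partition(".")
    cents = (cents + "00")[:2]  # pad "5" to "50"; keep at most two digits
    return int(dollars) * 100 + int(cents)


def parse_almost_right(raw: str) -> int:
    """Candidate B: reads cleanly, but truncates after a float multiply."""
    return int(float(raw.strip().lstrip("$")) * 100)


def run_evals(fn: Callable[[str], int], cases: list[EvalCase]) -> list[str]:
    """Return the names of failing cases; an empty list means all passed."""
    failures = []
    for case in cases:
        try:
            ok = fn(case.raw) == case.expected
        except Exception:
            ok = False  # a crash counts as a failure, not as a skipped case
        if not ok:
            failures.append(case.name)
    return failures


for candidate in (parse_exact, parse_almost_right):
    failed = run_evals(candidate, CASES)
    print(f"{candidate.__name__}: {len(CASES) - len(failed)}/{len(CASES)} passed, failing: {failed}")
```

Candidate A passes all six cases. Candidate B fails at least the 4.35 case, because 4.35 * 100 evaluates to 434.99999999999994 in binary floating point and int() truncates it to 434. That is the "almost right" failure in miniature: nothing on the line looks wrong, and only a case that encodes the edge catches it.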

Then there’s the deliberate narrowing — the harness. Karpathy’s “keep it on a leash” is the everyday version: don’t hand the machine an unbounded task and then squint at 1,900 lines, because you’ll lose. Hand it a bounded one, inside guardrails you built, where the output is small enough and shaped enough that you can verify it (Karpathy, via TechTimes). The craft here is designing the cage: the types, the contracts, the test that fails loudly, the structure that makes a whole class of bad output impossible rather than merely catchable. You’re not reviewing the output anymore. You’re engineering the conditions under which the output can be trusted.
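
One small piece of such a cage might look like the sketch below. The Cents type and the apply_refund operation are hypothetical names made up for this example. Python checks these rules at runtime rather than at compile time, so this is the "fails loudly" version of the idea, not a static guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cents:
    """Money as a whole number of cents; invalid values cannot be constructed."""
    value: int

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"Cents needs an int, got {type(self.value).__name__}")
        if self.value < 0:
            raise ValueError("Cents cannot be negative")


def apply_refund(balance: Cents, refund: Cents) -> Cents:
    """Hypothetical operation the generated code is allowed to call.

    Contract: a refund may never exceed the balance. A violation raises
    immediately instead of producing a plausible-looking wrong number.
    """
    if refund.value > balance.value:
        raise ValueError("refund exceeds balance")
    return Cents(balance.value - refund.value)


print(apply_refund(Cents(1000), Cents(250)))

# Each of these is rejected at the boundary, before any review is needed.
for bad_call in (lambda: Cents(9.99), lambda: Cents(-1),
                 lambda: apply_refund(Cents(100), Cents(500))):
    try:
        bad_call()
    except (TypeError, ValueError) as err:
        print(f"rejected: {err}")
```

The design choice is to move the check from the reviewer's eye into the type: once float amounts and negative balances cannot exist, no diff can quietly introduce them, however many lines it has.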

At the frontier, this becomes literal mathematical proof — and it already runs in production at scale. AWS formally verifies s2n, the open-source TLS implementation that secures traffic across Amazon’s services. The point isn’t a one-time audit; it’s continuous verification — at each change to the code, the correctness proofs are automatically re-established “with little to no interaction from the developers” (AWS, “Continuous Formal Verification of Amazon s2n,” CAV 2018). AWS now runs automated reasoning behind Config, Inspector, GuardDuty, S3, and more (AWS, “Formal Reasoning About the Security of Amazon Web Services”). Nobody at AWS is reading every line of the TLS stack on every commit. They built a system that proves the properties they care about and re-proves them automatically — the purest possible version of standing behind code without personally re-reading it.

Evals, harnesses, proofs — they form a single ladder. They are different rungs of the one craft that’s replacing line-by-line reading: building the apparatus that lets you vouch for output at a scale and speed no human reader can match.

And there’s a name for why this works at all. The AI researcher Jason Wei calls it the asymmetry of verification: some tasks are far easier to check than to solve. A finished website can take years to build and seconds to confirm that it works. His “verifier’s law” follows directly: the ease of getting an AI to reliably do a task is proportional to how verifiable that task is (Jason Wei, “Asymmetry of verification and verifier’s law”). That principle governs which tasks AI conquers first, but flip it toward your own career and it reads as a job posting. If the machine’s reach is set by how well a task can be verified, then the person who builds the verification is the person who decides how far the machine is allowed to go, and who gets to trust the result.
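
The asymmetry is easy to see in a toy problem. The sketch below uses subset sum, a textbook example chosen here for illustration and not one taken from Wei's essay: checking a claimed answer is a single pass, while finding one by brute force may try every subset.

```python
from itertools import combinations
from typing import Optional, Sequence


def verify(numbers: Sequence[int], target: int, picked: Sequence[int]) -> bool:
    """Check a claimed solution: distinct valid indices whose values sum to target."""
    if len(set(picked)) != len(picked):
        return False
    if any(i < 0 or i >= len(numbers) for i in picked):
        return False
    return sum(numbers[i] for i in picked) == target


def solve(numbers: Sequence[int], target: int) -> Optional[tuple[int, ...]]:
    """Brute-force search: tries up to 2**len(numbers) index subsets."""
    for size in range(len(numbers) + 1):
        for picked in combinations(range(len(numbers)), size):
            if verify(numbers, target, picked):
                return picked
    return None


numbers = [3, 34, 4, 12, 5, 2]
answer = solve(numbers, 9)
print(answer, verify(numbers, 9, answer))
```

The verifier does not care who or what produced the answer. It accepts a correct claim from any source and rejects an incorrect one, which is the property that lets a cheap check stand in for an expensive re-derivation.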

The verifier isn’t the loser in this story. The verifier holds the gate.

This migration already happened to another profession — and it tells you how it goes

If you want to know what your job becomes, look at the one career that already lived through this exact shift, decades ago: the airline pilot.

A modern captain spends almost no time hand-flying the aircraft. Automation flies it. The pilot’s role moved from being the active controller of the machine to being its supervisor — the human who monitors the system, verifies that the automation is behaving as expected, and stays ready to intervene when it isn’t (Flight Safety Foundation, “Trust but Verify”). The aviation literature even named the failure mode that defines this new job: “automation surprise,” the moment the pilot looks at the system and has to ask what is it doing, why is it doing that, and what will it do next? (“Automation Surprise” in Aviation, ACM CHI 2015).

Notice three things, because all three are about to be true of engineering.

First, the job did not disappear — it got harder to do well and more consequential. We did not stop needing pilots when autopilot arrived. We needed pilots who were excellent at a different and less natural skill: vigilant supervision of a system they didn’t manually operate.

Second, the new skill has its own failure modes that the old skill didn’t. The aviation research is candid that moving from active flying to passive monitoring causes skill degradation and reduced situational awareness: the supervisor can drift, trust too much, and miss the surprise. The engineer who clicks approve on the green checkmark without understanding what the check does and doesn’t cover is an automation surprise waiting to happen. The defense is the same as in the cockpit: build the instruments, understand exactly what they verify, and keep your hand close to the controls on the parts that matter most.
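
One way to keep a green checkmark honest is to ask which changed lines the tests never touched. The sketch below assumes two hypothetical inputs, a map of changed lines per file and a map of test-executed lines per file; real diff and coverage tools produce different shapes, and the file names are invented.

```python
def unverified_changes(
    changed: dict[str, set[int]],
    covered: dict[str, set[int]],
) -> dict[str, list[int]]:
    """Changed lines that no test executed, per file.

    `changed` maps file path -> line numbers touched by the diff.
    `covered` maps file path -> line numbers executed by the test suite.
    Both shapes are hypothetical; real diff and coverage tools differ.
    """
    gaps = {}
    for path, lines in changed.items():
        missing = sorted(lines - covered.get(path, set()))
        if missing:
            gaps[path] = missing
    return gaps


changed = {
    "billing/charge.py": {10, 11, 12, 40},
    "billing/retry.py": {5, 6},
    "docs/notes.py": {1},
}
covered = {
    "billing/charge.py": {10, 11, 12},
    "docs/notes.py": {1},
}

for path, lines in unverified_changes(changed, covered).items():
    print(f"read by hand: {path} lines {lines}")
```

Even this is an upper bound on what the instruments verify. Coverage says a line ran during a test, not that any assertion checked its result, so "executed" lines on critical paths can still deserve a human read.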

Third — and this is the one to hold onto — the pilot is still completely, personally accountable. Automation flies the plane; the captain is still responsible for the flight. That’s the shape of the deal in engineering too. AI writes the line; you sign for it. Responsibility for the final code rests with the human who decided to ship it — AI executes, but it cannot hold liability, and every line of AI-generated code still requires a human’s sign-off (niksilver.com, “Accountability, responsibility and reviewing code”).

So the migration isn’t from “valuable” to “obsolete.” It’s from operator to verifier — a job that is more accountable, not less, and that lives or dies on whether you build good instruments and actually understand them.

[Figure 01: diagram of the operator-to-verifier shift drawn as a rising ladder, with rungs for tests, evals, harnesses, and formal proofs.]
Figure 01. The same accountability, a new craft. The job migrates from operator (reading and typing every line at human speed) to verifier: building the evals, harnesses, and proofs that vouch for output no human can re-read, while the sign-off stays human.

What this means for you, concretely

Go back to the two engineers from that Tuesday.

The first one isn’t bad at her job. She’s excellent at a skill whose price is falling — reading and producing code by hand, line by careful line. The second one looks, from the outside, like she’s doing less. She read fewer lines. She typed almost nothing. But she spent her real effort somewhere the first engineer didn’t: on the test suite that catches the regression, the eval set that pins down what “correct” means for this change, the boundary she insisted on reading by hand because she knows exactly where her instruments are blind. When she clicks approve, her name means something — not because she read every line, but because she built the thing that did, and she knows precisely how far to trust it.

That’s the job opening. Not “prompt engineer.” Not “AI whisperer.” Verifier — the engineer who can stand behind output they didn’t write, because they built and understand the apparatus that vouches for it.

If you’re already an engineer, the move is to stop measuring yourself by lines read and authored, and start asking a different question of your own work: what did I build today that lets someone trust an output without re-reading it? A test that pins a behavior. An eval that scores a change across a hundred cases. A type or a contract that makes a whole category of bug impossible. A guardrail that keeps the machine inside a space you can actually check. That’s the work that’s appreciating. Get deliberately good at it. Learn what evals are and write some badly, then less badly. Read how the teams who do verification at scale — the eval writers, the formal-methods people — actually think, because their once-niche skill is becoming the center of the field.

And if you’re earlier — deciding whether it’s even worth learning to code, or a year or two in and unsure the bet still pays — here’s the honest version. The half of the job that was “type the code a competent person would type” is the half getting cheap, and yes, it’s getting cheap fast. But the half underneath it never went anywhere: knowing what correct looks like well enough to prove it, deciding what to build, and being the person who can responsibly say “ship it.” Learn the craft for that — for the judgment, the verification, the standing-behind — and you’re walking toward the part of the field that is hiring more verifiers every quarter, not fewer.

The first engineer spent forty minutes proving she’d read every line. It was honest, careful work. It just stopped being the thing the job is actually for. The job was always to be the person you can trust to sign. The line-by-line reading was only ever how we used to get there — back when our own eyes were the best instrument we had. They aren’t anymore. The engineers who win the next five years are the ones building the better instruments, and learning, exactly, where to trust them.

Sources

  1. Sundar Pichai / Google: 75% of new code AI-generated, engineers shifting to review and oversight (Google Cloud Next 2026). Fast Company · blog.google · DevOps.com.
  2. Satya Nadella / Microsoft: up to 30% of code written by AI (April 2025). CNBC.
  3. Stack Overflow 2025 Developer Survey: 84% AI use, 46% distrust accuracy, 45% “almost right but not quite,” 66% spend more time fixing it (49,000 respondents, 177 countries, July 2025). Press release · Results blog · VentureBeat.
  4. Andrej Karpathy: human as the verification bottleneck, “keep AI on a leash” (Y Combinator AI Startup School, June 2025). TechTimes.
  5. Anthropic: evals as the way to gain confidence in agent behavior manual review can’t assess (“Demystifying evals for AI agents,” January 2026). Anthropic Engineering.
  6. Jason Wei: asymmetry of verification and verifier’s law. jasonwei.net.
  7. AWS: continuous formal verification of s2n TLS; automated reasoning across AWS services. Continuous Formal Verification of Amazon s2n (CAV 2018) · Formal Reasoning About the Security of AWS.
  8. Aviation: pilot’s shift from operator to system supervisor; “automation surprise.” Flight Safety Foundation, “Trust but Verify” · “Automation Surprise” in Aviation, ACM CHI 2015.
  9. Code-review accountability: approving a change means signing off as if you wrote it; AI doesn’t remove accountability. Maxim Gorin, “Code Review as a Team Process” · niksilver.com, “Accountability, responsibility and reviewing code”.