Shipping More Is How You Stay Junior

The junior everyone wants on the team

Picture the junior every team is glad to have. She picks up a ticket within an hour of it landing. The code is clean — named well, tested, no surprises in review. She closes more tickets than anyone at her level, and she closes them faster. When the spec is vague, she doesn't thrash; she waits, sensibly, for it to firm up, then moves the moment it does. Her manager describes her in one word: reliable. At the end of the quarter her dashboard is a wall of green.

For about fifteen years, that was the whole picture of a great early-career engineer. Do what's assigned. Do it cleanly. Do it fast. Don't break things. The path from junior to senior ran straight through output — write enough good code, for enough years, and seniority arrived as a kind of sediment. The advice everyone gave was the advice everyone had been given: master the stack, ship reliably, and the rest follows.

It doesn't follow anymore. And the reason it doesn't is sitting in the same editor she is.

If you have a few years in and you've quietly wondered whether the thing you're good at still counts the way it used to — this is for you. If you manage someone like her and you can't quite say why your most productive junior isn't your most promotable one — this is for you too. And if you've been mid-level longer than you expected and you can't name the rung you're missing — especially you. The thing nobody put on your rubric is the thing the whole rest of your career is made of.

The crack: the machine got good at exactly the part you were proud of

Here is the uncomfortable part, stated plainly: the work that junior is proud of is the work that just got automated.

Not "will be." Did. As of 2026, AI coding assistants are no longer an experiment at the edge of the workflow; they're the workflow. In the largest developer survey we have, Stack Overflow's annual one, the overwhelming majority of developers now use or plan to use AI coding tools, and roughly half of professional developers reach for them every day. A meaningful and rising share of new code at major firms is now drafted by a model before a human touches it. The thing she does well (turning a clear, well-specified ticket into clean, correct code) is precisely the thing these tools do well, instantly, without getting bored.

And here is where most takes stop and declare either triumph or doom. Both are wrong, and the reason they're wrong is the whole essay.

Because even the speed on the visible part isn't what it seems. In a randomized controlled trial by METR, experienced developers using AI tools felt about 20% faster but were measured to be roughly 19% slower, eaten alive by the overhead of reviewing and repairing confident, fluent, almost-right output. Sit with that gap for a second. The work felt more productive and was less productive. The dashboard looked better while the real thing got worse.

That gap — between what looks like work and what is work — is the crack. It opens under the productive junior first, because she's the one who built her whole identity on the part you can see.

Fig. 01 · The perception gap. Felt: about 20% faster. Measured: about 19% slower.

METR RCT, 2025; experienced developers, real tasks.

If the machine now writes the clean, correct code faster than you can — and the speed it gives you is partly a feeling — then a question you were able to avoid for your whole career so far arrives all at once: when the typing is handled, what exactly are you for?

The turn: the job was never the code

So here is the thing nobody told her, and maybe nobody told you.

The job was never to write the code. The job was to take an ambiguous question and hand back a small, clear answer. The code was just the most visible form that answer took — so visible that we mistook it for the work itself. For fifteen years you could get away with that mistake, because turning the vague into the specific and then writing the specific into code were bundled into one person, one motion, one set of hours. You never had to ask which half was the value. You got paid for the bundle.

AI unbundled it. It took the second half — the writing-the-specific-into-code half — and got very good at it very fast. What it did not take, and shows no sign of taking, is the first half: sitting with a fuzzy, contradictory, half-articulated request and deciding what should actually exist.

We don't have to argue this from intuition; it shows up the moment you measure it. On the saturated coding benchmark the labs used to brag about, top models now score above 80%, and that benchmark got retired precisely because it stopped meaning anything. Point the same frontier models at SWE-bench Pro, which poses realistic, ambiguous problems that demand multi-step architectural decisions, and the score collapses to around 23%. Same models. The only thing that changed is whether the problem was already mapped.

Fig. 02 · The cliff. Same models; the map is the only thing that changed.

Mapped, well-specified tasks: 80%+. SWE-bench Pro (Scale AI, 2025), where problems carry realistic, multi-step ambiguity: ≈23%.

Picture it as fog rolling across the team's work. AI is a floodlight. Inside the lit circle — the well-specified, the already-decided, the mapped — it is blinding, instant, tireless. It will illuminate that ground faster and more thoroughly than you ever could. But the floodlight does not move. It lights up what's already been pointed at, and nothing past the edge. Seniority is simply how far past that edge you can walk and still move with purpose. The code AI writes is all inside the circle. The work that's left — the only work that's now scarce — is at the dark edge, where someone has to decide where to point next.

Fig. 03 · The lit circle. The light won't grow.

Inside the circle: "Add export to CSV," "Fix retry on 429," "Rename fetchUser," "Bump axios → 1.7," "Add Sentry to webhook." Past the edge: "Make dashboard faster," "We need rate limiting," "Improve onboarding," and a row of unlabeled question marks.

Which leads to the sentence that should make the productive junior slightly sick: shipping more, faster, is now often how you stay junior. Not because output is bad — because output is the part that's no longer scarce, and pouring your hours into the no-longer-scarce thing is how you become, yourself, the cheap and abundant resource.

The proof starts here: every ticket is a stub

Look at any ticket — really look at it. "Add export to CSV." "Users are complaining about the dashboard, make it faster." "We need rate limiting on the API." Every one of these is a stub. It's the compressed, lossy summary of a decision someone hasn't fully made yet. Export which fields, in whose timezone, for the finance team or the end user? Faster how — perceived load, actual query time, the one report that times out? Rate-limit by what, for whom, failing how?

The ticket is never the work. The work is the question behind the ticket. It always was — but AI made that brutally legible, because the moment a ticket is fully specified, a model can satisfy it in seconds. The instant the stub is filled in, the value drains out of filling it. So all the remaining value rushes upstream, into the act of filling the stub in — of turning "make it faster" into a precise, defensible statement of what "faster" means and which tradeoff we're accepting to get it.

A junior who answers that question — who turns the stub into a clear spec before a line is written — is doing a senior's job in miniature. It doesn't matter that the task is small. The motion is the entire game: ambiguous in, clear out. Do that on a tiny ticket and you have demonstrated, in fifteen minutes, the exact thing the next three levels of the ladder are made of.

Fig. 04 · 9:02 AM, Tuesday. A vague ticket just landed. What's your first move?

TKT-2847: "Users are complaining the dashboard is slow, make it faster." Filed by PM, 4 min ago.

The ladder was always made of fog

Because that's what the ladder is. We talk about engineering levels as if they measure skill or output or years, but read what the rubrics actually say and a single axis runs straight up the middle of every one: how much ambiguity can you absorb without someone above you absorbing it first.

This isn't a metaphor I'm imposing. It's written down. Dropbox's public career framework expects an entry-level engineer to execute well-scoped tasks with clear guidance: the ambiguity has been removed for them by someone more senior. The senior level expects you to independently identify the right solutions to ambiguous, open-ended problems. By staff level, public IC frameworks broadly expect you to define both the what and the how across multi-year horizons. Princeton's research-software-engineer ladder names the axes outright: autonomy, scope, task complexity. Every level is the same job, converting vague into clear, performed at a larger radius of fog.

Fig. 05 · One axis runs up the middle. Every level: convert vague into clear, at a larger radius.

Staff: "Define both the what and the how across multi-year horizons." (composite of public IC frameworks)

Senior: "Independently identify the right solutions to ambiguous, open-ended problems." (Dropbox engineering career framework)

Entry: "Execute well-scoped tasks with clear guidance." (Dropbox engineering career framework)

Princeton's RSE ladder names the same three axes outright: autonomy, scope, task complexity.

Which means each level above you exists, in large part, to absorb ambiguity so the level below can move. Your senior turns your manager's foggy priority into three concrete tickets. Your staff engineer turns "we need to be more reliable" into an architecture. That's not overhead on top of the "real" engineering — that is the engineering, and the code is its precipitate.

AI didn't change the shape of this ladder. It did something more pointed: it kicked out the bottom rung. The rung made of "I write correct code from a clear spec" — the rung the productive junior is standing on — is the one rung a model can now stand on too. Every rung above it, the fog-clearing rungs, AI left completely intact. So the ladder didn't get shorter. The first step just disappeared, and the engineers still standing on it are wondering why the climb feels different.

The cheap clarification beats the expensive correction — more than ever

Here's the tactic that follows, and it's almost insultingly small.

Before you write a line — before you let the model write a line — write down what you think you're being asked. One paragraph. "Here's what I understand this ticket to mean, here's the tradeoff I'm assuming, here's what I'm explicitly not doing. Shout if I've got it wrong." Send it to whoever owns the question. Then wait the fifteen minutes it takes them to reply.

This feels like a detour when you could be coding. It is the opposite. The cost of a misunderstanding doesn't stay flat as it travels; it compounds. Barry Boehm's foundational, widely cited data puts the curve at roughly 1:10:100: a misunderstanding caught at the design stage costs X to fix, the same one caught in testing costs ten times that, and one that reaches production costs a hundred times that. Requirements defects are not a rounding error in this; some research has attributed around half of downstream test failures to them. The paragraph you write before coding is the cheapest possible point on that curve. A merge that goes the wrong way lands you at the expensive end.
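To make the compounding concrete, here is a minimal Python sketch that prices one misunderstanding at each stage using only the 1:10:100 multipliers. The $100 base cost and the function names are hypothetical, chosen for illustration; they are not figures or code from Boehm & Basili.

```python
# Illustrative sketch: the 1:10:100 multipliers are the cited ratio;
# the $100 base cost used below is a hypothetical example value.
COST_MULTIPLIER = {
    "design": 1,
    "testing": 10,
    "production": 100,
}


def cost_to_fix(base_cost: float, stage: str) -> float:
    """Cost of fixing one misunderstanding that is caught at `stage`."""
    return base_cost * COST_MULTIPLIER[stage]


def savings_from_early_catch(base_cost: float, late_stage: str) -> float:
    """Money saved by catching the problem at design instead of at `late_stage`."""
    return cost_to_fix(base_cost, late_stage) - cost_to_fix(base_cost, "design")


if __name__ == "__main__":
    base = 100.0  # hypothetical design-stage cost, matching the illustrative $100
    for stage in COST_MULTIPLIER:
        print(f"caught in {stage:<10} -> ${cost_to_fix(base, stage):,.0f}")
    print(f"saved by asking first: ${savings_from_early_catch(base, 'production'):,.0f}")
```

Run as a script, it prints $100, $1,000 and $10,000 for design, testing and production, then a $9,900 saving for catching the problem before it ships. The ~3× implementation stage from Fig. 06 is left out on purpose, since only the 1:10:100 ratio is the cited claim.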

Fig. 06 · 1 : 10 : 100. A wrong assumption gets a hundred times more expensive the later it's caught.

Design 1× · Implementation ~3× · Testing 10× · Production 100×. Example: caught at design, cost ≈ $100.

Curve shape after Boehm & Basili (2001). Dollar figures are illustrative; the 1:10:100 ratio is the cited claim.

AI bent this curve the wrong way. When the direction is wrong, a model doesn't slow you down; it speeds you confidently off the cliff, generating fluent, plausible, wrong-direction code faster than you can read it, let alone check it. The thing that used to limit the blast radius of a misunderstanding, namely how much wrong code a human could physically write before someone noticed, is gone. Worse, the tools won't save you here, because they won't even flag the ambiguity. On the Ambig-SWE benchmark, which tests models on under-specified tasks, the agents almost never stop to ask a clarifying question on their own; they barrel ahead and assume. When researchers enabled the interaction and let the model ask, success on the ambiguous tasks rose by up to 74%. Read that twice: one of the biggest levers on whether the work comes out right is whether someone asks the question first. The model won't. That someone is you.

The model won't ask the question. That's not a gap in the tool. That's the job opening.

The work that gets you promoted is invisible on the dashboard

And now the part that makes all of this genuinely hard, the reason smart people keep optimizing for the wrong thing even after they suspect it's wrong.

Everything I've just described — the clarifying paragraph, the stub turned into a spec, the wrong project you talked the team out of building — is invisible on every metric you're measured by. Tickets closed can't see it. Pull requests merged can't see it. Lines shipped, commits, story points — none of them can register the three days of wrong-direction work that never happened because you spent fifteen minutes asking what "faster" meant. The single most valuable thing you did all month leaves no trace on the dashboard. By the numbers, the engineer who prevents the disaster and the engineer who was on vacation look identical.

Fig. 07 · Two engineers, same row of green. Promote one, from the dashboard alone.

Dev A (Q2, senior IC): 42 tickets closed, 38 PRs merged, 4,210 lines shipped, 63 story points.

Real value layer:
Talked the team out of a 3-week wrong build before the kick-off.
Turned four vague tickets into specs the model could actually satisfy.
Asked the question that stopped a duplicate-payments bug from shipping.
Caught a wrong-direction migration during a 12-min spec review.

Dev B (Q2, senior IC): 42 tickets closed, 38 PRs merged, 4,210 lines shipped, 63 story points.

Real value layer: no record.

The metric row doesn't change. The company's instruments physically can't tell A from B.

This isn't a fringe complaint; it's close to consensus among the people who study engineering work most seriously. When McKinsey proposed measuring individual developer output, Kent Beck and Gergely Orosz published a widely read rebuttal walking through why it backfires: you trigger Goodhart's Law (the moment a measure becomes a target, it stops measuring anything), and you punish exactly the collaborative, ambiguity-reducing "glue work" that holds a system together but shows up nowhere in the activity log. The SPACE framework, built by researchers at GitHub and Microsoft Research, cautions organizations against evaluating people on lines of code or commit counts. Delete five hundred lines of dangerous redundant code and you've done real work; the dashboard records it as negative productivity.

Here's the trap, fully assembled. The visible metrics reward the abundant thing (output) and are blind to the scarce thing (judgment under ambiguity). AI just made output more abundant and more visible than ever — so the dashboard looks better and better while the actual gap between the productive junior and the promotable one quietly widens. She is being congratulated, in real numbers, for getting better at the one thing that no longer needs a human.

Now, the honest caveat, because you deserve it: there's no clean dataset proving "ambiguity-reducers get promoted faster" as a measured outcome. What exists is every published career ladder describing seniority as ambiguity-resolution, and a strong practitioner consensus that the metrics miss it. But notice that's not a weakness in the argument — it's the argument. The thing isn't measured because it resists measurement, which is the very reason it stays scarce and valuable while the measurable thing gets automated and cheap. If a dashboard could capture it, a model could optimize it, and we'd be right back where we started.

What to do Monday, one light at a time

So what does this actually look like come Monday, for someone who isn't a staff engineer and can't redesign the org chart? It's smaller than you'd think, and that's the good news.

It looks like the paragraph before the code, the one we already talked about, sent without being asked. It looks like a pull request comment that says why, not just what: not "changed the retry logic" but "changed the retry logic because the old version would have silently created duplicate payments under load; here's the case I'm worried about." It looks like, when a vague ticket lands, replying within the hour with "here's what I think you're actually asking. Is that right?" instead of waiting for it to firm up. None of these take more than fifteen minutes. None of them show up on the dashboard. Every one of them is you, taking one paragraph of fog off the desk above you before anyone asked you to.

That last part is the whole move, so let me say it directly. You don't reduce ambiguity by waiting to be assigned it. You reduce it by reaching up: by noticing that the question one level above you is still fuzzy and quietly making it less fuzzy, unprompted. Do it once and your senior notices. Do it for two months and you get pulled into the design discussion before the tickets are written, which is to say, you get invited to the place where the ambiguity actually lives. That invitation is the promotion, arriving months before the title does. The title is just paperwork that catches up to a thing everyone already saw.

Go back to the fog one last time. The floodlight is on, and it is dazzling, and it is never going to move. Most people will stand inside the lit circle for their whole careers, getting faster and faster at illuminating ground that's already lit, wondering why the work feels less and less like theirs. The machine is better at that than they are now, and the gap only grows.

The junior who matters is the one who takes a step toward the dark edge — not a heroic leap, just one pace past the light — and reaches over to flip on a switch nobody handed her. She's not the most productive person on the dashboard. She might be near the bottom of it. But she's the only one demonstrating the single thing the next fifteen years of her career are made of, the one thing the floodlight can't do: deciding, in the dark, where to point next.

The code was never the job. AI just made that impossible to keep pretending. The question behind the ticket was always the work — and now it's the only work that's still yours. Pick one foggy thing this week that nobody asked you to clarify, and clarify it. That's the entire move. Everything else is just doing it again, at a larger radius, for the rest of your career.

Sources

METR randomized controlled trial — experienced developers measured ~19% slower with AI tools despite feeling ~20% faster. METR, 2025.
SWE-bench Pro — Scale AI, 2025. Frontier models score 80%+ on saturated benchmarks but collapse to ~23% on tasks demanding multi-step architectural decisions and realistic ambiguity.
Ambig-SWE — interactive benchmark, 2025. Coding agents rarely seek clarification on under-specified tasks; enabling clarification raises success by up to 74%.
AI coding-tool adoption — Stack Overflow Developer Survey and JetBrains State of the Developer Ecosystem, 2025–2026.
Engineering career ladders — Dropbox public career framework; Princeton research-software-engineer ladder (autonomy, scope, task complexity).
1:10:100 defect cost — Boehm & Basili, "Software Defect Reduction Top 10 List," IEEE Computer, 2001.
Critique of individual output metrics — Kent Beck & Gergely Orosz, rebuttal of the McKinsey developer-productivity framework; SPACE framework, GitHub & Microsoft Research.