The Augmented Work.
Article № 20 · Hiring

AI Should Be Allowed in Interviews.

Banning AI in interviews tests memorization. Allowing it tests thinking. For hiring managers and engineering leads tired of the cheating arms race — here’s what an AI-allowed interview reveals, and how to run one.

Issue June 2026
Read time 6 minutes
Filed under Hiring · AI · Engineering Leadership
Length 1,500 words
In brief

Let candidates use AI in your interviews. Not as a reluctant concession to a problem you can’t police — as the deliberate design choice that finally lets you measure the thing the job actually requires.

Here’s the logic in one line: banning AI tests memorization; allowing it tests thinking. A closed-book coding round tells you whether someone can recall an algorithm under stress. An AI-allowed round (hand them a real problem, give them the tool, watch them work for 45 minutes) tells you how they’ll actually do the job, because that is the job now. The bar doesn’t drop. It moves from recall speed to judgment quality. That’s a better bar, and it produces better hires.

This is for the people who design and run interviews — hiring managers, engineering leads, technical interviewers — who are currently banning AI to “test real skills” while quietly losing an arms race. If you don’t control your interview format, or you’re a candidate looking for tips, this isn’t for you.

A two-column diagram. The left column, 'BAN AI — the closed-book round,' drawn in a dashed grey oval, lists what you measure: memorization, recall under pressure, stress tolerance — and little that the job needs. The right column, 'ALLOW AI — watch them work,' drawn in a solid oxblood oval, lists what you finally see: problem-solving fundamentals, prompting and evaluating output, communicating their thinking, learning velocity, and teachability. Footer: the bar doesn't drop — it moves from recall speed to judgment quality.
Figure 01. Same 45 minutes, two different tests. Ban the tool and you measure how well someone performs without it; allow it and you measure how well they work with it, which is the actual job now.

It’s a driving test that bans the steering wheel

The closed-book technical interview is a driving test that bans the steering wheel. You can find out whether someone memorized the highway code. You will never find out whether they can actually drive. Remove the ban, and the real signal appears.

The 5 things you can finally see

When the tool is in the room, five job-relevant signals become visible — none of which a closed-book round can show you.

  1. Problem-solving fundamentals. AI handles syntax; it doesn’t decide what to build. Watch whether the candidate frames a messy problem before touching the keyboard, or just starts firing prompts. The decomposition is the skill — and it’s the part the tool can’t do for them.
  2. Prompting and evaluation of AI output. This is the actual core competency now, and it’s where the best signal lives. In the BCG field experiment, consultants working inside AI’s “jagged frontier” produced 40% higher-quality work — but on tasks outside it, those who trusted the AI scored 19% worse than those without it. The hire you want is the one who catches the wrong answer. Seed your problem with a subtle trap and see if they spot it.
  3. Communication of thinking. When a candidate narrates why they’re asking the AI what they’re asking, you hear their reasoning in real time. A closed-book whiteboard gives you the opposite: a silent person sweating, while you guess at what’s happening in their head.
  4. Learning velocity. Hand them an unfamiliar API or framework and watch how fast they get productive. This is where less-experienced people often shine — the Copilot enterprise study found junior developers adopted the tool fastest and gained the most from it. You can’t see velocity in a recall test; you can only see it in motion.
  5. Teachability. Nudge them mid-problem — “what if the input doubled?” — and watch whether they incorporate it. The ability to take a steer and run with it is the best predictor you have of whether someone grows on your team, and it’s visible only while they work.
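
One way to make these five signals scoreable is a small scorecard that forces the interviewer to write down evidence next to each rating. The sketch below is illustrative only: the `Scorecard` class, the signal names, the 1 to 4 scale, and the equal weighting are all assumptions made for the example, not something the studies cited here prescribe.

```python
from dataclasses import dataclass, field

# The five signals from the list above. Names, the 1-4 scale, and equal
# weighting are illustrative assumptions, not a validated rubric.
SIGNALS = (
    "problem_solving",
    "prompting_and_evaluation",
    "communication",
    "learning_velocity",
    "teachability",
)


@dataclass
class Scorecard:
    candidate: str
    # Maps signal name -> (score, evidence written by the interviewer).
    ratings: dict = field(default_factory=dict)

    def rate(self, signal: str, score: int, evidence: str) -> None:
        """Record one rating; refuse a bare number with no observation."""
        if signal not in SIGNALS:
            raise ValueError(f"unknown signal: {signal}")
        if not 1 <= score <= 4:
            raise ValueError("score must be between 1 and 4")
        if not evidence.strip():
            raise ValueError("write down what you observed, not just a number")
        self.ratings[signal] = (score, evidence)

    def missing(self) -> list:
        """Signals the interviewer has not rated yet."""
        return [s for s in SIGNALS if s not in self.ratings]

    def average(self) -> float:
        """Equal-weight mean; only defined once every signal is rated."""
        if self.missing():
            raise ValueError(f"unrated signals: {self.missing()}")
        return sum(score for score, _ in self.ratings.values()) / len(SIGNALS)
```

Requiring an evidence string for every score is the design choice that matters here: it keeps the rating tied to something the candidate actually did in the room rather than to the interviewer's overall impression.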

How to run one

Five steps turn the theory into a round you can run next week.

A five-step vertical playbook titled 'How to run an AI-allowed interview,' subtitled 'Test for the job — with the tools they'll actually use to do it.' Step 1: Pick a real problem, not a puzzle — a ticket from your backlog; LeetCode is now AI-solvable in seconds. Step 2: Put the tool in the room — the same AI they'd use on the job. Step 3 (highlighted in oxblood): Seed a trap — a subtle flaw the AI will miss; catching it is the skill that matters most. Step 4: Watch and probe, don't quiz — ask 'why that prompt?' and 'how do you know it's right?' Step 5: Score thinking, not output — an understood half-solution beats a copy-pasted whole one. Footer: 45 minutes of watching someone work tells you more than any closed-book round ever has.
Figure 02. The whole round on one page. Step 3 is the one in oxblood for a reason: the seeded trap is where the single most valuable modern skill, catching the AI’s confident mistake, becomes visible.

  1. Pick a real problem, not a puzzle. Use something that resembles an actual ticket from your backlog. LeetCode-style questions are Google-able and now instantly AI-solvable, so they test nothing once the tool is allowed. That is the reason to switch to a real problem.
  2. Put the tool in the room. The same one they’d use on the job — Copilot, ChatGPT, Claude, whatever your team runs. Meta has been piloting exactly this with CoderPad: AI assistants built directly into the interview environment, so the round mirrors the actual workflow.
  3. Seed a trap. Build in a subtle flaw the AI is likely to reproduce or wave past — an edge case, a wrong assumption, a deprecated method. The candidate who catches it has the single most valuable skill in modern knowledge work.
  4. Watch and probe — don’t quiz. Your job shifts from interrogator to observer. Ask “why did you ask it that?” and “how do you know that’s right?” Interrogation is cheap; investigation is what actually tells you something.
  5. Score thinking, not output. A working solution the candidate doesn’t understand is a worse signal than an unfinished one they clearly reasoned through. Grade the process.
What good looks like

They frame the problem first, prompt deliberately, read the output skeptically, catch the trap, and tell you what they’d verify before shipping.

What weak looks like

Paste prompt, paste output, declare done.
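
To make step 3 concrete, here is a minimal sketch of what a seeded trap could look like, assuming a hypothetical backlog ticket that asks the candidate to extend a batching helper. The function names and the ticket are invented for illustration, and whether a given assistant flags the flaw is something to check yourself when you build the round. The explanatory comments are for the interviewer; strip them before handing over the starter code.

```python
def batches_seeded(items, size):
    """Starter code handed to the candidate. Contains the seeded flaw."""
    # Integer division drops the trailing partial batch: 10 items in
    # batches of 4 yields only 2 batches, and the last 2 items vanish.
    return [items[i * size:(i + 1) * size] for i in range(len(items) // size)]


def batches_fixed(items, size):
    """What a candidate who catches the trap might end up with."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Stepping by `size` keeps the final short batch.
    return [items[i:i + size] for i in range(0, len(items), size)]


def keeps_every_item(batch_fn, n_items=10, size=4):
    """Probe the interviewer can run on the candidate's final code."""
    items = list(range(n_items))
    flattened = [x for batch in batch_fn(items, size) for x in batch]
    return flattened == items
```

The flaw only shows up when the input length is not a multiple of the batch size, so code that is tested on tidy inputs looks correct. That is the property to aim for in a trap: invisible on the happy path, obvious to someone who asks what happens at the edges.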

Why the closed-book interview was already broken

Here’s the uncomfortable part: closed-book technical interviews were a poor signal long before AI arrived.

The largest recalibration of selection science this century — a 2022 meta-analysis by Paul Sackett and colleagues, published in Industrial and Organizational Psychology — found the methods we lean on hardest are weaker predictors than we believed. Cognitive-ability tests came in at ρ=0.31 and work-sample tests at ρ=0.33, well below the structured interview at ρ=0.42. The abstract algorithm quiz sits closer to the bottom of the validity table than the top.

It gets worse for the whiteboard specifically. In a controlled study presented at ESEC/FSE 2020, engineers solved problems privately and while being watched. Simply being observed cut their success rate by more than half. In the public setting, no women solved the problem; in private, all of them did. The whiteboard interview was largely measuring stress tolerance — and filtering by gender as a side effect.

So interview skill and job skill decoupled years ago. The people who pass closed-book rounds aren’t necessarily better at the work; they’re better at the round. We kept running it anyway because interrogation is cheap and investigation is expensive: it takes two minutes to ask “do you know X?” and forty-five to watch someone actually think.

“But isn’t this just letting them cheat?”

Candidates cheat because the test is broken. When roughly 80% of candidates use an LLM during top-of-funnel coding assessments even when explicitly told not to — a figure reported by interview platform Karat — you don’t have a cheating problem, you have a measurement problem. A whole covert-tooling industry exists to feed answers without tripping screen-share alerts (one such tool was built by a Columbia student in 2025) precisely because the test rewards what it claims to forbid. Fix the test and the incentive evaporates: you can’t cheat at a test that asks you to use the tool.

The alternative companies are reaching for is worse. A Gartner survey found 72.4% of recruiting leaders dragging interviews back in-person to stop AI cheating, with Google, Cisco, and McKinsey reinstating physical rounds. That’s slower, costlier, and still measures the wrong thing — now just in a room.

And no, allowing AI doesn’t lower the bar. It relocates it. The old bar was “can you recall this under pressure.” The new one is “can you direct the tool, catch its mistakes, and ship something correct.” Given that 75% of knowledge workers already use AI at work and 76% of developers use it or plan to, that’s not a lower bar. It’s the real one.

The bottom line

The next time you design a round, stop testing whether a candidate can do the job without the tools they’ll use every single day on the job. Hand them a real problem, give them the AI, seed one trap, and watch.

You’ll learn more in 45 minutes than any closed-book LeetCode round has ever told you — and you’ll stop screening out the people who are best at the work in favor of the people who are best at the interview.

One thing to remember: you’re not lowering the bar. You’re finally measuring against the right one.

Sources

  1. Microsoft & LinkedIn, 2024 Work Trend Index Annual Report (May 8, 2024). AI adoption among knowledge workers: 75% usage, 78% bring-your-own-AI.
  2. Stack Overflow, 2024 Developer Survey, 60,000+ respondents (July 22, 2024). 76% of developers using or planning to use AI; active use rose 44%→62% year over year.
  3. Sackett, Zhang, Berry & Lievens, Industrial and Organizational Psychology (Cambridge, 2022). Recalibrated selection-method validity: structured interview ρ=0.42, work sample 0.33, cognitive ability 0.31.
  4. Behroozi, Shirolkar, Barik & Parnin, ACM ESEC/FSE (November 2020). Whiteboard-stress RCT: being watched cut success by more than half; no women solved the problem in the observed condition.
  5. Dell’Acqua et al., BCG “jagged frontier” field experiment, Organization Science / HBS Working Paper 24-013. +40% quality on-frontier; −19% off-frontier.
  6. Cui, Demirer, Jaffe, Musolff, Peng & Salz, Management Science (February 2026). Copilot enterprise RCTs across 4,867 developers: +26% weekly tasks, juniors gain most.
  7. Karat (March 25, 2025). Roughly 80% of candidates use an LLM during top-of-funnel coding assessments despite instructions not to; the “Interview Coder” covert tool.
  8. Gartner, via Computerworld (August 26, 2025). 72.4% of recruiting leaders returning to in-person interviews; Google, Cisco, and McKinsey reinstating physical rounds.
  9. CoderPad (October 30, 2025). Meta’s AI-enabled coding interview pilot: AI assistants built into the interview environment.