The Augmented Work.
An editorial on AI & the future of work
Essay № 02 · Essays

Five AI Projects.

The projects on your CV are the same projects on everyone else’s CV. The market changed; the bar moved. This essay is about what to ship instead — and how to tell the difference between a tutorial and a hire.

Issue May 2026
Read time 13 minutes
Filed under AI · Careers · Portfolio
Length 4,600 words
In brief

The projects on your CV are the same projects on everyone else's CV. Same RAG demo on the same PDFs. Same Kaggle classifier. Same Streamlit dashboard. Same Titanic dataset, same MNIST, same fine-tuning notebook copy-pasted from a Medium article. Your GitHub looks like everyone else's GitHub. Your "final year project" looks like the one three years before yours, and the one three years after.

Recruiters see two hundred CVs a week that all look like yours.

Figure 01 — The sameness problem
Two hundred CVs a week. The recruiter's mental image looks like this.
~200 CVs per week, per recruiter, in a single posting.
In the interactive version of this figure, selecting "If they shipped what's in this article" changes one card. That is the entire premise of this essay.

The market changed. The bar moved. The projects that got someone hired in 2021 won't get you a callback in 2026. This essay is about what to ship instead — five projects that signal you've done the work nobody around you bothered to do.

Most students stop at the first phase of a project — sometimes from inexperience, sometimes from laziness — right when the problems start to become interesting. Some teachers and alumni will tell you that's fine, that juniors should focus on junior-level tasks and let the senior work come with time. We disagree, and the labour data does too.

Figure 02 — What changed under your feet

Entry-level software roles have not slowed. They have collapsed.

[Chart, 2022 to 2026, scaled to the late-2022 peak: employment for ages 22–25 in software development, and entry-level job postings over the same window.]
Source: Yale School of Management, "The real job destruction from AI is hitting before careers can start". Composite chart drawn from reported figures.

What was once a junior job is now done by AI. What's left for juniors? To say "yes" repeatedly to a model until it finishes the job. Nonsense. Now is the time for juniors to step into what was once considered beyond their level.

A portfolio speaks before you do — recruiters and clients read artifacts, not transcripts. Projects are where you develop taste, and knowing what to build is harder than knowing how to build it. They teach the full lifecycle — shipping, maintaining, deprecating — not just the fun middle part everyone gravitates to.

Treat projects as the artifact. Everything else is a footnote.
Antipatterns

Projects that look like work but won't get you hired.

These are the projects every recruiter has already seen this week. They're not wrong to build — they're wrong to put at the top of a CV. Cross them off; what remains is the brief for the rest of this essay.

I. School projects: candidates from the same school have the same CV
II. Course learning projects: those exist to teach the basics, not prove competence
III. Plug-and-play projects: wiring tools together is plumbing, not engineering
IV. Clone-of-X projects: "I built a ChatGPT clone" reads as "I followed a YouTube video"
V. Benchmark-chasing projects: 0.3% on a leaderboard is research, not engineering
VI. Demo-only projects: it works on your laptop, it dies on anyone else's
VII. Projects without users: if no one but you ever opened it, it's a learning exercise
The standard

What good looks like.

Before the specifics, here is the bar every project below should meet. Score the project you would put on your CV today. Be ruthless — recruiters are.

Figure 03 — Score your own project
The bar every project should meet.
Count the rows that are true of the project you'd put on your CV today, out of 9.
01. Solves a real problem: not "a problem I invented to justify the project"
02. Could be used by someone else: level 1, another developer; level 2, a non-technical person
03. Deployed and used by others: living somewhere other than your laptop
04. Documented at multiple levels of technical depth: a recruiter, a teammate, and a senior engineer all leave understanding it
05. Has an opinion: you made non-obvious choices and can defend them
06. Has a failure mode you handled: rate limits, bad inputs, downtime, hallucinations
07. Has at least one metric you can quote: latency, accuracy, retention, cost per request
08. Explainable in 60 seconds to a non-engineer, 30 minutes to a senior: and you have actually done both
09. You'd put your name on it publicly: this is the load-bearing question
Pick the project you'd show off. Score it honestly.
The five projects

The version every student ships — and the version that gets you on a call.

For each project, compare the MVP everyone builds with the shipped version that earns an interview. Same five problems. Two very different artifacts.

[Diagram: the RAG pipeline, MVP versus shipped. MVP, offline index: Docs (PDF, MD, HTML) → Chunk (size, overlap) → Embed (model) → Vector DB (kNN). MVP, online query: Query → Retrieve (top-k) → Prompt + LLM (naive) → Answer. Shipped: the same index, then Query → Rewrite / decompose (multi-hop) → BM25 (keyword) + Vector (semantic) → Rerank (cross-encoder) → cite-grounded answer (citations). Around it: eval set of 20–50 questions, latency tracked, cost per query, feedback loop.]
What must be there
01. Chunking: with a defended reason for the size, not "I picked 512"
02. Embedding: text → vectors
03. Vector database: kNN search
04. Retrieval step: one that you can actually evaluate, not "looks good"
05. Grounded prompt: refuses to hallucinate when nothing matches
06. "No context" handling: most student RAGs confidently make things up here
07. Eval set, 20–50 questions: with expected answers, so you know if changes help or hurt

How to go beyond
+ Reranking: cross-encoder or LLM-based, on top of vector search
+ Hybrid search: semantic + BM25, so proper nouns, IDs, and codes survive
+ Real UI: Next.js, Streamlit, or a Slack bot; somewhere non-technical users reach it
+ Multiple sources: PDFs + web + database + APIs
+ Query rewriting / decomposition: multi-hop questions actually work
+ Citations: every answer points back to its source chunk, clickable
+ Conversation memory: follow-ups like "what about the second one?" resolve
+ Structured extraction: tables, dates, amounts, not raw text dumps
+ Feedback loop: thumbs up/down, logged for later analysis
+ Cost & latency tracking: you can quote $/query and ms/query

The one decision that signals you understood the problem
You can explain, in one sentence, why retrieval fails on your hardest five queries, and what you'd do about it.
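The eval set and the "no context" item are easier to hold yourself to once they exist as code. What follows is a minimal sketch under stated assumptions, not a prescribed implementation: `EvalCase`, `evaluate`, `toy_retrieve`, the two sample chunks and the `min_score` cutoff are all invented for illustration, and the keyword-overlap retriever stands in for whatever embedding search you actually run. It reports hit rate at k on answerable questions and checks that unanswerable ones come back empty, which is the signal the prompt needs in order to refuse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalCase:
    question: str
    expected_chunk_id: Optional[str]  # None means "the corpus has no answer"


# Hypothetical two-chunk corpus, only here so the sketch runs end to end.
CHUNKS = {
    "c1": "invoices are due within thirty days",
    "c2": "late payments incur a two percent fee",
}


def toy_retrieve(question):
    """Stand-in retriever: share of question words found in each chunk."""
    q = set(question.lower().split())
    scored = [(cid, len(q & set(text.split())) / len(q)) for cid, text in CHUNKS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def evaluate(retrieve, cases, k=3, min_score=0.2):
    """Score a retriever that returns best-first (chunk_id, score) pairs.

    Results under `min_score` (an assumed cutoff) are dropped, so an empty
    list means "no context": the system should refuse rather than answer.
    """
    hits, correct_refusals, failures = 0, 0, []
    for case in cases:
        kept = [cid for cid, score in retrieve(case.question)[:k] if score >= min_score]
        if case.expected_chunk_id is None:
            if kept:
                failures.append(case.question)  # answered when it should refuse
            else:
                correct_refusals += 1
        elif case.expected_chunk_id in kept:
            hits += 1
        else:
            failures.append(case.question)
    answerable = sum(1 for c in cases if c.expected_chunk_id is not None)
    unanswerable = len(cases) - answerable
    return {
        "hit_rate_at_k": hits / answerable if answerable else 0.0,
        "refusal_rate": correct_refusals / unanswerable if unanswerable else 0.0,
        "failures": failures,
    }


CASES = [
    EvalCase("when are invoices due", "c1"),
    EvalCase("what fee for late payments", "c2"),
    EvalCase("who is the ceo", None),
]

report = evaluate(toy_retrieve, CASES)
print(report)
```

The `failures` list is the useful part: it is the start of the "hardest five queries" you should be able to explain.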
Examples worth shipping
01. Your country's legal code, central bank reports, or academic papers in your field
02. Your own Obsidian vault and lecture PDFs. Bonus: you become its first real user
03. A profession's documents: doctors' guidelines, tax law, building codes. Niche beats general
04. A podcast or YouTube channel's transcripts: multimodal-adjacent, easy to demo
05. A company or internship's public documentation: turns into a real interview conversation
06. A GitHub repository (code + issues + PRs): useful for explaining unfamiliar codebases
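Hybrid search, from the list above, needs a rule for merging the keyword ranking with the vector ranking. The essay does not name one. Reciprocal rank fusion is a common choice and is sketched here as an assumption; the document ids are made up, and `k=60` is the conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of ids into one fused ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query that contains an invoice number.
keyword_hits = ["inv-2291", "refund-policy"]                  # BM25 finds the exact id
vector_hits = ["refund-policy", "shipping-faq", "returns-faq"]  # embeddings miss it

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

In this toy case the invoice document never appears in the vector results, yet it lands second in the fused list. That is the "proper nouns, IDs, and codes survive" behaviour in miniature.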
[Diagram: Salma's Friday task, 2h/week, before and after. Before: eight manual steps, 2 hours, by hand, every Friday. Errors. Re-runs. Late reports. Resentment. Most student "automations" replace something nobody did manually. After: Cron (Fri 09:00) → Fetch (3 inboxes) → Parse / merge (PDF, XLSX) → Write sheet (consolidated) → Notify (Slack/WA) → Audit (log, kill switch), with retry and partial-failure handling. Salma: 0 minutes, still using it 3 months later.]
What must be there
01. Named user, named task: "Salma spends 2h every Friday consolidating 3 emails into one sheet"
02. Measurable "before": how long it takes manually, how often, how error-prone
03. End-to-end automation: no manual steps in the middle
04. Error handling: what happens when the email doesn't arrive, the format changes, the API returns 500
05. Logs and observability: the user knows it ran, what it did, what it skipped
06. Kill switch: the user can turn it off without calling you

How to go beyond
+ Real schedule: cron, GitHub Actions, a small server, not "when I remember"
+ Notification layer: email, Slack, WhatsApp, so the user trusts it without checking
+ Partial-failure handling: if 8/10 processed, the user knows which 2 failed and why
+ Configurable without code: a YAML file, a Google Sheet, a small admin UI
+ Audit trail: every run logged, every action reversible where possible
+ Versioning: when you change the logic, old runs are still explainable
+ Real hand-off: walk away for two weeks, see if it still runs

The one decision that signals you understood the problem
You can quote the time it saves per week, and the user is still using it three months later.
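Error handling, the kill switch and partial-failure reporting tend to collapse into one small loop. The sketch below is one possible shape under stated assumptions: `RunReport`, `run_batch`, the `is_enabled` callback and the file names are all hypothetical, and `handle` stands in for whatever parsing the real task needs. Each item is retried, a permanent failure is recorded with its reason, and a switched-off run touches nothing.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunReport:
    processed: list = field(default_factory=list)
    failed: dict = field(default_factory=dict)    # item -> reason
    skipped: list = field(default_factory=list)   # untouched because the switch was off

    def summary(self):
        total = len(self.processed) + len(self.failed) + len(self.skipped)
        lines = [f"{len(self.processed)}/{total} processed"]
        lines += [f"FAILED {item}: {reason}" for item, reason in self.failed.items()]
        lines += [f"SKIPPED {item}: kill switch on" for item in self.skipped]
        return "\n".join(lines)


def run_batch(items, handle, is_enabled=lambda: True, retries=2, delay=0.0):
    """Process each item, retrying with exponential backoff before giving up on it."""
    report = RunReport()
    for index, item in enumerate(items):
        if not is_enabled():                      # checked before every item
            report.skipped.extend(items[index:])
            break
        for attempt in range(retries + 1):
            try:
                handle(item)
                report.processed.append(item)
                break
            except Exception as exc:
                if attempt == retries:            # out of retries: record why
                    report.failed[item] = f"{type(exc).__name__}: {exc}"
                else:
                    time.sleep(delay * (2 ** attempt))
    return report


attempts = {}


def handle(name):
    """Stand-in for the real work: one flaky file, one permanently broken file."""
    attempts[name] = attempts.get(name, 0) + 1
    if name == "c.xlsx":
        raise ValueError("unexpected column layout")
    if name == "b.pdf" and attempts[name] == 1:
        raise TimeoutError("inbox timed out")


report = run_batch(["a.pdf", "b.pdf", "c.xlsx"], handle)
print(report.summary())
```

The summary string is what the notification layer would send, so the user sees which item failed and why without opening a log.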
Examples worth shipping
01. An internship task you actually did by hand for two months: you know the edge cases because you lived them
02. A family member's small business: a shop, a clinic, a freelance practice
03. A student club's process: event registrations, member comms, treasury
04. Your own job search: tracking applications, follow-ups, recruiter responses across platforms
05. A researcher's grunt work in your department: papers to organize, citations to format, datasets to clean
[Diagram: the dashboard, MVP versus shipped. MVP: 3 Matplotlib charts in a Streamlit app. "It shows the data." (every student dashboard, 2019–2026). Shipped: one decision, three numbers, one flag. Reorder today: 12 SKUs below safety stock. Anomaly, Casablanca warehouse: returns up 41% week-over-week, investigate →. Revenue, 7 days: +8.4%. Margin, 7 days: −1.2%. Query box: "why did Casablanca returns spike this week?"]
What must be there
01. Real, updating dataset: not a static CSV downloaded once
02. Clear primary user: and the decision they're trying to make when they open it
03. 3–5 metrics: that map to that decision, not 20 charts because they looked nice
04. Real filtering and drill-down: time comparisons (WoW, YoY) that actually work
05. Loads in under 3 seconds: non-technical users close slow tabs
06. Works on mobile: because that's where half your users will open it

How to go beyond
+ Natural-language queries: "show me last month's top 5 clients by revenue", no SQL
+ Automatic anomaly detection: the dashboard tells the user what's unusual
+ Scheduled summaries: Monday-morning email with the 3 things that changed last week
+ Role-based views: a salesperson sees their pipeline; a manager sees the team's
+ Export to real formats: PDF for management, Excel for finance, image for WhatsApp
+ "Why did this change?": walks back the cause
+ Comments and annotations: users leave notes on data points for the next person

The one decision that signals you understood the problem
A non-technical person used it for ten minutes without help, and came back the next day on their own.
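A week-over-week comparison and a first anomaly flag can start as a few lines of arithmetic. This is a rough sketch, not a recommended detector: the function names, the metric names, the weekly totals and the 25% threshold are all assumptions made up for illustration, and a real dashboard would likely want seasonality-aware logic on top.

```python
def week_over_week(current, previous):
    """Relative change versus the previous week; None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous


def flag_anomalies(series_by_name, threshold=0.25):
    """Return (name, change) pairs whose latest weekly move is at least `threshold`.

    `series_by_name` maps a metric name to weekly totals, oldest first.
    The largest absolute moves come first.
    """
    flags = []
    for name, weekly in series_by_name.items():
        if len(weekly) < 2:
            continue  # not enough history to compare
        change = week_over_week(weekly[-1], weekly[-2])
        if change is not None and abs(change) >= threshold:
            flags.append((name, change))
    return sorted(flags, key=lambda flag: abs(flag[1]), reverse=True)


# Invented weekly return counts for two warehouses.
weekly_returns = {
    "returns_casablanca": [98, 100, 141],
    "returns_warehouse_b": [80, 82, 84],
}

flags = flag_anomalies(weekly_returns)
for name, change in flags:
    print(f"{name}: {change:+.0%} week-over-week")
```

Only the warehouse that moved sharply is surfaced; the steady one stays off the screen, which is the point of "one flag" over twenty charts.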
Examples worth shipping
01. A small business currently running on WhatsApp screenshots and Excel files
02. A sports club, NGO, or association tracking what they actually care about
03. A personal-finance dashboard from your real bank statements: you'll feel bad UX immediately
04. A public-data dashboard your city doesn't have but should: air quality, transit reliability
05. A researcher in your faculty tracking their own KPIs: papers, citations, grants
[Diagram: the assistant, MVP versus shipped. MVP: a general-purpose chatbot, anything for anyone (a kid, a CEO, a poet, a coder, a chef, a doctor, a lawyer, a tutor, a teen, a lab). "Useful for everyone" is the same as "indispensable for no one." Shipped: one profession, one sub-task, one workflow. User: notary, small-firm, Moroccan; commercial law jurisdiction; paying customer, named budget line. Workflow, not a chat: 1. Intake form (parties, property, price). 2. Draft clauses (grounded in code civil). 3. Reviewable diff (lawyer edits in-line). 4. Export .docx (with letterhead).]
What must be there
01. One profession, one sub-task: "help notaries draft preliminary sale agreements", not "help lawyers with law"
02. Domain-grounded outputs: every claim or clause traceable to a source the professional trusts
03. A workflow, not just a chat: input → structured steps → reviewable output
04. Export to their format: Word with their letterhead, not Markdown
05. Field-appropriate guardrails: medical, legal, financial outputs need stronger refusals than marketing
06. Used on real cases: by at least one real professional, not a demo audience

How to go beyond
+ Domain-specific eval set: built with the professional, covering the cases that matter to them
+ Visible multi-step reasoning: they need to verify, not just consume
+ Integration with their stack: DocuSign, practice management, accounting platforms
+ Knowledge base of their own work: the assistant learns from prior cases
+ Field-appropriate compliance: HIPAA-adjacent for medical, attorney-client privilege for legal
+ Pricing & packaging: work out the cost and who pays, even if you don't sell yet
+ "Not a replacement for judgment": enforced in the product, not just the footer

The one decision that signals you understood the problem
A working professional would pay for it, and you can name the price and the budget line it comes out of.
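"Traceable to a source" and "enforced in the product" can both be expressed as a gate in front of the export step. The sketch below is only an illustration of that idea: `Clause`, `review_draft`, `can_export`, the `SRC-…` ids and the clause wording are invented placeholders, not real legal text or a real citation scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    text: str
    source_id: Optional[str]  # id of the trusted passage the clause was drafted from


def review_draft(clauses, trusted_sources):
    """Split a draft into clauses that cite a trusted source and clauses that do not."""
    grounded, needs_review = [], []
    for clause in clauses:
        if clause.source_id in trusted_sources:
            grounded.append(clause)
        else:
            needs_review.append(clause)
    return grounded, needs_review


def can_export(clauses, trusted_sources, signed_off_by=None):
    """Allow export only if every clause is grounded or a named professional signed off."""
    _, needs_review = review_draft(clauses, trusted_sources)
    return not needs_review or signed_off_by is not None


# Placeholder source index and draft, for illustration only.
TRUSTED = {"SRC-001": "deposit rules", "SRC-002": "title guarantees"}
draft = [
    Clause("The buyer pays a deposit on signature.", "SRC-001"),
    Clause("The seller guarantees clear title.", "SRC-002"),
    Clause("Disputes go to arbitration.", None),  # the model produced this with no source
]

grounded, needs_review = review_draft(draft, TRUSTED)
print(len(grounded), "grounded;", len(needs_review), "flagged for the professional")
print("export allowed:", can_export(draft, TRUSTED))
```

The ungrounded clause does not disappear; it is routed to the reviewable diff, and the export stays blocked until a person takes responsibility for it.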
Examples worth shipping
01. A contract review assistant for small-firm lawyers in a specific jurisdiction
02. A clinical-note structurer for GPs: voice in, structured note out
03. A tax-filing assistant for freelancers in a specific country and tax regime
04. A tender-response assistant for small consulting firms bidding on government contracts
05. A lesson-plan assistant for teachers in a specific grade and subject
06. A real-estate listing assistant: photo in, listing copy + comparables out
[Diagram: the agent, MVP versus shipped. MVP: a "research agent", a wrapper around an LLM that loops until it gives up. It loops, burns tokens, and returns a wall of text the user rewrites anyway. Shipped: sales call → CRM + email + invite, 5 min, $0.14/run. Input: recording (.mp4, 32 min) → Step 2: transcribe (Whisper) → Step 3: extract (objections, next step, price) → Step 4: plan (visible to user) → Step 5: human review (edit before send). Outputs a human accepted without redoing the work: CRM entry (pushed to Salesforce), follow-up email draft (in the salesperson's voice), calendar invite (next step + agenda). Stops, logs every decision, cost capped per run.]
What must be there
01. One narrow, named task: clear input, clear output
02. Multiple tools in sequence: not a single LLM call dressed up as an agent
03. Clear stopping condition: knows when it's done, doesn't loop forever, doesn't burn tokens
04. Human-in-the-loop at the right step: review before sending, not after damage is done
05. Better than "the agent crashed": partial outputs, retries, escalation
06. Cost per run, tracked: $4/run isn't a product, it's a bug
07. Real run, real work, accepted: without the human rewriting it

How to go beyond
+ Cross-run memory: remembers context, preferences, prior decisions per user
+ Visible plan before execution: agents that announce their plan are easier to trust
+ Self-correction: detects broken link, malformed file, wrong recipient, and fixes before delivering
+ Evals you run on every change: agents regress silently, so catch it
+ Multi-agent only when it earns its keep: most multi-agent systems are one agent that should have stayed one
+ Clean hand-off format: the output is the input to the next person's workflow
+ Observability: every tool call, decision, and token logged and inspectable

The one decision that signals you understood the problem
A professional would let your agent run on their real work without supervising it for the full duration.
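A stopping condition, a cost cap and a decision log fit in a very small runner. The following is a sketch under loud assumptions: `RunState`, `run_pipeline`, the three stub steps and their per-step costs are invented, and the stubs stand in for real transcription, extraction and drafting calls. The costs were chosen to add up to the $0.14 per run shown in the diagram above.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    max_steps: int = 8
    max_cost_usd: float = 0.50
    cost_usd: float = 0.0
    log: list = field(default_factory=list)      # one entry per executed step
    outputs: dict = field(default_factory=dict)  # partial results survive a stop
    status: str = "running"


def run_pipeline(steps, state):
    """Run (name, fn) steps in order; each fn takes outputs so far, returns (result, cost).

    The cost cap is checked after each step, so a run can overshoot it by at
    most one step's cost. A clean finish hands over to a human, never sends.
    """
    for index, (name, fn) in enumerate(steps):
        if index >= state.max_steps:
            state.status = "stopped: step cap reached"
            break
        result, cost = fn(state.outputs)
        state.cost_usd += cost
        state.log.append({"step": name, "cost_usd": cost})
        if state.cost_usd > state.max_cost_usd:
            state.status = "stopped: cost cap exceeded"
            break
        state.outputs[name] = result
    else:
        state.status = "awaiting human review"
    return state


# Stub steps with made-up costs; real ones would call a transcriber and an LLM.
def transcribe(outputs):
    return "customer asked about pricing tiers", 0.06


def extract(outputs):
    return {"next_step": "send pricing", "from": outputs["transcribe"]}, 0.05


def draft_email(outputs):
    return f"Following up: {outputs['extract']['next_step']}", 0.03


STEPS = [("transcribe", transcribe), ("extract", extract), ("draft_email", draft_email)]

state = run_pipeline(STEPS, RunState())
capped = run_pipeline(STEPS, RunState(max_cost_usd=0.10))
print(state.status, round(state.cost_usd, 2))
print(capped.status, list(capped.outputs))
```

The capped run keeps its transcript instead of crashing empty-handed, which is the "partial outputs" half of item 05.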
Examples worth shipping
01. Sales call → CRM entry + follow-up email + calendar invite + flagged objections, handed back in 5 min
02. Inbox triage drafts for routine categories (scheduling, status, document requests): human approves and sends
03. Bug-report line → reproducible test case + draft fix as a PR
04. Job description → tailored CV + cover letter + recruiter outreach, grounded in your real experience
05. Onboarding a new client for a small consulting firm: contract, kickoff doc, shared folder, first-week schedule
06. PR reviews against this codebase's conventions, not generic linting
Closing

Pick one. Not five.

One project, shipped to the standard above, beats five half-built ones every time. Pick the one closest to a real user you have access to — the lawyer in your family, the manager at your internship, the doctor friend. Domain access is the unfair advantage students underuse.

Ship before it's ready. Polish in public. Nobody hires the person who's "still working on it" forever. Write about it. Get one real user — not your study group; one person who didn't know you before, who chose to use it, and who'd notice if it broke. That's the line between a project and a product.

The students who do this will be the juniors who don't get replaced. Everyone else is competing for the seat that AI is quietly removing.

$ git init
Open a new repo today. Name it the thing you're going to build — not the thing you've already built.