I don’t reach for LangChain when I build an LLM feature. Not because it’s a bad framework — it isn’t, and I’ll show you exactly where it earns its place — but because the reflex to reach for a framework, any framework, before I understand the problem has cost me more time than it ever saved.
Here’s the rule I use instead, and if you read nothing else, read this: build it by hand the first time. Copy yourself the second. Extract an abstraction only on the third. Most LLM features you’ll ship are smaller than the framework you’d wrap them in — an HTTP call, a structured response, a retry, a log line. You can hold the whole thing in your head. The day you can’t is the day you’ve earned an abstraction, and by then you’ll know exactly which one.
This is for you if you’re shipping a production LLM feature you’ll personally own — the kind you’ll be paged about at 2am, the kind where “it works on the demo” isn’t the finish line. If you’re throwing together a weekend prototype, ignore all of this and use whatever gets you to a working demo fastest; speed is the only metric that matters there. And if you’re still learning what RAG even is, a framework’s tutorials are a genuinely good on-ramp — come back when you’ve felt the pain at least once.
§ 01 The rule of three
The rule is older than LLMs. It comes from refactoring: the first time you write something, just write it. The second time you write something similar, wince and copy it anyway. The third time, now you extract the shared abstraction — because now you’ve seen the pattern enough times to know its real shape.
Premature abstraction is the bug the rule prevents. When you adopt a framework on day one, you’re committing to someone else’s guess about your problem before you’ve seen your problem clearly. You inherit their concepts — their idea of a “chain,” a “retriever,” an “agent” — and spend your time translating your actual logic into their vocabulary instead of just writing your actual logic.
The rule of three buys you the one thing early adoption steals: you understand the problem before you commit to a shape for it. First build, you learn how the model actually behaves. Second build, you see what repeats. Third build, the abstraction is obvious — and it’s yours, shaped like your problem, not the average of ten thousand other teams’.
- Build 1: write it by hand.
- Build 2: copy yourself.
- Build 3: now decide.
  - If the work is stateless (RAG, a single call, a simple tool), extract your own roughly fifty lines.
  - If the work is stateful (pause, resume, checkpoint, human-in-the-loop), reach for a framework like LangGraph.
§ 02 What the fifty lines actually is
People hear “write it yourself” and picture reinventing a vector database. That’s not it. Here’s what a real first-pass RAG feature actually contains, start to finish — and the gap between a demo like this and a production system is its own story:
- An HTTP call to the model. One request, one response. The provider already gives you a clean SDK.
- A schema’d output. You ask for structured JSON and you validate it on the way back. Twenty lines with a library you already use.
- A retry on failure. The model returns garbage or times out; you try again with backoff. A decorator.
- A log line for everything. Every prompt, every response, every retry — because when it misbehaves in production, the logs are how you find out why.
- Plus, for RAG specifically: a vector search (your database does this) and a chunking function (a dozen lines, and you’ll want to tune it by hand anyway).
import logging, time
from functools import wraps

from pydantic import BaseModel
from openai import OpenAI

log = logging.getLogger("rag")
client = OpenAI()

# ➌ retry-with-backoff: one decorator, reused everywhere
def retry(times=3, base=0.5):
    def deco(fn):
        @wraps(fn)
        def wrapped(*args, **kw):
            last_err = None
            for i in range(times):
                try:
                    return fn(*args, **kw)
                except Exception as err:
                    last_err = err
                    log.warning("retry %d/%d: %s", i + 1, times, err)
                    if i < times - 1:              # no point sleeping after the final attempt
                        time.sleep(base * 2 ** i)
            raise RuntimeError("exhausted retries") from last_err
        return wrapped
    return deco

# ➋ the answer schema you ask for and validate on the way back
class Answer(BaseModel):
    text: str
    sources: list[str]

# ➏ chunking: a dozen lines, and you will tune it by hand anyway
def chunk(doc, size=800, overlap=100):
    step = size - overlap
    return [doc[i:i + size] for i in range(0, len(doc), step)]

# ➎ vector search: your database already does this
def search(query, k=5):
    # `store` and `embed` are placeholders for your own vector store and embedding call
    hits = store.query(embed(query), top_k=k)
    return [h.text for h in hits]

@retry()
def answer(question):
    chunks = search(question)                                # ➎ retrieve
    context = "\n\n".join(chunks)
    log.info("rag q=%r chunks=%d", question, len(chunks))    # ➍ log everything
    reply = client.chat.completions.parse(                   # ➊ model HTTP call
        model="gpt-...",
        messages=[
            {"role": "system", "content": "Answer only from the context."},
            {"role": "user", "content": context + "\n\nQ: " + question},
        ],
        response_format=Answer,                              # ➋ schema'd output
    )
    parsed = reply.choices[0].message.parsed                 # ➋ already validated
    log.info("rag answer=%r", parsed)                        # ➍ log the response too
    return parsed

That’s the whole feature. It fits on one screen. You can read it top to bottom, you can set a breakpoint anywhere, and when it breaks you’re debugging two things, your code and the model, instead of three: your code, then the framework, then the model.
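One small benefit of owning these lines is that you can poke at them directly. As a quick sketch, here is the `chunk` function from above run on a toy string, with one added guard that is not in the original listing (an assumption on my part: rejecting an overlap that is not smaller than the window, since that would make the step zero or negative):

```python
def chunk(doc, size=800, overlap=100):
    """Fixed-size character windows that overlap by `overlap` characters."""
    # Added guard (not in the listing above): a non-positive step would
    # either raise inside range() or silently return nothing.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap
    return [doc[i:i + size] for i in range(0, len(doc), step)]


pieces = chunk("abcdefghij", size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij', 'j']
```

Note the last piece: a one-character tail that the previous window already covers. Whether to drop it, merge it, or split on sentence boundaries instead is exactly the kind of tuning you end up doing by hand.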
That last part is the real payoff, and it’s worth saying plainly: a new engineer should be able to read your LLM code top to bottom in fifteen minutes. Fifty lines of direct calls passes that test. A framework, with its chains and callbacks and implicit control flow, usually doesn’t — not because the engineer is slow, but because half the behavior lives in code they can’t see.
§ 03 The two costs of an early framework
Adopting a framework before you need one isn’t free, even when it works. You pay in two currencies.
Abstraction debt. Every framework concept you learn — chain, agent, retriever, runnable — is one more layer between you and what the model is actually doing. The cost is invisible until something breaks, and then it’s brutal: you debug the framework first, and the model second. This isn’t a strawman. Anil Gulecha, an AI architect who wrote up moving his team off LangChain after three production agents, found that direct, hand-written tool-calling loops were more reliable and faster in production, and ended up restricting the framework to basic RAG while running agent orchestration in plain Python. The abstraction that felt like a head start became the thing he had to dig through.
The sharpest version of this cost is financial. Dmitry Livshitz, who replaced LangChain with a custom orchestrator, documents how opaque framework loops hide how often the model is actually being called. When an agent hits a tool failure or a reasoning loop, those hidden retries multiply — and in one case he describes, a client’s monthly API bill climbed before anyone noticed, because the framework’s runtime was quietly re-planning and re-calling a frontier model in a loop nobody could see.
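In hand-written glue, one way to keep call volume from hiding is to route every model call through a counter with a hard ceiling. The following is a minimal sketch of that idea, not anything taken from the write-ups cited here; `CallBudget`, `BudgetExceeded`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real SDK call.

```python
import logging

log = logging.getLogger("llm.budget")


class BudgetExceeded(RuntimeError):
    """Raised when one request tries to make more model calls than allowed."""


class CallBudget:
    """Counts model calls for a single request and stops at a hard ceiling."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.calls = 0

    def call(self, fn, *args, **kwargs):
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"{self.calls} calls made, limit is {self.max_calls}")
        self.calls += 1
        log.info("model call %d/%d", self.calls, self.max_calls)
        return fn(*args, **kwargs)


def fake_model(prompt):
    # Hypothetical stand-in for a real model call; it always asks to try again.
    return "retry"


budget = CallBudget(max_calls=5)
try:
    # A re-planning loop that would otherwise never stop.
    while budget.call(fake_model, "plan the next step") == "retry":
        pass
except BudgetExceeded as err:
    print(err)  # 5 calls made, limit is 5
```

The ceiling matters less than the log line: every call shows up once, in your own logs, with a running count.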
Conceptual lock-in. This is the one people get wrong, so let me be precise about it. LangChain used to be genuinely unstable: the pre-1.0 era churned hard, and the 0.2-to-0.3 migration was disruptive enough that migration guides called it a “mini-rewrite” rather than an upgrade. That’s fixed. As of version 1.0 in October 2025, LangChain adopted strict semantic versioning and a long-term-support policy: breaking changes are confined to major versions, and 1.x stays backward-compatible. If your only worry was version churn, that worry is largely resolved, and I won’t pretend otherwise.
But version churn was never the lock-in that hurt. The lock-in that hurts is conceptual. The day your retrieval strategy changes — and it will; you’ll move from naive top-k to hybrid search, or add a reranker, or switch to a different chunking logic — you’re not rewriting around the model’s API. You’re rewriting around the framework’s idea of how retrieval is supposed to work.
SemVer protects you from broken imports. It does nothing to protect you from having modeled your problem in someone else’s vocabulary.
That bill comes due regardless of how stable the version number is.
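For contrast, here is roughly what the move from naive top-k to hybrid search can look like when retrieval is your own function. This sketch uses reciprocal rank fusion, which is one common way to merge a vector ranking with a keyword ranking; the article does not prescribe a particular method. The two input lists are hypothetical results, and `c=60` is a commonly used default for the smoothing constant, not a tuned value.

```python
from collections import defaultdict


def rrf_merge(rankings, k=5, c=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (c + rank) for every document it contains.
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first; ties fall back to the id so the order is stable.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:k]


vector_hits = ["a", "b", "c"]    # hypothetical ids from the vector store
keyword_hits = ["c", "a", "d"]   # hypothetical ids from a keyword index
print(rrf_merge([vector_hits, keyword_hits], k=3))  # ['a', 'c', 'b']
```

In glue code, a change like this touches the body of `search` and nothing else.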
§ 04 When the framework genuinely wins
If I only told you the costs, I’d be selling you the same reflex in reverse — “never use a framework” is exactly as lazy as “always use one.” So here, plainly, is where reaching for LangChain (or its agent runtime, LangGraph) is the right call:
- Prototyping and demos. When the only metric is speed-to-working, the framework’s pre-built loaders, splitters, and vector-store adapters get you to a demo with almost no boilerplate. Don’t write fifty lines to throw them away on Friday.
- Standard RAG you will genuinely never customize. If your retrieval is and will remain document-load → chunk → embed → top-k, the framework does that out of the box and swapping an embedding model is a one-line change. The fifty-lines argument only wins when you’ll actually touch those lines.
- Stateful agent orchestration. This is the real one. Multi-step agents that have to pause, resume, checkpoint state, and hand control to a human mid-flight are hard infrastructure — and hand-rolling a transactional state machine is its own kind of hubris. LangGraph has already paid that cost: it’s what Klarna runs for a support-agent network that handles two-thirds of its customer inquiries, and what Replit uses to power agentic coding for millions of users.
- A team that needs shared vocabulary more than raw control. A standardized “retriever” and “tool” that every engineer already understands can be worth more than a bespoke architecture only its author can navigate.
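To give a feel for what “pause, resume, checkpoint” means in the stateful case, here is a deliberately tiny hand-rolled version in plain Python. It is a toy under stated assumptions, not LangGraph’s API: the step names are hypothetical, the checkpoint is a JSON string, and there is exactly one human approval point.

```python
import json

STEPS = ["draft_reply", "human_approval", "send_reply"]  # hypothetical workflow


def run(state, approve=None):
    """Advance the workflow until it finishes or has to wait for a human."""
    while state["step"] < len(STEPS):
        name = STEPS[state["step"]]
        if name == "draft_reply":
            state["draft"] = "Thanks for reaching out."
        elif name == "human_approval":
            if approve is None:
                state["status"] = "waiting_for_human"
                return state  # pause here; the caller persists the state
            state["approved"] = approve
            approve = None  # the decision is consumed by this step only
        elif name == "send_reply":
            state["sent"] = state["approved"]
        state["step"] += 1
    state["status"] = "done"
    return state


# The first run pauses at the approval step; the checkpoint is just JSON.
checkpoint = json.dumps(run({"step": 0}))
# Later, possibly in another process, load the checkpoint and resume.
final = run(json.loads(checkpoint), approve=True)
print(final["status"], final["sent"])  # done True
```

Everything this toy leaves out is the hard part: atomic checkpoint writes, two workers resuming the same run, a crash halfway through a step, and saved state that outlives a code change. That missing list is the cost a runtime like LangGraph has already absorbed.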
And lest the “nobody serious uses it” strawman creep in: this is not a fringe tool. LangChain has over 100,000 GitHub stars and roughly 260 million downloads a month, with named production deployments at Klarna, Replit, Rakuten, and others. Plenty of good engineers ship it every day. The argument here is about default, not legitimacy.
§ 05 So which situation are you in?
Run the four questions. They sort you toward glue or framework without any judgment calls.
- Is it load-bearing? Will you own and debug this in production, or is it a throwaway prototype? Throwaway → use the framework, optimize for speed.
- Will the flow stay standard? Is your retrieval/agent logic going to stay vanilla, or will you customize it? Staying vanilla → the framework’s defaults are fine.
- Have you built this shape three times? Fewer than three → write the glue; you don’t know the abstraction yet.
- At the third build — is it stateful? Does it need pause/resume, checkpointing, human-in-the-loop? Stateful → reach for LangGraph. Stateless → extract your own fifty lines.
The pattern: glue is the default for load-bearing, custom, early-stage, stateless work — which is most of what teams actually ship. The framework earns its place at the edges: throwaways, never-customized standard cases, and genuinely stateful orchestration.
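Because the four questions are meant to be mechanical, they can be written down as a function. This is only a restatement of the list above as a sketch; the function name, parameter names, and return strings are my own, and `build_number` counts the current build (1 for the first time you build this shape).

```python
def choose(load_bearing, stays_standard, build_number, stateful):
    """Walk the four questions in order and return the suggested approach."""
    if not load_bearing:
        return "framework"                 # Q1: throwaway, optimize for speed
    if stays_standard:
        return "framework defaults"        # Q2: vanilla flow you will not customize
    if build_number < 3:
        return "write the glue"            # Q3: you don't know the abstraction yet
    if stateful:
        return "LangGraph"                 # Q4: pause/resume, checkpointing, human-in-the-loop
    return "extract your own fifty lines"  # Q4: stateless


# A production RAG feature you will customize, built for the first time:
print(choose(load_bearing=True, stays_standard=False, build_number=1, stateful=False))
# write the glue
```

The order of the checks is the order of the questions, so an early answer short-circuits the rest.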
§ 06 The rule, one last time
So here’s the rule, stated for the last time, with the part nobody tells you: build it by hand the first time. Copy yourself the second. And the third time — when you finally know the shape of the thing — that’s when you decide. Not before.
But “decide” doesn’t mean “adopt LangChain.” It means look at what you’re actually building. If it’s stateless — RAG, a single call, a simple tool loop — extract your own fifty lines. You’ve earned them, they fit your problem, and you’ll debug them in an afternoon. If it’s stateful — a multi-step agent that has to pause, resume, checkpoint, and ask a human mid-flight — that’s the one case where the framework genuinely wins. Hand-rolling a transactional state machine is its own kind of hubris, and LangGraph has already paid that cost for you.
The framework was never the enemy. The reflex was — reaching for the abstraction before you knew which one you needed.
The rule of three just buys you the one thing premature adoption steals: the knowledge of what you’re actually building, before you commit to how.
Sources
1. Anil Gulecha, Kalvium Labs. “Why we stopped using LangChain after 3 production agents.” (March 2026.)
2. Dmitry Livshitz, AI Microservices. “Why We Stopped Using LangChain and Built an Orchestrator Instead.” (May 2026.)
3. LangChain documentation. Release & versioning policy, v1.0. (October 2025.)
4. Crawleo. LangChain 0.2→0.3 migration analysis. (Early 2026.)
5. LangChain customer stories. Klarna, Replit, Rakuten case studies. (2024–2025.)
6. PyPI Stats / LangChain GitHub. Download and adoption metrics. (May 2026.)