Should I use LangChain for a production LLM feature?

Usually not as your default. Most load-bearing LLM features — a RAG call, a structured response, a retry, a log line — are about fifty lines of your own code: small enough to read top to bottom, set a breakpoint anywhere, and debug your code and the model directly rather than your code, then the framework, then the model. Reach for a framework only after you have built the shape a few times and know which abstraction you need, or when the work is genuinely stateful agent orchestration. For throwaway prototypes and standard RAG you will never customize, the framework's defaults are a fine shortcut.

What is the rule of three for software abstractions?

Build it by hand the first time. Copy yourself the second time, even though it feels wrong. Extract a shared abstraction only on the third time — because by then you have seen the pattern enough to know its real shape. The rule comes from refactoring and applies cleanly to LLM features: adopting a framework on day one commits you to someone else's guess about your problem before you have seen it clearly. Building by hand first means the abstraction you eventually extract is shaped like your problem, not the average of ten thousand other teams'.

When should I use LangChain or LangGraph?

Four cases. Prototypes and demos, where speed to a working version is the only metric. Standard RAG you will genuinely never customize, where the framework's loaders and adapters save real boilerplate. Stateful agent orchestration — multi-step agents that pause, resume, checkpoint state, and hand control to a human mid-flight — which is hard infrastructure LangGraph has already paid the cost of; this is the strongest case. And teams that need a shared, standardized vocabulary more than raw control. Outside those, hand-written glue is usually smaller, faster, and easier to debug.

I don't use LangChain. Here's why.

I don’t reach for LangChain when I build an LLM feature. Not because it’s a bad framework — it isn’t, and I’ll show you exactly where it earns its place — but because the reflex to reach for a framework, any framework, before I understand the problem has cost me more time than it ever saved.

Here’s the rule I use instead, and if you read nothing else, read this: build it by hand the first time. Copy yourself the second. Extract an abstraction only on the third. Most LLM features you’ll ship are smaller than the framework you’d wrap them in — an HTTP call, a structured response, a retry, a log line. You can hold the whole thing in your head. The day you can’t is the day you’ve earned an abstraction, and by then you’ll know exactly which one.


This is for you if you’re shipping a production LLM feature you’ll personally own — the kind you’ll be paged about at 2am, the kind where “it works on the demo” isn’t the finish line. If you’re throwing together a weekend prototype, ignore all of this and use whatever gets you to a working demo fastest; speed is the only metric that matters there. And if you’re still learning what RAG even is, a framework’s tutorials are a genuinely good on-ramp — come back when you’ve felt the pain at least once.

§ 01 The rule of three

The rule is older than LLMs. It comes from refactoring: the first time you write something, just write it. The second time you write something similar, wince and copy it anyway. The third time, extract the shared abstraction, because by now you’ve seen the pattern enough times to know its real shape.

Premature abstraction is the bug the rule prevents. When you adopt a framework on day one, you’re committing to someone else’s guess about your problem before you’ve seen your problem clearly. You inherit their concepts — their idea of a “chain,” a “retriever,” an “agent” — and spend your time translating your actual logic into their vocabulary instead of just writing your actual logic.


The rule of three buys you the one thing early adoption steals: you understand the problem before you commit to a shape for it. First build, you learn how the model actually behaves. Second build, you see what repeats. Third build, the abstraction is obvious — and it’s yours, shaped like your problem, not the average of ten thousand other teams’.
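A small sketch of what build 3 can look like. Suppose builds 1 and 2 were a summarizer and a ticket classifier, each of which formatted a prompt, called the model, logged both sides, and cleaned up the reply. By the third build that repeated middle is obvious, and it is small. Every name here (`ask`, `summarize`, `classify`, and the fake model) is hypothetical, chosen for illustration:

```python
import logging
from typing import Callable

log = logging.getLogger("llm")

# Hypothetical: any function that sends a prompt to your model and returns text.
ModelFn = Callable[[str], str]

# Build 3: the part that repeated across builds 1 and 2, extracted.
# It is only as general as those two copies needed, and no more.
def ask(call_model: ModelFn, template: str, **fields: str) -> str:
    prompt = template.format(**fields)
    log.info("prompt=%r", prompt)
    reply = call_model(prompt)
    log.info("reply=%r", reply)
    return reply.strip()

# Builds 1 and 2, now thin wrappers over the helper.
def summarize(call_model: ModelFn, text: str) -> str:
    return ask(call_model, "Summarize in one sentence:\n{text}", text=text)

def classify(call_model: ModelFn, ticket: str) -> str:
    return ask(call_model, "Label this ticket as bug, billing, or other:\n{ticket}", ticket=ticket)

# Usage with a fake model, so the sketch runs offline.
print(classify(lambda prompt: " billing\n", "I was charged twice"))  # billing
```

The helper is shaped by the two real copies that preceded it, which is the whole point: nothing in it anticipates a third use case you haven’t seen.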

Build 1: write it by hand.
Build 2: copy yourself.
Build 3: now decide.
If the work is stateless (RAG, a single call, a simple tool), extract your own roughly fifty lines.
If the work is stateful (pause, resume, checkpoint, human-in-the-loop), reach for a framework like LangGraph.

Figure 01: The rule of three, with the fork at build 3.

§ 02 What the fifty lines actually are

People hear “write it yourself” and picture reinventing a vector database. That’s not it. Here’s what a real first-pass RAG feature actually contains, start to finish — and the gap between a demo like this and a production system is its own story:

An HTTP call to the model. One request, one response. The provider already gives you a clean SDK.
A schema’d output. You ask for structured JSON and you validate it on the way back. Twenty lines with a library you already use.
A retry on failure. The model returns garbage or times out; you try again with backoff. A decorator.
A log line for everything. Every prompt, every response, every retry — because when it misbehaves in production, the logs are how you find out why.
Plus, for RAG specifically: a vector search (your database does this) and a chunking function (a dozen lines, and you’ll want to tune it by hand anyway).

rag_feature.py (~50 lines: the whole feature, on one screen)

```python
import logging
import time
from functools import wraps

from openai import OpenAI
from pydantic import BaseModel

log = logging.getLogger("rag")
client = OpenAI()

# ➌ retry-with-backoff: one decorator, reused everywhere
def retry(times=3, base=0.5):
    def deco(fn):
        @wraps(fn)
        def wrapped(*args, **kw):
            last_err = None
            for i in range(times):
                try:
                    return fn(*args, **kw)
                except Exception as err:
                    last_err = err
                    log.warning("retry %d/%d: %s", i + 1, times, err)
                    if i + 1 < times:                  # no pointless sleep after the last try
                        time.sleep(base * 2 ** i)      # 0.5s, 1s, 2s, ...
            raise RuntimeError(f"exhausted {times} retries") from last_err
        return wrapped
    return deco

# ➋ the answer schema you ask for and validate on the way back
class Answer(BaseModel):
    text: str
    sources: list[str]

# ➏ chunking: a dozen lines, and you will tune it by hand anyway.
# Runs at ingestion time, before embedding; sizes are in characters.
def chunk(doc, size=800, overlap=100):
    step = size - overlap                              # requires overlap < size
    return [doc[i:i + size] for i in range(0, len(doc), step)]

# ➎ vector search: your database already does this.
# `store` and `embed` are placeholders for your own vector-store client and
# embedding function; they are not defined in this file.
def search(query, k=5):
    hits = store.query(embed(query), top_k=k)
    return [h.text for h in hits]

@retry()
def answer(question):
    chunks = search(question)                                  # ➎ retrieve
    context = "\n\n".join(chunks)
    log.info("rag q=%r chunks=%d", question, len(chunks))     # ➍ log everything
    reply = client.chat.completions.parse(                     # ➊ model HTTP call
        model="gpt-...",
        messages=[
            {"role": "system", "content": "Answer only from the context."},
            {"role": "user", "content": context + "\n\nQ: " + question},
        ],
        response_format=Answer,                                # ➋ schema'd output
    )
    msg = reply.choices[0].message
    if msg.parsed is None:                                     # refusal: nothing to validate
        raise ValueError(f"no parsed answer (refusal={msg.refusal!r})")
    log.info("rag a=%r sources=%s", msg.parsed.text, msg.parsed.sources)  # ➍
    return msg.parsed                                          # ➋ already validated
```

➊ model HTTP call · ➋ schema’d output + validation · ➌ retry with backoff · ➍ log everything · ➎ vector search · ➏ chunking

Figure 02: A full first-pass RAG feature, start to finish.

That’s the whole feature. It fits on one screen. You can read it top to bottom, you can set a breakpoint anywhere, and when it breaks you’re debugging your code and the model — not your code, then the framework, then the model.

That last part is the real payoff, and it’s worth saying plainly: a new engineer should be able to read your LLM code top to bottom in fifteen minutes. Fifty lines of direct calls passes that test. A framework, with its chains and callbacks and implicit control flow, usually doesn’t — not because the engineer is slow, but because half the behavior lives in code they can’t see.
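The same plainness makes the pieces testable without a network call or a mocking framework. A short sketch that exercises `chunk`, copied from `rag_feature.py` so the snippet runs on its own; the test document is synthetic:

```python
def chunk(doc, size=800, overlap=100):
    # Copied from rag_feature.py: fixed-size character windows that overlap.
    step = size - overlap
    return [doc[i:i + size] for i in range(0, len(doc), step)]

doc = "".join(chr(ord("a") + i % 26) for i in range(2000))
pieces = chunk(doc)
print([len(p) for p in pieces])            # [800, 800, 600]: windows start at 0, 700, 1400

# Neighboring chunks share exactly `overlap` (100) characters.
assert all(a[-100:] == b[:100] for a, b in zip(pieces, pieces[1:]))

# A 1,500-character document gets a 100-character tail that sits
# entirely inside the previous chunk.
tail = chunk(doc[:1500])
print([len(p) for p in tail])              # [800, 800, 100]
print(tail[2] in tail[1])                  # True
```

That last case is the kind of detail you only notice by reading your own chunker, and exactly the sort of thing you end up tuning by hand.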


Think of a Swiss Army knife sitting on a kitchen counter next to a chef’s knife. The Swiss Army knife has every tool — blade, scissors, screwdriver, the little saw. It is also slower at every single one of them than the dedicated tool lying right beside it. A framework is the Swiss Army knife: astonishing range, and never quite the right shape for the one cut you’re making right now.

§ 03 The two costs of an early framework

Adopting a framework before you need one isn’t free, even when it works. You pay in two currencies.

Abstraction debt. Every framework concept you learn — chain, agent, retriever, runnable — is one more layer between you and what the model is actually doing. The cost is invisible until something breaks, and then it’s brutal: you debug the framework first, and the model second. This isn’t a strawman. Anil Gulecha, an AI architect who wrote up moving his team off LangChain after three production agents, found that direct, hand-written tool-calling loops were more reliable and faster in production, and ended up restricting the framework to basic RAG while running agent orchestration in plain Python. The abstraction that felt like a head start became the thing he had to dig through.
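For a sense of what a direct, hand-written tool-calling loop looks like, here is a minimal, provider-agnostic sketch. It is not Gulecha’s code: the reply format (`{"tool": ..., "args": ...}` or `{"final": ...}`), the `call_model` function, and the scripted fake model are assumptions for illustration. What matters is that every model call, every tool call, and the stopping condition sit in one visible loop.

```python
import json
import logging
from typing import Callable

log = logging.getLogger("agent")

# Hypothetical model interface: takes the message list and returns either
# {"tool": name, "args": {...}} or {"final": text}. Wrap your SDK to fit it.
ModelFn = Callable[[list], dict]

def run_agent(call_model: ModelFn, tools: dict, question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for step in range(1, max_steps + 1):         # hard cap: the loop cannot run away
        reply = call_model(messages)              # exactly one model call per iteration
        log.info("step %d reply=%r", step, reply)
        if "final" in reply:
            return reply["final"]
        try:
            result = tools[reply["tool"]](**reply.get("args", {}))
        except Exception as err:                  # unknown tool or tool failure is
            result = f"error: {err!r}"            # reported back to the model, visibly
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": result})
    raise RuntimeError(f"no final answer after {max_steps} steps")

# Example with a scripted fake model: one tool call, then an answer.
script = iter([{"tool": "add", "args": {"a": 2, "b": 3}}, {"final": "2 + 3 = 5"}])
print(run_agent(lambda messages: next(script), {"add": lambda a, b: str(a + b)}, "What is 2 + 3?"))
# 2 + 3 = 5
```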

The sharpest version of this cost is financial. Dmitry Livshitz, who replaced LangChain with a custom orchestrator, documents how opaque framework loops hide how often the model is actually being called. When an agent hits a tool failure or a reasoning loop, those hidden retries multiply — and in one case he describes, a client’s monthly API bill climbed before anyone noticed, because the framework’s runtime was quietly re-planning and re-calling a frontier model in a loop nobody could see.

Hidden-loop API bill: $61,800/mo, up from $3,400 before the framework’s runtime began silently re-planning and re-calling a frontier model in a loop nobody could watch. (Livshitz, custom-orchestrator case, May 2026.)

Share of Klarna’s inquiries: 2/3, handled by a LangGraph support-agent network, the work of an estimated 853 full-time agents. The case where the framework genuinely earns its place. (LangChain customer stories, 2024–25.)
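One cheap defense against the hidden-loop bill Livshitz describes is to route every model call through a wrapper that counts calls and refuses to go past a ceiling. A minimal sketch; the `CallBudget` and `BudgetExceeded` names and the per-job call limit are assumptions for illustration, not taken from his write-up:

```python
import logging
from typing import Any, Callable

log = logging.getLogger("llm.budget")

class BudgetExceeded(RuntimeError):
    """Raised instead of making a model call past the ceiling."""

class CallBudget:
    """Counts model calls for one request or job and enforces a hard ceiling."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def wrap(self, call_model: Callable[..., Any]) -> Callable[..., Any]:
        def metered(*args, **kw):
            if self.calls >= self.max_calls:
                raise BudgetExceeded(f"refusing call {self.calls + 1}: limit is {self.max_calls}")
            self.calls += 1
            log.info("model call %d/%d", self.calls, self.max_calls)
            return call_model(*args, **kw)
        return metered

budget = CallBudget(max_calls=3)
call = budget.wrap(lambda prompt: "ok")   # stand-in for a real model call
for _ in range(3):
    call("hi")
try:
    call("hi")
except BudgetExceeded as err:
    print(err)                            # refusing call 4: limit is 3
```

Keeping one budget per request or job, rather than one per process, means a single runaway task fails loudly without starving everything else, and the log line makes the call count visible instead of hidden.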

Conceptual lock-in. This is the one people get wrong, so let me be precise about it. LangChain used to be genuinely unstable — the pre-1.0 era churned hard, and the 0.2-to-0.3 migration was enough of a rewrite that migration guides called it a “mini-rewrite” rather than an upgrade. That’s fixed. As of version 1.0 in October 2025, LangChain adopted strict semantic versioning and a long-term-support policy: breaking changes are confined to major versions, and 1.x stays backward-compatible. If your only worry was version churn, that worry is largely resolved, and I won’t pretend otherwise.

But version churn was never the lock-in that hurt. The lock-in that hurts is conceptual. The day your retrieval strategy changes (and it will: you’ll move from naive top-k to hybrid search, add a reranker, or switch to different chunking logic), you’re not rewriting around the model’s API. You’re rewriting around the framework’s idea of how retrieval is supposed to work.

SemVer protects you from broken imports. It does nothing to protect you from having modeled your problem in someone else’s vocabulary.

That bill comes due regardless of how stable the version number is.
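To make the retrieval point concrete: when `search` is your own function, moving from naive top-k to hybrid search is a change inside that one function. A minimal sketch using reciprocal rank fusion (RRF) to merge a vector ranking with a keyword ranking. RRF is my choice for illustration, not something the sources prescribe; `k=60` is a commonly used default, and the two input rankings are hypothetical results from your own vector and keyword queries.

```python
from collections import defaultdict

def rrf(rankings: list, k: int = 60, top: int = 5) -> list:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)[:top]

# Hypothetical results for one query: ids from your vector store and from a
# keyword (e.g. full-text) index, each already in best-first order.
vector_hits = ["d3", "d1", "d7"]
keyword_hits = ["d1", "d9", "d3"]
print(rrf([vector_hits, keyword_hits], top=3))   # ['d1', 'd3', 'd9']
```

In `rag_feature.py` terms, `search` would run both queries and return the fused documents’ text; `answer` doesn’t change at all.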

§ 04 When the framework genuinely wins

If I only told you the costs, I’d be selling you the same reflex in reverse — “never use a framework” is exactly as lazy as “always use one.” So here, plainly, is where reaching for LangChain (or its agent runtime, LangGraph) is the right call:

Prototyping and demos. When the only metric is speed-to-working, the framework’s pre-built loaders, splitters, and vector-store adapters get you to a demo with almost no boilerplate. Don’t write fifty lines to throw them away on Friday.
Standard RAG you will genuinely never customize. If your retrieval is and will remain document-load → chunk → embed → top-k, the framework does that out of the box and swapping an embedding model is a one-line change. The fifty-lines argument only wins when you’ll actually touch those lines.
Stateful agent orchestration. This is the real one. Multi-step agents that have to pause, resume, checkpoint state, and hand control to a human mid-flight are hard infrastructure — and hand-rolling a transactional state machine is its own kind of hubris. LangGraph has already paid that cost: it’s what Klarna runs for a support-agent network that handles two-thirds of its customer inquiries, and what Replit uses to power agentic coding for millions of users.
A team that needs shared vocabulary more than raw control. A standardized “retriever” and “tool” that every engineer already understands can be worth more than a bespoke architecture only its author can navigate.
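To see why the stateful case counts as hard infrastructure, here is roughly the smallest hand-rolled “pause for a human, then resume”: a step runner that checkpoints its position to a JSON file. It is a toy for illustration only; the step names, file layout, and `approved` flag are hypothetical, and this is not how LangGraph works internally. The comment near the end lists what it leaves out, which is the part a framework has already built.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical three-step workflow; the middle step waits for a human.
STEPS = ["draft_refund", "await_approval", "issue_refund"]

def run(ckpt: Path, approved: Optional[bool] = None) -> dict:
    # Resume from the checkpoint file if it exists, otherwise start fresh.
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {"step": 0, "done": []}
    while state["step"] < len(STEPS):
        name = STEPS[state["step"]]
        if name == "await_approval" and approved is None:
            ckpt.write_text(json.dumps(state))     # pause: persist, hand off to a human
            return {"status": "paused", **state}
        if name == "await_approval" and not approved:
            return {"status": "rejected", **state}
        state["done"].append(name)                 # a real step would do its work here
        state["step"] += 1
        ckpt.write_text(json.dumps(state))         # checkpoint after every step
    return {"status": "done", **state}

# Deliberately missing: locking when two workers resume the same checkpoint,
# atomicity (a crash between doing a step and writing the file can repeat it,
# e.g. issue the refund twice), migrating old checkpoints when `state` changes,
# timeouts on the human, and an audit trail of who approved what.

ckpt = Path(tempfile.mkdtemp()) / "refund-42.json"
print(run(ckpt)["status"])                 # paused
print(run(ckpt, approved=True)["status"])  # done
```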

And lest the “nobody serious uses it” strawman creep in: this is not a fringe tool. LangChain has over 100,000 GitHub stars and roughly 260 million downloads a month, with named production deployments at Klarna, Replit, Rakuten, and others. Plenty of good engineers ship it every day. The argument here is about default, not legitimacy.

§ 05 So which situation are you in?

Run the four questions. They sort you toward glue or framework without any judgment calls.

Figure 03: Four questions that sort you toward glue or framework, applied top to bottom.

Is it load-bearing? Will you own and debug this in production, or is it a throwaway prototype? Throwaway → use the framework, optimize for speed.
Will the flow stay standard? Is your retrieval/agent logic going to stay vanilla, or will you customize it? Staying vanilla → the framework’s defaults are fine.
Have you built this shape three times? Fewer than three → write the glue; you don’t know the abstraction yet.
At the third build — is it stateful? Does it need pause/resume, checkpointing, human-in-the-loop? Stateful → reach for LangGraph. Stateless → extract your own fifty lines.
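Because the four gates run in a fixed order, they can be written down as a plain function. A sketch; the parameter names and return strings are hypothetical labels chosen for illustration:

```python
def glue_or_framework(load_bearing: bool, stays_standard: bool,
                      times_built: int, stateful: bool) -> str:
    """Apply the four questions in order; the first gate that fires decides."""
    if not load_bearing:          # Q1: throwaway prototype
        return "framework: optimize for speed"
    if stays_standard:            # Q2: vanilla flow, the defaults are fine
        return "framework: defaults are fine"
    if times_built < 3:           # Q3: you don't know the abstraction yet
        return "glue: write it by hand"
    if stateful:                  # Q4: pause/resume/checkpoint/human-in-the-loop
        return "framework: LangGraph"
    return "glue: extract your own ~50 lines"

print(glue_or_framework(load_bearing=True, stays_standard=False, times_built=1, stateful=True))
# glue: write it by hand
```

The ordering carries the argument: the stateful question is only asked once you have built the shape three times, which is the rule of three restated.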

The pattern: glue is the default for load-bearing, custom, early-stage, stateless work — which is most of what teams actually ship. The framework earns its place at the edges: throwaways, never-customized standard cases, and genuinely stateful orchestration.

§ 06 The rule, one last time

So here’s the rule, stated for the last time, with the part nobody tells you: build it by hand the first time. Copy yourself the second. And the third time — when you finally know the shape of the thing — that’s when you decide. Not before.

But “decide” doesn’t mean “adopt LangChain.” It means look at what you’re actually building. If it’s stateless — RAG, a single call, a simple tool loop — extract your own fifty lines. You’ve earned them, they fit your problem, and you’ll debug them in an afternoon. If it’s stateful — a multi-step agent that has to pause, resume, checkpoint, and ask a human mid-flight — that’s the one case where the framework genuinely wins. Hand-rolling a transactional state machine is its own kind of hubris, and LangGraph has already paid that cost for you.

The framework was never the enemy. The reflex was — reaching for the abstraction before you knew which one you needed.

The rule of three just buys you the one thing premature adoption steals: the knowledge of what you’re actually building, before you commit to how.

Sources

1. Anil Gulecha, Kalvium Labs, “Why we stopped using LangChain after 3 production agents.” March 2026.
2. Dmitry Livshitz, AI Microservices, “Why We Stopped Using LangChain and Built an Orchestrator Instead.” May 2026.
3. LangChain documentation, release and versioning policy, v1.0. October 2025.
4. Crawleo, LangChain 0.2→0.3 migration analysis. Early 2026.
5. LangChain customer stories, Klarna, Replit, and Rakuten case studies. 2024–2025.
6. PyPI Stats / LangChain GitHub, download and adoption metrics. May 2026.