"It runs end-to-end and it's scheduled" is a start line, not a finish line.
A script proves that one execution path worked, once, on your machine, against today's data, while you watched. A production data pipeline proves something much harder: that it will keep working on a Tuesday at 3 a.m., when the source schema changed overnight, nobody's looking, and the only person who'll notice the breakage is the analyst whose dashboard is now quietly wrong.
The difference isn't the code. It's the contract. A pipeline is software that makes promises to three parties — the source it reads from, the consumer it feeds, and the business that depends on the result — and answers four questions in writing: what, when, how, and why. Before you write any code, write that contract. Then hold the pipeline to a concrete readiness checklist (configurable, containerized, validated I/O, versioned, logged, idempotent, retryable, tested, infra-decoupled).
When it breaks at 3 a.m., who finds out, and how long after? The worst data failures aren't loud. A pipeline that crashes and pages you is working as intended — it told you. The dangerous one keeps running, keeps writing plausible-looking numbers, and says nothing.
If the honest answer is "the consumer, three days later," you didn't ship a pipeline. You shipped a liability you haven't named yet.
§ 01 · The Contract A pipeline is a contract, not a script — the four questions
The fastest way to look like a senior engineer is to answer four questions before writing code, in a doc, where your team can see them. A script answers none of these. A pipeline answers all four.
What — the schema and semantics. What's the exact shape of the data coming in and going out? Column names, types, value constraints, what a missing value means. Not "a CSV with some user data" — the actual contract: user_id is a non-null integer, signup_date is ISO-8601, revenue is in cents not dollars. This is a promise to your consumer: here is precisely what you'll get.
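One way to make the "what" enforceable is to write it as a check that runs on every record. The sketch below is a minimal illustration using only the three fields named above; the function name `validate_row` and the choice to return a list of violations are assumptions for the example, not a prescribed design.

```python
from datetime import date


def validate_row(row: dict) -> list[str]:
    """Return the contract violations for one record (empty list means valid)."""
    errors = []

    # user_id: non-null integer. bool is a subclass of int, so exclude it explicitly.
    user_id = row.get("user_id")
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        errors.append(f"user_id must be a non-null integer, got {user_id!r}")

    # signup_date: an ISO-8601 calendar date such as "2024-03-01".
    signup_date = row.get("signup_date")
    try:
        date.fromisoformat(signup_date)
    except (TypeError, ValueError):
        errors.append(f"signup_date must be an ISO-8601 date, got {signup_date!r}")

    # revenue: integer cents, so a float like 19.99 (dollars) is rejected.
    revenue = row.get("revenue")
    if not isinstance(revenue, int) or isinstance(revenue, bool):
        errors.append(f"revenue must be an integer number of cents, got {revenue!r}")

    return errors
```

A record like `{"user_id": None, "signup_date": "03/01/2024", "revenue": 19.99}` comes back with three violations, one per broken promise, which is the point: the consumer never has to discover them.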
When — the schedule and freshness. How often does it run, and how late can the data be before someone downstream is making decisions on stale numbers? Daily at 6 a.m. is a when. "Whenever I remember to run the cell" is not. This is a promise about timeliness — an informal SLA, whether or not anyone calls it that.
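A freshness promise can be checked the same way a schema can. The sketch below assumes a daily 06:00 UTC schedule and a two-hour lateness tolerance; both numbers, and the names `latest_due_run` and `freshness_status`, are illustrative assumptions. It also assumes every timestamp is timezone-aware and in UTC.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional

SCHEDULED_AT = time(6, 0)          # assumed schedule: daily at 06:00 UTC
MAX_LATENESS = timedelta(hours=2)  # assumed tolerance before data counts as stale


def latest_due_run(now: datetime) -> datetime:
    """Most recent scheduled run time at or before `now` (UTC)."""
    today_run = datetime.combine(now.date(), SCHEDULED_AT, tzinfo=timezone.utc)
    return today_run if now >= today_run else today_run - timedelta(days=1)


def freshness_status(last_success: Optional[datetime], now: datetime) -> str:
    """Classify the pipeline as 'fresh', 'pending', or 'stale'."""
    due = latest_due_run(now)
    if last_success is not None and last_success >= due:
        return "fresh"    # the run that was due has completed
    if now <= due + MAX_LATENESS:
        return "pending"  # late, but still inside the agreed tolerance
    return "stale"        # the freshness promise is broken; someone should be told
```

The useful part is the third state. "Stale" is a condition a monitor can alert on, which turns the informal SLA into something that fails loudly.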
How — the lineage you can trace. If a number looks wrong three steps downstream, can you trace it back to where it came from and what happened to it on the way? Lineage is the end-to-end record of a data asset's journey: its origin, every transformation applied, and where it lands. Without it, debugging a bad number means manually reading logs and code until you find the culprit — usually after the consumer already has.
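Lineage doesn't have to start with a dedicated tool. As a rough sketch, assuming in-memory rows and a hypothetical `Lineage` helper (the class, the logger name, and the source file name are all made up for illustration), you can record the origin and the row counts in and out of every step:

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("pipeline.lineage")  # hypothetical logger name


@dataclass
class Lineage:
    """Running record of where a dataset came from and what each step did to it."""
    origin: str
    steps: list = field(default_factory=list)

    def run(self, name, transform, rows):
        """Apply one transformation and record its row counts."""
        result = transform(rows)
        self.steps.append({"step": name, "rows_in": len(rows), "rows_out": len(result)})
        logger.info("origin=%s step=%s rows_in=%d rows_out=%d",
                    self.origin, name, len(rows), len(result))
        return result


raw = [
    {"user_id": 1, "revenue": 500},
    {"user_id": None, "revenue": 300},
    {"user_id": 2, "revenue": 700},
]
lineage = Lineage(origin="orders_export.csv")  # hypothetical source name
staged = lineage.run(
    "staged: drop null user_id",
    lambda rows: [r for r in rows if r["user_id"] is not None],
    raw,
)
marts = lineage.run(
    "marts: total revenue",
    lambda rows: [{"total_revenue": sum(r["revenue"] for r in rows)}],
    staged,
)
```

If the total looks wrong downstream, `lineage.steps` shows that the staging step took three rows in and let two out, which tells you where to start looking before you open a single log file.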
Why — the business purpose. What decision does this data feed? "It populates the exec revenue dashboard" is a why. "I'm not totally sure who uses it" is a five-alarm fire — it means a thing can break and you won't know how much it matters or who to warn. If you can't name the consumer, you can't size the blast radius.
If your "pipeline" can't answer all four, it isn't one yet. It's a script with a cron job and good intentions.
Pipeline contract · v1 The four questions, before the code
| Question | What it pins down | Promise to | Weak | Strong |
|---|---|---|---|---|
| What (schema · semantics) | Column names, types, value constraints, what a missing value means. | Consumer | a CSV with user data | user_id: non-null int · revenue in cents |
| When (schedule · freshness) | Cadence and the max staleness someone downstream can tolerate. | Consumer · Business | when I remember to run it | daily 06:00 UTC · max 2h late |
| How (lineage you can trace) | Origin, every transformation, and where it lands. | You, the debugger | it just shows up in the warehouse | raw → staged → marts · per-step row counts logged |
| Why (business purpose) | The decision this data feeds, and who makes it. | Business | not totally sure who uses it | feeds the exec revenue dashboard, refreshed pre-standup |
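If a doc feels too easy to skip, the contract can also live next to the code. This is only a sketch: the `PipelineContract` class and its `unanswered` method are hypothetical, and the answers are free text copied from the strong examples in the table.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PipelineContract:
    what: str  # schema and semantics
    when: str  # schedule and freshness
    how: str   # lineage you can trace
    why: str   # business purpose

    def unanswered(self) -> list[str]:
        """Names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


contract = PipelineContract(
    what="user_id: non-null int; revenue in cents",
    when="daily 06:00 UTC, max 2h late",
    how="raw -> staged -> marts, per-step row counts logged",
    why="",  # nobody has named the consumer yet
)
```

Here `contract.unanswered()` returns `["why"]`, and a deploy script could refuse to schedule anything whose list is non-empty.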
§ 02 · The Checklist The production-readiness checklist
Here's the concrete gap between "it runs" and "it's production-grade." This is the part you can literally run down before you call something done. None of it is exotic; it's the baseline the field already agreed on. In Fundamentals of Data Engineering (O'Reilly, 2022) — the closest thing the discipline has to a standard reference — Joe Reis and Matt Housley treat most of this list not as best-practice nice-to-haves but as the operational undercurrents that run beneath every production system. In other words: this isn't my bar. It's the field's.
Run it down against whatever you're about to ship. You don't need all eleven on day one for every internal job. But you do need to know which ones you're skipping and why — because each skipped item is a promise in the contract you've quietly decided not to keep.
Pre-flight · production-readiness Run it down before you call it done
I · Reproducible
II · Trustworthy
III · Operable
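Two of the checklist items, idempotent and retryable, only work as a pair: a retry is safe only if running the same load twice leaves the same result. The sketch below illustrates that with an in-memory `dict` standing in for the warehouse. The helper names, the partition-by-date layout, and the backoff numbers are all assumptions for the example.

```python
import time


def upsert_partition(store: dict, run_date: str, rows: list) -> None:
    """Idempotent write: replace the whole partition for run_date instead of appending."""
    store[run_date] = list(rows)


def with_retries(func, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:  # a real pipeline would catch only transient error types
            if attempt == attempts - 1:
                raise  # out of attempts: fail loudly
            sleep(base_delay * 2 ** attempt)


store = {}
rows = [{"user_id": 1, "revenue": 500}, {"user_id": 2, "revenue": 700}]
calls = {"count": 0}


def flaky_load():
    """Simulate a write that lands but whose acknowledgement is lost twice."""
    calls["count"] += 1
    upsert_partition(store, "2024-03-05", rows)
    if calls["count"] < 3:
        raise ConnectionError("acknowledgement lost after write")


with_retries(flaky_load, sleep=lambda seconds: None)  # skip real sleeping in the demo
```

The load ran three times, yet `store` holds exactly one partition with two rows. Had the write been an append, the same retries would have tripled the revenue.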
§ 03 · The 3 a.m. Test When it breaks, who finds out — and when?
Every item on that checklist serves one question: when this breaks, who finds out, and how long after?
This is the test that separates a pipeline from a liability, because the worst data failures aren't loud. A pipeline that crashes at 3 a.m. and pages you is working as intended — it told you. The dangerous one keeps running, keeps writing plausible-looking numbers, and says nothing. By the time someone notices, the bad data has propagated into dashboards, reports, and decisions.
Scenario · same failure, two outcomes 03:00 · the source schema changes
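To make the two outcomes concrete, here is a small sketch of the same schema change handled both ways. It assumes the source renamed `revenue` to `revenue_cents` overnight; that rename, the `EXPECTED_COLUMNS` set, and the function names are invented for illustration.

```python
EXPECTED_COLUMNS = {"user_id", "signup_date", "revenue"}


class SchemaDriftError(Exception):
    """Raised when the source no longer matches the agreed columns."""


def total_revenue_silent(rows):
    # Tolerant lookup: a renamed or missing column quietly becomes 0.
    return sum(row.get("revenue", 0) for row in rows)


def total_revenue_loud(rows):
    # Check every row against the contract before computing anything.
    for index, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - set(row)
        unexpected = set(row) - EXPECTED_COLUMNS
        if missing or unexpected:
            raise SchemaDriftError(
                f"row {index}: missing={sorted(missing)} unexpected={sorted(unexpected)}"
            )
    return sum(row["revenue"] for row in rows)


# Assumed overnight change: the source now sends revenue_cents instead of revenue.
drifted = [{"user_id": 1, "signup_date": "2024-03-05", "revenue_cents": 500}]
```

`total_revenue_silent(drifted)` returns `0`, a plausible-looking number that flows straight into the dashboard. `total_revenue_loud(drifted)` raises `SchemaDriftError`, which is the version that can page someone.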
And "someone" is rarely the engineer. In Monte Carlo's 2023 State of Data Quality survey (200 data professionals, conducted by Wakefield Research — worth noting Monte Carlo sells observability tooling, so read it as directional), 74% of respondents said business stakeholders find data quality issues first, most or all of the time — up from 47% the year before. The people discovering your broken pipeline are the ones consuming its output, not the ones who built it. That's the liability test failing in the wild, at industry scale.
The cost of silent failure isn't hypothetical.