What's the difference between a script and a data pipeline?

A script proves that one execution path worked, once, on your machine, against today's data, while you watched. A production pipeline proves something much harder: that it will keep working on a Tuesday at 3 a.m., when the source schema changed overnight, nobody's looking, and the only person who'll notice the breakage is the analyst whose dashboard is now quietly wrong. The difference isn't the code — it's the contract. A pipeline makes promises to three parties (source, consumer, business) and answers four questions in writing: what, when, how, and why.

What is idempotency in a data pipeline and why does it matter?

An idempotent pipeline produces the same result when run twice with the same input — no duplicated rows, no double-counted revenue, no corrupted table. It's the production-readiness item juniors miss most. A naive INSERT that runs fine once will silently double your numbers when a retry fires — and retries always fire eventually. Idempotent loads use MERGE/UPSERT or partition overwrites instead. Non-idempotent code corrupts data without throwing a single error, which is why it's the most dangerous failure mode.
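The retry problem is easy to reproduce. The sketch below is an illustration, not a recommended loader: it assumes an in-memory SQLite database (3.24 or newer, for ON CONFLICT) and two made-up tables, revenue_naive and revenue, then loads the same batch twice to simulate a retry.

```python
import sqlite3

def load_naive(conn, rows):
    # Plain INSERT: every retry appends the same rows again.
    conn.executemany(
        "INSERT INTO revenue_naive (order_id, cents) VALUES (?, ?)", rows
    )
    conn.commit()

def load_idempotent(conn, rows):
    # UPSERT keyed on order_id: a retry overwrites instead of duplicating.
    conn.executemany(
        """
        INSERT INTO revenue (order_id, cents) VALUES (?, ?)
        ON CONFLICT(order_id) DO UPDATE SET cents = excluded.cents
        """,
        rows,
    )
    conn.commit()

def total(conn, table):
    # Table names here are fixed literals from this sketch, never user input.
    return conn.execute(f"SELECT COALESCE(SUM(cents), 0) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue_naive (order_id INTEGER, cents INTEGER)")
conn.execute("CREATE TABLE revenue (order_id INTEGER PRIMARY KEY, cents INTEGER)")

batch = [(1, 1000), (2, 2500)]
for _ in range(2):  # the second pass stands in for a retry of the same batch
    load_naive(conn, batch)
    load_idempotent(conn, batch)

naive_total = total(conn, "revenue_naive")   # doubled by the retry
idempotent_total = total(conn, "revenue")    # unchanged by the retry
```

Neither load raises an error. Only the totals differ, which is exactly why the naive version is dangerous.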

What is the 3 a.m. test for data pipelines?

When this breaks at 3 a.m., who finds out, and how long after? It's the single gut-check that separates a pipeline from a liability. A pipeline that crashes and pages you is working as intended — it told you. The dangerous one keeps running, keeps writing plausible-looking numbers, and says nothing. If the honest answer is 'the consumer, three days later,' you didn't ship a pipeline. You shipped a liability with a schedule. In Monte Carlo's 2023 survey, 74% of data professionals said business stakeholders find data quality issues first — at industry scale, that's the 3 a.m. test failing in the wild.

Your script runs. That doesn't make it a pipeline

Question	What it pins down	Promise to	Weak	Strong
What (schema · semantics)	Column names, types, value constraints, what a missing value means.	Consumer	"a CSV with user data"	user_id: non-null int · revenue in cents
When (schedule · freshness)	Cadence and the max staleness someone downstream can tolerate.	Consumer · Business	"when I remember to run it"	daily 06:00 UTC · max 2h late
How (lineage you can trace)	Origin, every transformation, and where it lands.	You, the debugger	"it just shows up in the warehouse"	raw → staged → marts · per-step row counts logged
Why (business purpose)	The decision this data feeds, and who makes it.	Business	"not totally sure who uses it"	feeds the exec revenue dashboard, refreshed pre-standup
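The four answers can live next to the code instead of in someone's head. A minimal sketch, assuming a hypothetical PipelineContract dataclass and is_fresh helper (both invented here, not taken from any library) and a made-up column name revenue_cents, filled in with the table's strong examples:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class PipelineContract:
    what: dict            # column name -> expected Python type
    when_utc: str         # scheduled run time, "HH:MM" in UTC
    max_late: timedelta   # staleness the consumer can tolerate
    how: tuple            # lineage stages, in order
    why: str              # the decision this data feeds

CONTRACT = PipelineContract(
    what={"user_id": int, "revenue_cents": int},
    when_utc="06:00",
    max_late=timedelta(hours=2),
    how=("raw", "staged", "marts"),
    why="feeds the exec revenue dashboard, refreshed pre-standup",
)

def is_fresh(contract, last_success, now):
    """True if the latest successful run still honours the 'when' promise."""
    hour, minute = map(int, contract.when_utc.split(":"))
    due = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if due > now:
        due -= timedelta(days=1)  # today's run is not due yet; use yesterday's
    # Fresh if the due run has landed, or we are still inside the grace period.
    return last_success >= due or now <= due + contract.max_late

now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
stale = not is_fresh(CONTRACT, datetime(2024, 1, 1, 6, 5, tzinfo=timezone.utc), now)
```

A scheduler or monitor could call is_fresh on a timer and alert when it returns False. The point of the sketch is only that "max 2h late" becomes something a machine can check.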

So run the test on whatever you're about to ship. If it breaks tonight, does a monitor catch it, or does the analyst catch it Thursday? If it writes garbage, does validation reject it, or does it land in the warehouse looking fine?
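One way to make the second question answerable is a validation gate that raises before anything lands. The following is a rough sketch under stated assumptions: ContractViolation, validate_batch, and the SCHEMA dict (with a made-up revenue_cents column) are hypothetical names for illustration, and rows are plain dicts.

```python
class ContractViolation(Exception):
    """Raised so a bad batch stops the run instead of landing quietly."""

# Hypothetical schema: both columns must be present and be non-null ints.
SCHEMA = {"user_id": int, "revenue_cents": int}

def validate_batch(rows, schema):
    """Return rows unchanged if every row honours the schema; raise otherwise."""
    if not rows:
        raise ContractViolation("empty batch: upstream produced no rows")
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            raise ContractViolation(
                f"row {i}: columns {sorted(row)} != {sorted(schema)}"
            )
        for col, typ in schema.items():
            value = row[col]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, typ):
                raise ContractViolation(
                    f"row {i}: {col}={value!r} is not {typ.__name__}"
                )
    return rows

good = [{"user_id": 1, "revenue_cents": 1999}]
# Simulated overnight schema change: column renamed, cents became dollars.
changed = [{"user_id": 1, "revenue": 19.99}]

validate_batch(good, SCHEMA)
try:
    validate_batch(changed, SCHEMA)
    rejected = False
except ContractViolation:
    rejected = True
```

With a gate like this in front of the load step, the schema change produces a failed run that someone is paged for, rather than a table that looks fine.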

"The consumer, three days later" is not a pipeline. It's a liability with a schedule.

§ 04 · The Notebook Question Notebooks aren't the enemy — promotion-without-refactor is

None of this is an argument against notebooks. Notebooks are the right tool for what they're for: exploration. The tight write-run-see loop, inline plots, poking at a dataset until you understand its shape — a notebook is genuinely better than a .py file for that work, and every good pipeline starts life as one. The exploration is real engineering, not a lesser warm-up act.

The sin isn't writing a notebook. It's promoting the notebook to production unchanged and calling it shipped.

There's hard evidence for why that fails.

Figure 04 · Reproducibility, GitHub at scale: what happens when you re-run a million notebooks? (Pimentel et al., 1.16M notebooks from GitHub)
Notebooks analysed: 1.16M (100%)
Ran top-to-bottom without error in a clean environment: ~278K (24%)
Reproduced their own results (same numbers, same charts): ~46K (4%)

The cause is the notebook's own nature: hidden kernel state and out-of-order cell execution mean a notebook that "works" on your screen often can't be cleanly re-run by anyone, including future you. That's fine for exploration. It's disqualifying for a system that has to run unattended, the same way, every night. (Martin Fowler's Thoughtworks team makes the same case from the architecture side: notebooks couple presentation, logic, and data into one file and invite manual tinkering — the opposite of what production needs.)
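Hidden state is easier to see in a toy model. In the sketch below, each "cell" is a function that mutates a shared namespace dict standing in for the kernel; the cell contents and the rate_pct name are invented for illustration.

```python
def cell_1(ns):
    ns["rate_pct"] = 10                   # first draft of a parameter

def cell_2(ns):
    ns["total"] = 100 + ns["rate_pct"]    # the result everyone looks at

def cell_3(ns):
    ns["rate_pct"] = 20                   # the author tweaks the parameter later

def run(cells):
    ns = {}                               # a fresh kernel namespace
    for cell in cells:
        cell(ns)
    return ns

# What the author actually did: ran cell 3, then went back and re-ran cell 2.
on_screen = run([cell_1, cell_3, cell_2])["total"]
# What a clean top-to-bottom re-run does.
clean_rerun = run([cell_1, cell_2, cell_3])["total"]
```

The two runs use identical cells and disagree on the answer. Nothing in the saved file records which order produced the number on screen.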

So the professional move, the day exploration code is headed for prod, is to say the refactor out loud. Not "I'll just productionize the notebook" — but "the notebook proved the logic; now I owe the team a refactor into a tested, validated, idempotent pipeline." Naming that debt is what a senior engineer does. Hiding it inside a copied-over .ipynb is how you become the Unity case study.
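What that refactor looks like depends on the notebook, but the first step is usually the same: pull the logic out of cells into a function that takes every input as an argument, then pin its behaviour with a test. A minimal sketch, with a hypothetical daily_revenue_cents function standing in for whatever the notebook computed:

```python
def daily_revenue_cents(orders):
    """Sum revenue per day from (day, cents) pairs; no globals, no kernel state."""
    totals = {}
    for day, cents in orders:
        totals[day] = totals.get(day, 0) + cents
    return totals

def test_daily_revenue_cents():
    orders = [("2024-01-01", 1000), ("2024-01-01", 250), ("2024-01-02", 400)]
    expected = {"2024-01-01": 1250, "2024-01-02": 400}
    assert daily_revenue_cents(orders) == expected
    # Same input, same output: the result cannot depend on cell order.
    assert daily_revenue_cents(orders) == expected
    assert daily_revenue_cents([]) == {}

test_daily_revenue_cents()
```

The notebook can keep importing and calling the function for exploration. The scheduler runs the function, not the .ipynb.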

§ 05 · The Closing Image A pipeline is a logistics shipment

There's merchandise (the data), a tracking number (lineage), a delivery window (the schedule), and a recipient who's promised exactly what's arriving (the schema). All of it contracted, all of it traceable, all of it accountable when something goes wrong.

Figure 05 · Closing image, the parcel: one artifact, two versions. A pipeline is a contracted shipment. Without the paperwork, it's smuggling.

Run the same shipment with no manifest, no tracking, and no one expecting it at the other end, and you don't have logistics. You have smuggling — and it works right up until the moment it very expensively doesn't.

The one thing to remember

Before you ship: write the four questions. Run the checklist. Ask who finds out at 3 a.m.

If you can do all three, you built a pipeline. If you can't, you still have time to fix that, which is the entire reason to ask now instead of on an earnings call.

Sources

Pimentel, J.F. et al. A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks. 1.16M notebooks; 24% ran without error, ~4% reproduced results. IEEE/MSR 2019; PMC, 2021.
Reis, J. & Housley, M. Fundamentals of Data Engineering. O'Reilly, 2022 — idempotency, retries, and lineage as core operational undercurrents.
Martin Fowler / Thoughtworks. Don't put data science notebooks into production. martinfowler.com, updated Nov 2020.
Monte Carlo / Wakefield Research. 2023 State of Data Quality Survey — 74% report business stakeholders find issues first. BusinessWire / TDWI, May 2023.
IBM Institute for Business Value and Unity Q1 2022 earnings reporting — ~$110M revenue impact from ingested bad data. IBM Think; The Motley Fool, May 2022.
Pydantic. v2, Rust-based validation core. pydantic.dev / GitHub, current.
King, A. Parse, Don't Validate. lexi-lambda.github.io, Nov 2019.
