Explaining Is Easy, Predicting Is Hard — the market test

The market drops two percent, and by 5:30 p.m. someone in a suit explains exactly why. Rate fears. Weak China data. Profit-taking. The explanation is fluent, it sounds smart, it fits perfectly. Had the market risen two percent, the same person would have explained the opposite with the same conviction — rate hopes, robust China data, bargain hunters.

That is not an accident. That is the business model. After the event, there is always a reason. Before it, almost never.

The hindsight story always fits

Nassim Taleb calls this the narrative fallacy: our mind can't stand a random sequence, so it lays a chain of causes over it after the fact. The world becomes a story, and stories have reasons. The problem isn't that the explanations are wrong — it's that they're unfalsifiable. They cost nothing, because they only appear once the outcome is already known.

A real prediction is expensive. It has to commit before the data arrives, and it can fail embarrassingly. An explanation never can. That is exactly why the world is full of the one and starved of the other.

Explaining means: I find a story afterwards that fits. Predicting means: I commit beforehand and can be wrong. Only the second is testable — and the second is precisely what most people avoid.

So the honest test of a claimed relationship is never "does that sound plausible?" and not even "did it correlate in the past?" It is: would it have helped me on data I hadn't seen yet?

HINDSIGHT (in-sample):       every point gets its story   ->  "obviously, because X"
PREDICTION (out-of-sample):  the next, unseen point       ->  mostly a coin flip

So I stopped talking and started computing

Instead of getting annoyed by it, I started a small build-in-public project: an Economic Dependency Atlas. The idea: take the relationships everyone treats as obvious — "gas prices drive the chemical industry," "copper leads car production," "the yield curve predicts recessions" — and test them against a single, incorruptible rule.

The rule is the one above. Concretely:

Using past data only, pick the lead time at which a signal is supposed to run ahead. No peeking forward.
Then predict out-of-sample: does the model with the "leading indicator" beat a dumb baseline ("next month is roughly like this month") on data it never saw while learning?
Report the result as a confidence interval, not as one pretty number. Point estimates lie.
And everything that fails goes on a public hypothesis graveyard. Failed predictions otherwise vanish quietly — that is the silent evidence Taleb writes about.

All on freely available, public monthly data (Eurostat, FRED, OECD). Nothing exotic.
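The rule above can be sketched in a few lines. This is a minimal illustration, not the Atlas code: the one-parameter model, the candidate lags, the function names (`fit_slope`, `walk_forward`) and the synthetic series are all assumptions made for the example. At every month the lag and the slope are chosen on earlier data only, and the forecast is scored against the naive "same as last month" baseline.

```python
import random

def fit_slope(dx, dy):
    """Least-squares slope through the origin for dy ~ b * dx."""
    den = sum(a * a for a in dx)
    return sum(a * b for a, b in zip(dx, dy)) / den if den > 0 else 0.0

def walk_forward(x, y, lags=(1, 2, 3), start=60):
    """Per-month loss difference: naive squared error minus model squared error.

    Positive values mean the indicator x helped in that month.
    """
    max_lag = max(lags)
    diffs = []
    for t in range(start, len(y)):
        best = None
        for k in lags:
            # Training pairs only use observations dated before t.
            idx = range(max_lag + 1, t)
            dx = [x[s - k] - x[s - k - 1] for s in idx]
            dy = [y[s] - y[s - 1] for s in idx]
            b = fit_slope(dx, dy)
            sse = sum((v - b * u) ** 2 for u, v in zip(dx, dy))
            if best is None or sse < best[0]:
                best = (sse, k, b)
        _, k, b = best
        naive = y[t - 1]                                  # "next month like this month"
        model = y[t - 1] + b * (x[t - k] - x[t - k - 1])  # baseline plus lagged signal
        diffs.append((y[t] - naive) ** 2 - (y[t] - model) ** 2)
    return diffs

def mean(values):
    return sum(values) / len(values)

# Synthetic demo data (hypothetical, not the Atlas series).
rng = random.Random(42)
n = 200
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + rng.gauss(0, 1))

# Case A: y is a random walk that has nothing to do with x.
y_indep = [0.0]
for _ in range(n - 1):
    y_indep.append(y_indep[-1] + rng.gauss(0, 1))

# Case B: y really does follow x with a one-month lag.
y_led = [0.0, 0.0]
for t in range(2, n):
    y_led.append(y_led[-1] + 0.8 * (x[t - 1] - x[t - 2]) + rng.gauss(0, 0.1))

gain_indep = mean(walk_forward(x, y_indep))
gain_led = mean(walk_forward(x, y_led))
print("mean gain, unrelated series:", round(gain_indep, 3))
print("mean gain, genuinely led series:", round(gain_led, 3))
```

The design point is the loop boundary: nothing dated t or later enters the fit that produces the forecast for t. A genuine lead shows up as a clearly positive mean gain; an unrelated series hovers around zero.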

The first result was already telling

Gas to chemicals, the classic. In hindsight it looks great: a common causality test (Granger) fires with a p-value of 0.002 — the kind of number that means "confirmed" in a paper.

Out-of-sample? Nothing. The forecasting gain over the dumb baseline is indistinguishable from zero: the confidence interval cleanly straddles it. It didn't help before the 2022 energy crisis either (so it's not a crisis artifact), and it didn't help when prices were measured in euros instead of dollars. The famous relationship is significant in-sample and worthless out-of-sample. Exactly the gap between explaining and predicting, in one number.
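What "the interval straddles zero" means can be made concrete. The sketch below assumes you already have a list of monthly loss differences (baseline squared error minus model squared error) and puts a moving-block bootstrap interval around their mean. The block length, the 90% level and the function names are illustrative assumptions, not the settings used in the Atlas.

```python
import random

def block_bootstrap_ci(diffs, block=6, reps=2000, alpha=0.10, seed=0):
    """Percentile interval for the mean of diffs, resampling contiguous blocks.

    Blocks keep short-run serial correlation intact, which plain
    resampling of single months would destroy. Requires len(diffs) >= block.
    """
    rng = random.Random(seed)
    n = len(diffs)
    n_blocks = -(-n // block)  # ceiling division
    means = []
    for _ in range(reps):
        sample = []
        for _ in range(n_blocks):
            s = rng.randrange(0, n - block + 1)
            sample.extend(diffs[s:s + block])
        sample = sample[:n]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * reps)]
    hi = means[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

def verdict(diffs):
    """'edge' only if the whole interval lies above zero."""
    lo, hi = block_bootstrap_ci(diffs)
    return "edge" if lo > 0 else "no edge"

# Hypothetical inputs: gains and losses that cancel, then a steady gain.
print(verdict([1.0, -1.0] * 60))
print(verdict([0.5] * 120))
```

Reading the result as an interval rather than a point is what keeps a lucky average from being reported as a finding.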

Then I widened it. More than a dozen pairs, several specifications, lead times from one to twelve months. The hard cases:

Claimed leading indicator                     In hindsight                       Out-of-sample
Gas price -> chemical production              Granger p = 0.002, "significant"   no edge
Copper -> car production                      cointegrated (p = 0.003)           no edge
Business sentiment / orders -> production     clearly correlated                 no edge
US yield curve -> US industrial production    the textbook recession signal      no edge
OECD leading indicator -> production          built to run ahead                 no edge

Survivors: zero.

Two cases sting in particular. Copper and car production are cointegrated — the statistical holy grail, a shared long-run trend. And still: no forecasting power whatsoever. Sharing a trend does not mean one leads the other.

And the OECD Composite Leading Indicator: an index economists built specifically to run ahead of the business cycle. If anything had to pass the test, it was this. It doesn't — neither one month nor twelve months ahead. When even the tool built for exactly this job fails, the problem isn't the tool. It's the expectation.

Why almost everything fails

Three sober reasons, no mysticism:

Correlations are everywhere in hindsight. Hold enough commodities against enough industries against enough lead times, and you will find beautiful relationships by pure chance. Taleb's term for the victims is "fooled by randomness." The out-of-sample test is the antidote, because a pattern produced by chance in one sample rarely shows up again in the next.
The dumb baseline is brutally good. Saying "next month roughly like this one" is astonishingly hard to beat. Most of the predictable part lives in a series' own recent path — not in some external driver.
The world changes regime. A relationship that held in 2010 needn't hold in 2022. A single pinned-down model rarely transfers.
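The first reason is easy to reproduce. The sketch below generates one target series and a batch of candidate "indicators" that are pure, independent noise (think monthly growth rates), then counts how many indicator/lead-time combinations clear a rough 5% significance bar. All sizes, the threshold approximation and the function names are assumptions chosen for illustration.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def count_chance_hits(n_series=50, n_obs=120, max_lag=12, seed=1):
    """Count 'significant' lead relationships among series that are unrelated by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits, tests, best = 0, 0, 0.0
    for _ in range(n_series):
        cand = [rng.gauss(0, 1) for _ in range(n_obs)]
        for k in range(1, max_lag + 1):
            # Pair the candidate at month s with the target at month s + k.
            r = pearson(cand[:-k], target[k:])
            m = n_obs - k
            tests += 1
            # Rough two-sided 5% bar for independent noise: |r| > 1.96 / sqrt(m).
            if abs(r) > 1.96 / math.sqrt(m):
                hits += 1
            best = max(best, abs(r))
    return hits, tests, best

hits, tests, best = count_chance_hits()
print(f"{hits} of {tests} unrelated pairs look 'significant'; strongest |r| = {best:.2f}")
```

With a 5% bar, about one test in twenty passes by luck alone, so a search over hundreds of combinations will hand you a shortlist of convincing-looking leads even when none exists.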

And now the lens against myself

Here would be the convenient conclusion: "It's all nonsense, nobody can predict anything." But that would be the same mistake in a different color: an overconfident claim, just with the sign flipped.

So, honestly: my test has limits. Over long horizons the samples are small and the confidence intervals correspondingly wide — I simply have little statistical power. I tested linearly, pairwise, on revised (not real-time vintage) data. A complex, multivariate model could extract more.

"Not distinguishable from zero" is not the same as "zero." Absence of evidence is not evidence of absence.

That isn't fine print, it's the core. The goal is not to jump from the naive belief ("experts know") to the cynical belief ("nobody knows anything"). The goal is calibration: holding a belief exactly as strongly as the evidence allows — and not one notch more. A range instead of a point. That applies to the expert's prediction just as much as to my refutation.
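Calibration can be checked with simple bookkeeping: if someone states 80% ranges, roughly 80% of the outcomes should land inside them. A minimal sketch, with made-up forecast ranges and outcomes that are purely hypothetical and a helper name (`interval_coverage`) chosen for this example:

```python
def interval_coverage(forecasts, outcomes):
    """Share of outcomes that fell inside the stated (low, high) range."""
    inside = sum(1 for (lo, hi), y in zip(forecasts, outcomes) if lo <= y <= hi)
    return inside / len(outcomes)

# Hypothetical 80% ranges and what actually happened.
forecasts = [(1.0, 3.0), (0.5, 1.5), (-2.0, 0.0), (2.0, 4.0), (0.0, 1.0)]
outcomes = [2.2, 1.9, -0.5, 4.6, 0.3]

coverage = interval_coverage(forecasts, outcomes)
print(f"stated 80%, observed {coverage:.0%}")
```

Observed coverage well below the stated level means the ranges were too narrow, which is overconfidence; coverage well above it means they were wider than the evidence required. Five forecasts are far too few to judge either way, which is the small-sample caveat again.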

What you can take from this

You don't need to run time series for it. A few questions are enough to cut any confident explanation down to size:

"Would that have worked beforehand?" Would the rule have been a gain on data the explainer didn't yet know — or is it just a nice story about the past?
"Where's the graveyard?" Whoever shows you their hits should also show their misses. Without the misses, the hit rate is meaningless.
"How wide is the interval?" A prediction without a range is marketing. An honest one comes with the band within which it can be wrong.
"Does it even beat the coin flip?" Astonishingly often the sober answer is: no.

The most expensive mistake under uncertainty isn't being wrong. It's turning a good story into a confident bet.

I packed the practice of this kind of thinking into a small tool — Against Certainty, a field guide for thinking under uncertainty, with interactive mini-calculators instead of theory. The Atlas is essentially its empirical stress test: what's left when you actually compute the experts' certainty. So far: surprisingly little — and stated openly, limits and all.
