The market drops two percent, and by 5:30 p.m. someone in a suit explains exactly why. Rate fears. Weak China data. Profit-taking. The explanation is fluent, it sounds smart, it fits perfectly. Had the market risen two percent, the same person would have explained the opposite with the same conviction — rate hopes, robust China data, bargain hunters.
That is not an accident. That is the business model. After the event, there is always a reason. Before it, almost never.
The hindsight story always fits
Nassim Taleb calls this the narrative fallacy: our mind can't stand a random sequence, so it lays a chain of causes over it after the fact. The world becomes a story, and stories have reasons. The problem isn't that the explanations are wrong — it's that they're unfalsifiable. They cost nothing, because they only appear once the outcome is already known.
A real prediction is expensive. It has to commit before the data arrives, and it can fail embarrassingly. An explanation never can. That is exactly why the world is full of the one and starved of the other.
Explaining means: I find a story afterwards that fits. Predicting means: I commit beforehand and can be wrong. Only the second is testable — and the second is precisely what most people avoid.
So the honest test of a claimed relationship is never "does that sound plausible?" and not even "did it correlate in the past?" It is: would it have helped me on data I hadn't seen yet?
HINDSIGHT (in-sample): every point gets its story -> "obviously, because X"
PREDICTION (out-of-sample): the next, unseen point -> mostly a coin flip
So I stopped talking and started computing
Instead of getting annoyed by it, I started a small build-in-public project: an Economic Dependency Atlas. The idea: take the relationships everyone treats as obvious — "gas prices drive the chemical industry," "copper leads car production," "the yield curve predicts recessions" — and test them against a single, incorruptible rule.
The rule is the one above. Concretely:
- Pick the lead time at which a signal is supposed to run ahead using past data only — no peeking forward.
- Then predict out-of-sample: does the model with the "leading indicator" beat a dumb baseline ("next month is roughly like this month") on data it never saw while learning?
- Report the result as a confidence interval, not as one pretty number. Point estimates lie.
- And everything that fails goes on a public hypothesis graveyard. Failed predictions otherwise vanish quietly — that is the silent evidence Taleb writes about.
All on freely available, public monthly data (Eurostat, FRED, OECD). Nothing exotic.
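The rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Atlas's actual code: both series are assumed to be monthly changes already, the "leading indicator" model is a one-variable lagged regression, the dumb baseline is a no-change forecast, and the interval comes from a plain i.i.d. bootstrap (a block bootstrap would be more careful with autocorrelated errors). All function names and the synthetic data are hypothetical.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def pick_lag(dx, dy, end, max_lag):
    """Choose the lead time using only observations before index `end`."""
    best_lag, best_sse = 1, float("inf")
    for lag in range(1, max_lag + 1):
        xs, ys = dx[max_lag - lag:end - lag], dy[max_lag:end]
        a, b = fit_ols(xs, ys)
        sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
        if sse < best_sse:
            best_lag, best_sse = lag, sse
    return best_lag


def walk_forward(dx, dy, start, max_lag=12):
    """Per-month loss difference: baseline error^2 minus model error^2.

    Positive values mean the indicator model beat the baseline that month.
    `start` must be at least max_lag + 2 so every fit has training data.
    """
    diffs = []
    for t in range(start, len(dy)):
        lag = pick_lag(dx, dy, t, max_lag)           # past data only
        a, b = fit_ols(dx[max_lag - lag:t - lag], dy[max_lag:t])
        model_err = dy[t] - (a + b * dx[t - lag])    # month t was never seen
        base_err = dy[t]                             # baseline predicts "no change"
        diffs.append(base_err ** 2 - model_err ** 2)
    return diffs


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (assumes i.i.d. values)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]


def make_synthetic(n, true_lag, strength, seed=1):
    """Toy monthly changes; strength=0.0 makes the indicator pure noise."""
    rng = random.Random(seed)
    dx = [rng.gauss(0, 1) for _ in range(n)]
    dy = [
        (strength * dx[t - true_lag] if t >= true_lag else 0.0) + rng.gauss(0, 0.5)
        for t in range(n)
    ]
    return dx, dy


if __name__ == "__main__":
    dx, dy = make_synthetic(300, true_lag=3, strength=0.8)
    diffs = walk_forward(dx, dy, start=120)
    low, high = bootstrap_ci(diffs)
    print(f"mean gain {statistics.fmean(diffs):.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

On the toy data with a built-in three-month lead, the interval should sit clearly above zero. Set `strength=0.0` and the indicator is pure noise, so any apparent edge is luck. An indicator counts as a survivor only if the whole interval stays above zero.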
The first result was already telling
Gas to chemicals, the classic. In hindsight it looks great: a Granger causality test, the common check for whether one series' past values help explain another, fires with a p-value of 0.002, the kind of number that means "confirmed" in a paper.
Out-of-sample? Nothing. The forecasting gain over the dumb baseline is indistinguishable from zero: the confidence interval cleanly straddles it. It didn't help before the 2022 energy crisis either (so it's not a crisis artifact), and it didn't help when prices were expressed in euros instead of dollars. The famous relationship is significant in-sample and worthless out-of-sample. Exactly the gap between explaining and predicting, in one number.
Then I widened it. More than a dozen pairs, several specifications, lead times from one to twelve months. The hard cases:
| Claimed leading indicator | In hindsight | Out-of-sample |
|---|---|---|
| Gas price -> chemical production | Granger p = 0.002, "significant" | no edge |
| Copper -> car production | cointegrated (p = 0.003) | no edge |
| Business sentiment / orders -> production | clearly correlated | no edge |
| US yield curve -> US industrial production | the textbook recession signal | no edge |
| OECD leading indicator -> production | built to run ahead | no edge |
Survivors: zero.
Two cases sting in particular. Copper and car production are cointegrated — the statistical holy grail, a shared long-run trend. And still: no forecasting power whatsoever. Sharing a trend does not mean one leads the other.
And the OECD Composite Leading Indicator: an index economists built specifically to run ahead of the business cycle. If anything had to pass the test, it was this. It doesn't — neither one month nor twelve months ahead. When even the tool built for exactly this job fails, the problem isn't the tool. It's the expectation.
Why almost everything fails
Three sober reasons, no mysticism:
- Correlations are everywhere in hindsight. Hold enough commodities against enough industries against enough lead times, and you will find beautiful relationships — by pure chance. Taleb calls the victims fooled by randomness. The out-of-sample test is the antidote, because randomness doesn't sit in the same spot twice.
- The dumb baseline is brutally good. Saying "next month roughly like this one" is astonishingly hard to beat. Most of the predictable part lives in a series' own recent path — not in some external driver.
- The world changes regime. A relationship that held in 2010 needn't hold in 2022. A single pinned-down model rarely transfers.
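The first reason is easy to reproduce with nothing but random numbers. The sketch below is an illustration only, with hypothetical names and parameters: it generates independent noise series standing in for "commodities" and "industries", scans every pair at every lead time on the first half of the sample, and then checks how many of the apparent relationships show up again, with the same sign, on the second half. The cutoff of 1.96 divided by the square root of the sample size is a rough 5% threshold for a correlation, not an exact test.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(x, y, lag, start, end):
    """Correlation of x[t - lag] with y[t] for t in [start, end)."""
    return pearson(x[start - lag:end - lag], y[start:end])


def scan(n_series=10, n_obs=240, max_lag=12, seed=42):
    """Count chance 'relationships' in-sample and how many persist afterwards."""
    rng = random.Random(seed)
    commodities = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_series)]
    industries = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_series)]
    half = n_obs // 2
    threshold = 1.96 / math.sqrt(half - max_lag)  # rough 5% cutoff for |r|
    hits = survivors = 0
    for x in commodities:
        for y in industries:
            for lag in range(1, max_lag + 1):
                r_in = lagged_corr(x, y, lag, max_lag, half)   # hindsight half
                if abs(r_in) < threshold:
                    continue
                hits += 1
                r_out = lagged_corr(x, y, lag, half, n_obs)    # unseen half
                if abs(r_out) >= threshold and r_in * r_out > 0:
                    survivors += 1
    return hits, survivors, n_series * n_series * max_lag


if __name__ == "__main__":
    hits, survivors, total = scan()
    print(f"{hits} of {total} combinations look 'significant' in hindsight")
    print(f"{survivors} of those repeat on unseen data")
```

By construction no series here has anything to do with any other, so every hit is noise. With a 5% cutoff, roughly one combination in twenty should still clear the bar in hindsight, and very few of those should repeat on the second half.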
And now the lens against myself
Here would be the convenient conclusion: "It's all nonsense, nobody can predict anything." But that would be the same mistake in a different color: an overconfident claim, just with the sign flipped.
So, honestly: my test has limits. Over long horizons the samples are small and the confidence intervals correspondingly wide — I simply have little statistical power. I tested linearly, pairwise, on revised (not real-time vintage) data. A complex, multivariate model could extract more.
"Not distinguishable from zero" is not the same as "zero." Absence of evidence is not evidence of absence.
That isn't fine print, it's the core. The goal is not to jump from the naive belief ("experts know") to the cynical belief ("nobody knows anything"). The goal is calibration: holding a belief exactly as strongly as the evidence allows — and not one notch more. A range instead of a point. That applies to the expert's prediction just as much as to my refutation.
What you can take from this
You don't need to run time series for it. A few questions are enough to cut any confident explanation down to size:
- "Would that have worked beforehand?" Would the rule have been a gain on data the explainer didn't yet know — or is it just a nice story about the past?
- "Where's the graveyard?" Whoever shows you their hits should also show their misses. Without the misses, the hit rate is meaningless.
- "How wide is the interval?" A prediction without a range is marketing. An honest one comes with the band within which it can be wrong.
- "Does it even beat the coin flip?" Astonishingly often the sober answer is: no.
The most expensive mistake under uncertainty isn't being wrong. It's turning a good story into a confident bet.
I packed the practice of this kind of thinking into a small tool — Against Certainty, a field guide for thinking under uncertainty, with interactive mini-calculators instead of theory. The Atlas is essentially its empirical stress test: what's left when you actually compute the experts' certainty. So far: surprisingly little — and stated openly, limits and all.



