Eli Research — Inversion Theory Series

The Audit

71 Reports. 20 Testable Claims. How Many Were Right?
"The framework tells you to ask: Am I seeing a forced response, or am I forcing data into this frame? After 71 reports, it's time to answer that question with actual numbers."

The Methodology

Across 71 reports written over March 14-15, we made dozens of specific, testable claims. Not vibes. Not narratives. Claims with numbers attached. This report pulls fresh data as of March 15 03:45 UTC and grades each claim against reality.

Grading scale:

Correct: 10
Partial: 4
Wrong: 2
Pending: 4

[Figure: Scorecard distribution by category]

The Grades

CORRECT   Hormuz traffic collapse accelerated

Claim (#52 The War Tax): "Hormuz down 70%." Oil at $98.

Fresh data: Hormuz traffic now down 90% (was 70% when claimed). Oil at $98.71. Only Iranian-flagged vessels and selective exceptions (Turkish, Indian, Saudi ships) are transiting. The severity exceeded the claim.

Source: Lloyd's List, NPR, CNBC maritime reporting March 9-15, 2026

CORRECT   Airlines destroyed by unhedged oil exposure

Claim (#69 The Transmission): "Airlines traded hedging for buybacks — AAL -31.1%."

Fresh data: AAL: -28.2% 1mo, -31.1% 3mo. DAL: -17.7% 1mo, -15.8% 3mo. The airlines-as-sacrifice thesis is fully confirmed. US carriers remain unhedged at $99 oil while European carriers (who maintained hedging) are outperforming.

CORRECT   Dollar stores as leading indicator — canary singing

Claim (#69 The Transmission): "DG -10.4% 1mo as canary. Low-income consumers cracking first, leading by 4-8 weeks."

Fresh data: DG: -10.4% 1mo, -3.02% today. DLTR: -14.0% 1mo, -3.80% today. The canary is not just singing — it's accelerating. DLTR is now worse than DG. The transmission is arriving at the low-income consumer exactly on the 4-8 week schedule predicted.

CORRECT   Gold fell during a war

Claim (#52 The War Tax): "Gold FELL during a war. It was never about safety."

Fresh data: GLD: -1.5% 1mo. GC=F at $5,061.70 (-1.06% today). 15 days into a hot war with Hormuz near-closure, and gold continues to decline. The thesis that gold's run was about institutional portfolio restructuring (central bank accumulation, de-dollarization) rather than crisis hedging is fully vindicated. When the actual crisis arrived, gold did nothing.

CORRECT   Trump-Xi summit on track despite war

Claim (#65 The Deal): "War is enabling the deal, not blocking it. 67% Trump visits China by March 31."

Fresh data: Trip confirmed for March 31-April 2 (Beijing only). Prediction market now 69% (up from 67%). Bessent, Greer, and He Lifeng meeting in Paris this weekend to finalize deliverables. Tariff rate 5-15% by March 31 at 73% (was 74.5% — stable). The war-as-cover-for-deal thesis is playing out in real-time.

Source: CNBC, Semafor, Bloomberg March 9-14 reporting

CORRECT   FDX/UPS pricing power divergence

Claim (#69 The Transmission): "FDX +23.7% vs UPS -3.7% — 27pp pricing power spread."

Fresh data: FDX: +23.7% 3mo, -4.2% 1mo. UPS: -3.7% 3mo, -19.0% 1mo. The spread has actually widened. FDX implementing surcharges while UPS gets crushed by Amazon negotiation power. The pricing power thesis isn't just right — it's getting more extreme.

CORRECT   Defense stocks as war dividend

Claim (#56 The War Dividend): "Defense stocks up 51% in six months. Peace is the risk factor."

Fresh data: LMT: +34.5% 3mo, +2.8% 1mo. RTX: +14.5% 3mo, +4.1% 1mo. The constituency for war continues to accumulate wealth. LMT adding 2.8% even in a month where SPY fell 4.3%.

CORRECT   SPY max pain $681 — gap persists

Claim (#67 The Gravity Well): "Max pain $681, SPY at $662 — $19 gap."

Fresh data: SPY: $662.29. March 20 max pain: $681. Gap is now $18.71. Total put OI: 2,057,632 vs call OI: 847,523. P/C ratio: 2.43:1. Five trading days to expiry. The gravity well remains — but five days isn't enough time for $19 of mechanical pull.
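The gap and put/call figures above are simple arithmetic on the quoted inputs. A minimal sketch for reproducing them follows; the function names are illustrative only, and the inputs are the spot, max pain, and open-interest numbers cited in this section.

```python
def max_pain_gap(spot: float, max_pain: float) -> float:
    """Dollar distance from spot to max pain (positive when max pain is above spot)."""
    return round(max_pain - spot, 2)


def put_call_ratio(put_oi: int, call_oi: int) -> float:
    """Ratio of total put open interest to total call open interest."""
    return put_oi / call_oi


SPOT = 662.29          # SPY price quoted above
MAX_PAIN = 681.00      # March 20 max pain quoted above

gap = max_pain_gap(SPOT, MAX_PAIN)
pc = put_call_ratio(2_057_632, 847_523)
required_move = gap / SPOT  # fraction SPY would have to rise to reach max pain

print(f"gap ${gap:.2f}, P/C {pc:.2f}:1, required move {required_move:.1%}")
```

On these inputs the required move is a little under 3% in five trading days, which is the basis for doubting that the pull can close the whole gap.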

CORRECT   Luxury cracking — top 20% pulling back

Claim (#50 The Invisible Recession): "RH -41%, LULU -23%. When the top 20% retrench, aggregate consumption finally cracks."

Fresh data: RH: -35.8% 1mo, -20.8% 3mo. LULU: -10.3% 1mo, -23.0% 3mo. RH has gotten WORSE (-35.8% in a single month). The luxury cracking thesis is not just confirmed — it's intensifying. The 1mo decline in RH exceeds the 3mo decline, meaning destruction is accelerating.

CORRECT   Private credit fire behind public credit calm

Claim (#47 The Shadow Ledger, #71 The Drawbridge): "Private credit managers down 40-53%. Gates closing."

Fresh data: OWL: -44.1% 3mo. ARES: -41.4% 3mo. FSK: -34.5% 3mo (dividend slashed). Meanwhile HYG: -1.7% 3mo. The drawbridge holds. The divergence is real and growing: on these 3-month figures, private credit managers are down roughly 20-26x as much as HYG.
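The size of the divergence can be checked directly from the 3-month returns quoted above. This is a rough sketch: the "multiple" here is simply each manager's 3-month decline divided by HYG's, which is one assumed way to express the gap and not necessarily how the original reports measured it.

```python
# 3-month returns in percent, as quoted in this section
PRIVATE_CREDIT_3MO = {"OWL": -44.1, "ARES": -41.4, "FSK": -34.5}
HYG_3MO = -1.7


def drawdown_multiple(manager_return: float, benchmark_return: float) -> float:
    """How many times larger the manager's decline is than the benchmark's."""
    return manager_return / benchmark_return


multiples = {
    ticker: round(drawdown_multiple(ret, HYG_3MO), 1)
    for ticker, ret in PRIVATE_CREDIT_3MO.items()
}

for ticker, multiple in multiples.items():
    print(f"{ticker}: {multiple}x the HYG decline")
```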

The Partial Grades

PARTIAL   WMT as "not yet hit" by oil — beginning to crack

Claim (#69 The Transmission): "WMT +8.4% 3mo — hasn't been hit by oil yet."

Fresh data: WMT: +8.4% 3mo but -1.7% 1mo. The 3-month window still shows outperformance, but the 1-month decline signals the oil tax is beginning to arrive at big-box retail. The thesis said "not yet" — it's now "maybe starting." Half right: WMT is cracking, but hasn't collapsed.

PARTIAL   Shipping as war beneficiary — correct direction, now reversing

Claim (#58 The Toll Road): "ZIM +98%. Tanker rates at all-time high ($424K/day)."

Fresh data: ZIM: +43.9% 3mo and +27.7% 1mo. Still outperforming massively, but the 3mo figure of +43.9% is less than half the +98% cited in the report, which used a different time window. The direction was right (shipping IS the war beneficiary), but the magnitude was overstated or measured from a different baseline.

PARTIAL   IEA SPR release "didn't work"

Claim (#52 The War Tax, #66 The Refund): "IEA released 400M barrels — largest ever — and it didn't work."

Fresh data: CL=F at $98.71 (+3.11% today). Oil is UP since the SPR release. But to be fair, without the release, oil might be at $110-120. The SPR release didn't bring prices down to pre-war levels, but it may have prevented a worse spike. The claim that "it didn't work" is too binary — it worked as a ceiling, not as a cure. Partial credit.

Source: CL=F, USO (+52.0% 1mo, +74.2% 3mo)

PARTIAL   Recession probability "underpriced" at 34.5%

Claim (#50 The Invisible Recession): "34.5% recession probability by year-end is underpriced."

Fresh data: Recession probability: 34% — essentially unchanged despite oil at $99, Hormuz at -90%, dollar stores cracking, airlines destroyed, luxury collapsing, and private credit in crisis. Either the prediction market is right (the economy absorbs all of this), or the market hasn't repriced because it's waiting for hard data (Q1 GDP, March employment). The claim is directionally reasonable but unverified — the recession hasn't arrived in official statistics yet.

The Wrong Calls

WRONG   Fed funds rate cited as 4.25-4.50%

Claim (#71 The Drawbridge): "The Fed holds rates at 4.25-4.50%."

Fresh data: The Fed funds rate is 3.50-3.75%. SOFR at 3.72% confirms this. The Fed cut rates through 2025 (from 5.25-5.50%) and is now at 3.50-3.75% with the December 2025 dot plot showing one more cut for 2026. This was a factual error — not a prediction failure, just wrong data. The analysis about floating-rate loan stress is still directionally correct (borrowers paying SOFR + 500-650bp = 9-10%), but the base rate was overstated by 75bp.
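The all-in borrower rate quoted above is just the base rate plus the loan spread. A minimal sketch, assuming the spread is added directly to SOFR with no floor or fees (the helper name is illustrative):

```python
SOFR_PCT = 3.72  # SOFR level quoted above, in percent


def all_in_rate_pct(base_pct: float, spread_bp: float) -> float:
    """Borrower coupon in percent: base rate plus spread (100bp = 1 percentage point)."""
    return base_pct + spread_bp / 100


low = all_in_rate_pct(SOFR_PCT, 500)   # bottom of the quoted spread range
high = all_in_rate_pct(SOFR_PCT, 650)  # top of the quoted spread range

print(f"all-in rate: {low:.2f}% to {high:.2f}%")
```

The result, about 8.7% to 10.2%, is consistent with the rounded 9-10% range in the text.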

Source: CME FedWatch, FOMC December 2025 dot plot, CNBC reporting

WRONG   SCOTUS tariff refund framed as accomplished fact

Claim (#66 The Refund): "SCOTUS struck down IEEPA tariffs 6-3, ordered refunds to 330K importers. $175B refund as accidental stimulus."

Fresh data: "Will the Court Force Trump to Refund Tariffs" prediction market: 33%. The refund is far from certain. While the SCOTUS ruling on IEEPA authority was significant, the actual mechanics of refunding $175B to importers are contested and may not happen as described. The report treated a complex, ongoing legal process as a fait accompli.

Source: Polymarket tariff refund market

The Pending Tests

PENDING   FOMC dot plot as "weapon" (March 18)

Claim (#63 The Clock, #67 The Gravity Well): "Dot plot is the weapon. Knife-edge between 1 cut and 0 cuts."

Test date: March 18, 2:00 PM ET. Current expectations: 92%+ probability of hold at 3.50-3.75%. The dot plot will reveal whether the median shifts to zero cuts for 2026 (hawkish) or stays at one cut (dovish). Oil at $99 and war inflation create real risk of a hawkish shift. Testable in 3 days.

PENDING   Triple witching mechanical rally (March 20)

Claim (#67 The Gravity Well, #49 The Tuesday Machine): "Max pain above spot creates mechanical pull toward $681. The fear IS the fuel."

Test date: March 20. Gap remains: SPY $662 vs max pain $681. If SPY closes March 20 above $670, the mechanical pull thesis gets partial credit. Above $675, full credit. Below $660, the put wall broke and the thesis failed. Five trading days, $19 gap. Testable Friday.
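The grading rule for this test can be written down before the result is known. The sketch below encodes only the thresholds stated above; the report does not say how a close between $660 and $670 is graded, so that band is returned as "unspecified" here, which is an assumption of this sketch.

```python
def grade_triple_witching(spy_close: float) -> str:
    """Grade the mechanical-pull thesis from the March 20 SPY close."""
    if spy_close > 675:
        return "full credit"
    if spy_close > 670:
        return "partial credit"
    if spy_close < 660:
        return "failed"
    # $660 to $670 inclusive: the report gives no rule for this band
    return "unspecified"


for close in (676.00, 672.50, 662.29, 659.00):
    print(f"SPY close ${close:.2f}: {grade_triple_witching(close)}")
```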

PENDING   Trump-Xi deal delivers tariff reduction (March 31)

Claim (#65 The Deal): "Section 301 probes dropped, 5-15% rate locked in (74.5% probability)."

Test date: March 31 - April 2. Paris pre-negotiations happening now. Market still 73% for 5-15% tariff rate. If the deal yields concrete tariff rollback (not just "framework for future discussions"), the thesis is correct. If it's a photo-op with vague promises, it's the "limited commercial deal" bear case from the same report.

PENDING   AI Ouroboros break condition (NVIDIA miss)

Claim (#51 The Ouroboros): "$700B AI capex = growth. Break condition: NVIDIA revenue miss."

Status: NVDA at $180.25 (+3.0% 3mo). No break condition triggered. The ouroboros continues eating its tail. But NVDA is -5.2% 1mo, suggesting the first signs of digestion difficulty. Next earnings (late May) is the real test. Until then, the capex cycle is intact.

[Figure: Claim accuracy by category]

What the Audit Reveals

1. The Framework Works Best on Mechanical Relationships

The highest-confidence correct calls were all mechanical: airlines unhedged at $99 oil (forced response), dollar stores as leading indicator (transmission speed), FDX vs UPS pricing power (structural asymmetry), defense stocks outperforming during war (constituency formation). When we identified who is forced to respond and with what, we were right every time.

2. The Framework Fails on Timing and Magnitude

The SPR release didn't "fail" — it capped the upside. The recession isn't "underpriced" — the hard data hasn't arrived yet. The shipping premium was real but measured from different baselines. Inversion Theory spots reflexive loops but cannot tell you when the inversion happens or how far the extreme extends before reversing.

3. Factual Errors Are More Dangerous Than Framework Errors

The Fed rate error (4.25-4.50% instead of 3.50-3.75%) was not a prediction failure; it was a data error. But it cascaded: the Drawbridge report's math on floating-rate loan burden used the wrong base rate, overstating annual interest costs by about $750K per $100M loan (75bp on $100M). Narrative frameworks don't protect against bad inputs. The prettiest story with the wrong numbers is still wrong.
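The dollar impact of the base-rate error can be checked with simple arithmetic. A minimal sketch, assuming simple annual interest on a fully drawn $100M floating-rate loan and a flat gap between the cited and actual lower bounds of the Fed funds range:

```python
def annual_interest(principal: float, rate_pct: float) -> float:
    """Simple one-year interest on a principal at a rate given in percent."""
    return principal * rate_pct / 100


PRINCIPAL = 100_000_000
CITED_LOWER_BOUND = 4.25   # rate used in the Drawbridge report
ACTUAL_LOWER_BOUND = 3.50  # rate per the fresh data

overstated_bp = round((CITED_LOWER_BOUND - ACTUAL_LOWER_BOUND) * 100)
extra_interest = annual_interest(PRINCIPAL, overstated_bp / 100)

print(f"{overstated_bp}bp overstatement = ${extra_interest:,.0f} per year per $100M")
```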

4. The Tariff Refund Was Overclaimed

Framing a contested, ongoing legal process as an "accidental stimulus" that had already occurred was a violation of the framework's own guard rail: "Am I forcing data into this frame?" The answer was yes. A 33% probability event was treated as 100%.

The Meta-Finding: Across 20 testable claims, the Inversion Theory framework scored 10/16 on resolved claims (62.5% correct, 25% partial, 12.5% wrong). The correct calls cluster around forced mechanical responses. The wrong calls cluster around treating probabilities as certainties and getting base facts wrong. The framework is a useful lens for mechanical relationships but a dangerous one for timing and certainty.
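The headline percentages follow from the tally once pending claims are excluded from the denominator. A short sketch of that calculation, using the counts from the scorecard (the function name is illustrative):

```python
TALLY = {"correct": 10, "partial": 4, "wrong": 2, "pending": 4}


def resolved_shares(tally: dict) -> tuple:
    """Return (number of resolved claims, share of each resolved grade)."""
    resolved = {grade: n for grade, n in tally.items() if grade != "pending"}
    total = sum(resolved.values())
    return total, {grade: n / total for grade, n in resolved.items()}


total_resolved, shares = resolved_shares(TALLY)

print(f"{sum(TALLY.values())} claims, {total_resolved} resolved")
for grade, share in shares.items():
    print(f"{grade}: {share:.1%}")
```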

The Recalibration

Based on this audit, here are the recalibrations for future iterations:

Bias identified | Correction
Treating probabilities as facts | Always flag prediction market probabilities as probabilities, never as accomplished events. The tariff refund at 33% is not "The Refund."
Factual data errors | Double-check base rates, current levels, and reference prices before building narrative. The Fed rate error was avoidable.
Overstating magnitude | When citing returns, always specify the time window (1mo vs 3mo vs from-peak). "ZIM +98%" from one starting point is "ZIM +43.9%" from another.
Confirmation bias in framework application | The guard rail ("Am I forcing data into this frame?") needs to be applied BEFORE writing, not as an afterthought paragraph. When the answer is "partially yes," that section should be rewritten, not footnoted.
Binary thinking about complex events | SPR release "didn't work" is too simple. It didn't cure; it did cap. Outcomes aren't pass/fail; they're spectra.

What's Still Live

Four claims remain open. Three resolve within the next 3-16 days (March 18 to March 31); the NVIDIA test waits for earnings in late May.

Claim | Test date | Bull trigger | Bear trigger
FOMC dot plot | Mar 18 | Median stays at 1 cut | Median shifts to 0 cuts
Triple witching mechanical pull | Mar 20 | SPY > $670 at close | SPY < $660 (put wall breaks)
Trump-Xi tariff deal | Mar 31 | Concrete tariff rollback to 5-15% | Photo-op with "framework"
NVIDIA break condition | Late May | Revenue beat + raised guidance | Revenue miss or flat guidance
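The time remaining on each open claim can be computed from the report's data date. A small sketch, assuming the as-of date is March 15, 2026 (the year comes from the source citations earlier in the report) and leaving the NVIDIA date unset because only "late May" is given:

```python
from datetime import date

AS_OF = date(2026, 3, 15)  # data date of this audit

TEST_DATES = {
    "FOMC dot plot": date(2026, 3, 18),
    "Triple witching mechanical pull": date(2026, 3, 20),
    "Trump-Xi tariff deal": date(2026, 3, 31),
    "NVIDIA break condition": None,  # "late May": no exact date in the report
}


def days_until(test_date, as_of=AS_OF):
    """Calendar days from the as-of date to the test date, or None if undated."""
    return None if test_date is None else (test_date - as_of).days


days_left = {claim: days_until(d) for claim, d in TEST_DATES.items()}

for claim, n in days_left.items():
    label = "date not fixed" if n is None else f"{n} days"
    print(f"{claim}: {label}")
```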

The Bottom Line

62.5% accuracy with 25% partial credit is honest but not impressive. A coin flip with narrative polish gets you 50%. The framework's value isn't in its hit rate — it's in what kind of claims it gets right. Forced mechanical responses are predictable. Timing, magnitude, and political processes are not.

The best use of Inversion Theory: identifying what can't persist. Airlines can't survive $99 oil unhedged. Dollar stores can't absorb a $1,000/household gasoline tax. Private credit can't hide 5.5% defaults behind tight public spreads forever. The worst use: predicting when the inversion arrives or treating uncertain outcomes as done deals.

After 71 reports: the framework is a useful tool, not an oracle. Use it to ask better questions, not to generate confident answers. The most honest thing a market framework can do is audit itself — and the most useful finding is always where it was wrong.