Endogenous vs Reality vs Opinion Data: The Three-Source Axis

A thesis built on one data source is brittle. A thesis built on three is defensible. The three-source axis is the framework for keeping yourself honest about which kind of data you are actually reasoning from.

By Patrick Liu, April 9, 2026

The most expensive mistake I have made in prediction markets did not come from a bad model or a buggy data pipeline. It came from spending two weeks getting more and more confident in a thesis that was entirely supported by one kind of data — and then losing the position when a second kind of data showed up that contradicted it. The thesis was internally consistent. The data was real. The mistake was that I had collapsed three independent sources into one and convinced myself I had triangulated.

This essay is about the three sources, why they are independent, and why every prediction-market thesis you build needs to draw from all three before you size into it.

The Three Sources

There are exactly three kinds of data that bear on whether a prediction-market contract is correctly priced. Each one has its own production process, its own latency, its own cost, its own failure modes. They are independent in the sense that knowing one tells you very little about the other two — which is what makes triangulating across them informative.

Reality data is what is actually happening in the world. The Bureau of Labor Statistics releases a jobs number. The Kalshi resolution API publishes an outcome. A weather sensor records 47°F. A press release goes out. These are facts about the world that exist independently of whether anyone is observing them. Reality data is authoritative: by definition, it is what the contract will eventually resolve against. It is also slow (you wait for the release schedule) and expensive to gather (some of it requires direct sensor access or paid feeds). Most importantly, the contract is priced on the expectation of the data, not the data itself. By the time the resolving data exists, the trade is over.

Endogenous data is what the prediction market itself reveals. The mid-price, the bid-ask depth, the trade tape, the position-implied velocity, the cycle clustering — everything in the pm-indicator-stack is endogenous data. It is fast (millisecond latency on most venues), cheap (the venue publishes it for free), and continuously updating. It is also recursive in a specific dangerous way: the market reacts to its own price. A move in the price causes some traders to exit and others to enter, which causes more moves, which causes more exits and entries. If you are reasoning purely from endogenous data, you are reading the market's reaction to itself, and you can convince yourself of almost anything because the data flatters whatever pattern you are looking for.

Opinion data is what other humans think. X posts, news articles, Reddit threads, Substack analyses, expert commentary, even private conversations. It is messy, biased, sometimes leading and sometimes lagging, frequently contradictory. It is also the only data source that has information about the world that is not yet a fact. Opinion data is where you find rumors, forecasts, model outputs, leaks, and speculation — which is to say, where you find the substance of the thesis that the market will eventually price in or reject. The hard part about opinion data is that it is full of noise, and the techniques for filtering the noise (source credibility, base rates, calibration history) are the same techniques you would use for any open-source intelligence problem.

Why the Three Are Independent

The framework only works if the sources are genuinely independent for any specific question. Let me be precise about what I mean.

A jobs report (reality) is generated by BLS surveying employers. The market price of a "non-farm payrolls > 200K" contract (endogenous) is generated by traders bidding against each other. A Bloomberg piece predicting the report (opinion) is generated by a journalist talking to economists. These three production processes share no inputs and run on completely different schedules. The reality process runs once a month. The endogenous process runs continuously. The opinion process runs whenever a journalist files copy. There is no mechanical way for one to force-update the others.

That is what "independent" means here. It does not mean uncorrelated in the statistical sense — over long horizons, all three converge on the same underlying reality, and they had better, or the prediction market is broken. It means you cannot derive one from the other two. If you could, you would have one data source, not three.

The independence is what makes triangulation work. If reality data and endogenous data and opinion data all point the same way, you have three independent witnesses agreeing, and your confidence should be high. If two agree and one disagrees, that is interesting and you should figure out which one is wrong. If all three disagree, you almost certainly do not understand the question yet and you should stay out.
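The three outcomes in the paragraph above can be written down as a small decision function. This is a minimal sketch, not a trading rule: the `Read` labels, the `triangulate` name, and the returned strings are all hypothetical, and reducing each source to a single YES / NO / UNCLEAR read is a simplifying assumption.

```python
from collections import Counter
from enum import Enum


class Read(Enum):
    """Which way one source leans on a binary contract (hypothetical labels)."""
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


def triangulate(reality: Read, endogenous: Read, opinion: Read) -> str:
    """Classify three reads as 'aligned', 'investigate:<source>', or 'stay_out'."""
    reads = {"reality": reality, "endogenous": endogenous, "opinion": opinion}
    top_read, top_count = Counter(reads.values()).most_common(1)[0]

    if top_count == 3:
        # Assumption: three sources that are all unclear do not count as agreement.
        return "stay_out" if top_read is Read.UNCLEAR else "aligned"
    if top_count == 2:
        # Exactly one source differs from the other two; name it so it can be examined.
        dissenter = next(name for name, read in reads.items() if read is not top_read)
        return f"investigate:{dissenter}"
    # All three reads differ: the question is not understood yet.
    return "stay_out"


print(triangulate(Read.YES, Read.YES, Read.YES))      # aligned
print(triangulate(Read.YES, Read.YES, Read.NO))       # investigate:opinion
print(triangulate(Read.YES, Read.NO, Read.UNCLEAR))   # stay_out
```

The function deliberately returns the name of the dissenting source rather than a verdict, because the two-against-one case is a prompt to find out which witness is wrong, not a vote.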

The Most Common Failure Mode: Endogenous-Only

The single most common mistake I see in prediction-market traders — including, repeatedly, in myself — is reasoning entirely from endogenous data. The seduction is that endogenous data is right there, in real time, with perfect resolution, and the market keeps rewarding you for reading it (until it doesn't).

A typical endogenous-only thesis looks like this: "The price has moved from 0.42 to 0.55 over the last week, and the move has accelerated each day. The orderbook has thickened on the YES side. CRI is rising. PIV is positive. The market is clearly converging on YES. I am long." That is a coherent reading of the data. It is also a thesis built entirely on the market's reaction to itself, and it does not contain a single fact about the world, a single forecast from a non-trader, or any reason to believe that the underlying event has changed.

The failure mode is that endogenous data lies confidently. When a market moves up because traders see other traders moving up, the price goes up. Indicators register the move. Velocity registers the acceleration. Every endogenous signal you would normally trust says "this is a real move." And then a single piece of reality data shows up — a jobs number, a court ruling, an actual outcome in the world — and the market collapses by twenty cents in an afternoon. The endogenous signals were never wrong about themselves; they were just measuring the wrong thing. They were measuring momentum, and you thought you were measuring belief.

The second most common failure is opinion-only. This one is more obviously dangerous and people make it less often, but it still happens: someone reads a viral Twitter thread predicting an outcome, sizes a position based on the thread, ignores the price (which is already moving in the opposite direction because actual traders are already pricing in the counter-evidence), and gets stopped out within a day. Opinion data without an endogenous check is how you become exit liquidity for the people who read the same thread you did.

Reality-only is the third failure mode and it is the rarest because almost nobody trades purely on releases. The variant I see is "I know what the BLS print is going to be, therefore I know where the contract should be priced." That can be true, but only if the market does not also know — and if you trust the market at all, you have to assume that the price already reflects most public guesses about the print. Reality-only without an endogenous check is how you trade on consensus information and lose to the people who already moved the price.

A Triangulation Walkthrough

Here is a real-shape example. Last fall I was looking at a Kalshi contract for whether the Fed would cut rates at the November meeting. YES was trading at $0.48 with about 30 days to expiry. IY was healthy. The orderbook was deep enough.

Endogenous data: the price had drifted up from $0.41 to $0.48 over two weeks. CRI was modest. PIV showed a slight positive bias on YES. Sibling contracts (other rate-decision markets) were also drifting up. The endogenous read said: market is converging on a cut.

Reality data: the most recent CPI print was lower than expected, the most recent jobs print was slightly weaker, GDP nowcasts were softening. None of this was conclusive — the data could support either decision — but on the margin it leaned dovish. The reality read said: a cut is consistent with the macro trajectory.

Opinion data: the Fed-watching ecosystem (Wall Street economists, FedGuy on Twitter, the WSJ Fed whisperer pieces) was split. Maybe 60% of the takes I respected were leaning toward a cut, 40% toward a hold. The most credible voices were specifically calling out Powell's recent emphasis on "data dependence" as a hedge, which is a signal that even the credible voices were not sure.

Three sources. Two of them (endogenous + reality) said cut. One of them (opinion, the most credible part of it) said the credible forecasters were not committing. That is a triangulation pattern that says "do not size into this." The endogenous and reality data were both consistent with a cut, but neither of them was forcing a cut, and the opinion data was telling me the people who actually knew were hedging.

I sized small. Smaller than I would have if all three sources had aligned. The Fed held. I lost a small position instead of a large one. The triangulation kept me out of trouble even though two of three sources looked supportive.

Counter-example: in the same window, I had a contract on a state ballot initiative where reality data was strong (the polling was consistently 8 points in one direction), endogenous data was strong (the price reflected the polling and was stable), and opinion data was strong (the local journalism and the political insiders all agreed). Three sources aligned. I sized big and the position resolved cleanly.

The framework is not "always trade when three sources agree." It is "never size big when fewer than three sources agree." That asymmetry is the whole value of having three sources.
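One way to make the asymmetry concrete is to treat agreement as a ceiling on size rather than a trigger to trade. The sketch below is illustrative only: `size_ceiling`, the `"yes"` / `"no"` / `None` encoding, and the 0.25 small-size cap are assumptions, not figures from the walkthrough. It encodes only the ceiling and says nothing about whether to trade at all.

```python
from typing import Optional

SOURCES = ("reality", "endogenous", "opinion")


def size_ceiling(reads: dict[str, Optional[str]], small_cap: float = 0.25) -> float:
    """Return the largest fraction of a full-size position the rule permits.

    reads maps each source to "yes", "no", or None (missing or not committing).
    small_cap is a hypothetical placeholder for "small".
    """
    values = [reads.get(source) for source in SOURCES]
    all_committed = None not in values
    same_side = len(set(values)) == 1
    if all_committed and same_side:
        # Three sources on the same side: full size is allowed, not required.
        return 1.0
    # Anything less than three-way agreement is capped at small size.
    return small_cap


# The two cases from the walkthrough, encoded under the assumptions above.
fed_cut = {"reality": "yes", "endogenous": "yes", "opinion": None}
ballot = {"reality": "yes", "endogenous": "yes", "opinion": "yes"}

print(size_ceiling(fed_cut))  # 0.25
print(size_ceiling(ballot))   # 1.0
```

Because a missing source is encoded the same way as a non-committal one, a market with no opinion layer is automatically capped at small size.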

Where the Framework Breaks

A few cases where the three-source axis is harder to apply than the abstract version makes it sound.

Some questions have no opinion layer. If you are trading a niche Kalshi contract on weather extremes in some specific city, there is essentially no opinion data — nobody is writing think pieces about it. You are stuck with reality (weather forecasts) and endogenous (the market price) only. The framework still works, but you have to be more conservative because you have lost one of your three witnesses.

Some questions have no reality layer until resolution. The classic example is "Will event X happen by date Y." Until the resolution event actually fires, there is no reality data — only forecasts. The contract is priced entirely on a mix of endogenous and opinion data, with reality only arriving at the moment of resolution. This is the structural shape of all event contracts on rare or first-time events, and it is part of why those markets are notoriously hard to read.

Sources can collapse without you noticing. If your "opinion data" feed is dominated by other prediction-market traders who themselves are reading the same endogenous signal, your opinion source has collapsed into your endogenous source and you have not realized it. This happens constantly on Crypto-Twitter and PM-Twitter, where the discourse layer is downstream of the price tape. Always ask whether the opinions you are reading are upstream of the market or downstream of it.
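A crude first check on the upstream-or-downstream question is to compare timestamps: what share of the opinion items you are reading were published before the price move they discuss? The sketch below assumes you have already collected publication times and picked a move time by hand; `upstream_share` and its inputs are hypothetical. Publishing earlier does not prove an opinion is independent of the tape, so a high share is weak evidence at best, while a low share is a clear warning.

```python
from datetime import datetime


def upstream_share(opinion_times: list[datetime], price_move_time: datetime) -> float:
    """Fraction of opinion items published strictly before the price move."""
    if not opinion_times:
        raise ValueError("no opinion items to classify")
    upstream = sum(1 for t in opinion_times if t < price_move_time)
    return upstream / len(opinion_times)


# Illustrative timestamps only: one post before the move, three after it.
move = datetime(2025, 1, 6, 12, 0)
posts = [
    datetime(2025, 1, 6, 9, 30),
    datetime(2025, 1, 6, 12, 45),
    datetime(2025, 1, 6, 14, 10),
    datetime(2025, 1, 6, 16, 0),
]

print(upstream_share(posts, move))  # 0.25
```

A share this low suggests the discourse is mostly commentary on the move, in which case it should be counted as part of the endogenous source rather than as a separate witness.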

Reality data can be wrong. BLS revisions are real. Polling errors are real. Sensor failures are real. Treating reality data as ground truth is usually fine, but the assumption fails often enough that you should treat your reality source the same way you treat any single witness — verify against the other two when you can.

How This Connects to the Stack

The three-source axis lives inside stage 3 of the valuation funnel. Stages 1 and 2 are pure endogenous-data work — you are screening on indicators and reading orderbooks, both of which are entirely endogenous. Stage 3 is where you bring reality and opinion data into the conversation, and the entire purpose of stage 3 is to make sure your stage-1 candidate survives contact with the other two sources.

The pm-indicator-stack is endogenous-only by construction. That is its design — it is the cheap fast filter that has to scan 47,000 markets in milliseconds, and you cannot do that with anything but venue-published data. The indicator stack is not a substitute for the three-source check; it is the precondition for being able to do the three-source check on a small enough number of candidates that you actually have time to do it.

The thesis-driven-prediction-market-strategy opinion piece argues for the same idea from a different angle: a "thesis" is precisely the artifact you produce when you have done the three-source triangulation and committed to a reading. A trade without a thesis is a trade with at most two sources reconciled, and that is the failure mode this framework exists to prevent.

The capstone prediction-market-valuation-theory closes the loop: the reason the prediction market is interesting at all is that it is the only forum where these three sources collide in a single price. That collision is what makes the price informative — and reading the price without reading the collision is reading half the data.

The framework is older than prediction markets. Every intelligence community in the world runs some version of "single-source / two-source / three-source" assessments and weights conclusions accordingly. Every credit analyst runs management commentary against financial statements against industry data. Prediction markets are unusual only in that all three sources are nominally available to a retail trader for free — which is a privilege, and if you do not use it you are voluntarily flying with one eye closed.
