How wrong is the crowd?
On 928 resolved Kalshi and Polymarket markets, the price 24 hours before settlement scored a Brier of 0.213 (mediocre). A perfect forecast scores 0.000; a coin flip (always 50%) scores 0.250. On average, the predicted probability sits 7.3pp from the realized rate.
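The diagonal in the curve below is perfect calibration. With price on the x-axis and realized YES rate on the y-axis, dots above the line mean YES resolved more often than priced (YES was underpriced); dots below mean YES resolved less often than priced (YES was overpriced). Bigger dot = more resolutions in that bucket.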
Settlement-time Brier 0.187 · hit rate 81.2% · n=60,000. Settlement prices reflect the post-convergence state: markets converge to 0¢ or 100¢ before resolving, so settlement Brier is artificially low. That is why we use t-24h.
Calibration curve · t-24h price → realized YES rate
928 resolutions · 10 buckets
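A minimal sketch of how a 10-bucket calibration curve like this one could be computed. The function names, the equal-width bucketing, and the count-weighted gap are assumptions made for illustration; the page does not say exactly how its buckets or its 7.3pp average are derived.

```python
from collections import defaultdict


def calibration_buckets(prices, outcomes, n_buckets=10):
    """Group (price, outcome) pairs into equal-width price buckets.

    prices:   YES prices expressed as probabilities in [0, 1]
    outcomes: 1 if the market resolved YES, 0 if it resolved NO
    Returns one (mean_price, realized_yes_rate, count) row per non-empty bucket.
    """
    grouped = defaultdict(list)
    for p, y in zip(prices, outcomes):
        # A price of exactly 1.0 is placed in the top bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        grouped[idx].append((p, y))

    rows = []
    for idx in sorted(grouped):
        pairs = grouped[idx]
        count = len(pairs)
        mean_price = sum(p for p, _ in pairs) / count
        yes_rate = sum(y for _, y in pairs) / count
        rows.append((mean_price, yes_rate, count))
    return rows


def mean_calibration_gap(rows):
    """Count-weighted mean absolute gap between price and realized YES rate."""
    total = sum(count for _, _, count in rows)
    return sum(abs(p - r) * count for p, r, count in rows) / total


# Toy data, invented for this example only.
toy_prices = [0.05, 0.05, 0.95, 0.95]
toy_outcomes = [0, 0, 1, 0]
toy_rows = calibration_buckets(toy_prices, toy_outcomes)
toy_gap = mean_calibration_gap(toy_rows)
```

Each row corresponds to one dot on the curve: the mean price is the x position, the realized YES rate is the y position, and the count sets the dot size.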
By venue (t-24h)
Top categories · 6 of 6
- Sports · 0.211 · 68.2% · n=790
- Economics · 0.269 · 69.8% · n=63
- Mentions · 0.168 · 81.1% · n=37
- Elections · 0.112 · 88.9% · n=9
- Up or Down · 0.249 · 11.1% · n=9
- Crypto · 0.251 · 62.5% · n=8
SimpleFunctions thesis-driven
In collection phase — 1936 open predictions tracked. Calibration metrics will appear here once enough thesis edges resolve against market outcomes.
Methodology
- Predicted price. The market's YES-side price 24 hours before resolution, captured from our 5-minute price snapshots in market_indicator_history. Falls back to Polymarket CLOB prices-history for older markets.
- Settlement price. The last trade at the moment of settlement. Shown as a comparison only: every market converges to 0¢ or 100¢ before resolving, so settlement Brier scores are artificially low.
- Venues. Polymarket via Gamma API and Kalshi via Events API. Multi-leg exotic parlays are excluded.
- Resolution sync. Daily via /api/cron/sync-resolutions; t-24h prices filled hourly via /api/cron/fill-resolutions-t24h.
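The predicted-price step in the methodology above can be sketched as a lookup over time-sorted snapshots. This is an illustration under stated assumptions, not the site's actual job: the function name, the in-memory list of (timestamp, price) tuples, and the 10-minute staleness limit are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def price_at_t_minus_24h(snapshots, resolved_at,
                         max_staleness=timedelta(minutes=10)):
    """Return the last snapshot price at or before resolved_at - 24h.

    snapshots: list of (timestamp, yes_price) tuples sorted by timestamp
    Returns None when no snapshot is close enough to the target time,
    so the caller can fall back to another price source.
    """
    target = resolved_at - timedelta(hours=24)
    times = [ts for ts, _ in snapshots]
    i = bisect_right(times, target)  # snapshots[:i] are at or before target
    if i == 0:
        return None  # history starts after the target time
    ts, price = snapshots[i - 1]
    if target - ts > max_staleness:
        return None  # nearest earlier snapshot is too old to trust
    return price


# Toy snapshots at 5-minute spacing, invented for this example only.
t0 = datetime(2026, 4, 10, 12, 0)
toy_snapshots = [(t0 + timedelta(minutes=5 * k), 0.60 + 0.01 * k) for k in range(3)]
```

Returning None instead of the nearest later snapshot avoids using a price that already contains information from inside the final 24 hours.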
Data quality caveats
- Indicator history started 2026-04-08. Markets resolved before this date rely on venue API fallbacks. Kalshi has no historical price API, so pre-4/8 Kalshi markets lack t-24h data.
- Mid-bucket samples (30–70¢) are small relative to extremes. Most prediction markets settle near-certain or near-impossible, not coin-flip — interpret mid-range calibration with caution.
- Kalshi data is dominated by high-frequency crypto and sports markets with many strike-price brackets per event; each bracket is counted as an independent prediction.
Frequently asked
- Are prediction markets accurate?
- On 928 resolved Kalshi and Polymarket markets, the price 24 hours before settlement scored a Brier of 0.213 (mediocre). A perfect forecast scores 0.000 and a coin flip scores 0.250, so 0.213 beats chance, but only modestly. A score in the 0.10–0.20 range would indicate a crowd that is meaningfully better than chance and broadly well-calibrated. Overconfidence is concentrated in the mid-range buckets, where outcomes are most uncertain.
- What is a Brier score for prediction markets?
- The Brier score is the mean squared error between the predicted probability and the realized binary outcome (1 if the market resolved YES, 0 if NO). Lower is better. A forecast of 70% YES on a market that resolves YES contributes (1 - 0.7)² = 0.09; the same forecast on a market that resolves NO contributes (0 - 0.7)² = 0.49. Averaging over many resolutions gives a single number that rewards both correct direction and right-sized uncertainty.
- Why measure prices 24 hours before resolution instead of at settlement?
- Every prediction market converges to 0¢ or 100¢ as the outcome becomes obvious in the final hours, so settlement-time Brier scores are artificially low and tell you nothing about how well the market actually predicted. The t-24h price is the last meaningful forecast — it is the number a trader would have acted on the day before, before late-breaking information swamps the signal.
- How does Kalshi calibration compare to Polymarket?
- See the by-venue section above. Both venues land within the same Brier band on the resolved sample, but the per-bucket deviations differ — Polymarket has more depth on politics, Kalshi has more brackets in crypto and sports. The breakdowns by venue and category make the differences explicit.
- Is the dataset behind this calibration analysis available?
- Yes. The /api/calibration?source=marketwide endpoint returns the full breakdown by venue, category, and price bucket, plus settlement-time baselines. Period, category filter, and minimum-volume floor are query parameters. Resolution data syncs daily; t-24h prices fill hourly.
- Can prediction markets predict events that have not happened yet?
- Calibration on resolved markets tells you the implied probability is honest on average; it does not promise that any individual prediction is right. Treat the curve on this page as a track record — markets are well-calibrated in aggregate, which makes them a good default prior, but for high-stakes questions always read the underlying market depth, news flow, and volatility (see /markets and /screen).
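The Brier score described in the FAQ above fits in a few lines. This is a generic sketch of the standard definition with a hypothetical function name, not the site's own code.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.

    forecasts: predicted YES probabilities in [0, 1]
    outcomes:  1 if the market resolved YES, 0 if it resolved NO
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


# The two single-forecast cases from the FAQ: 70% YES resolving YES, then NO.
yes_case = brier_score([0.7], [1])
no_case = brier_score([0.7], [0])

# Always saying 50% scores 0.25 regardless of how the markets resolve.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

Because the error is squared, a confident miss costs far more than a cautious one, which is why the score rewards right-sized uncertainty as well as direction.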
Raw data
/api/calibration?source=marketwide. Params: period, category, min_volume.
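A small sketch of building a request URL for this endpoint. The path and parameter names come from the line above; the base URL, the helper name, and the example values are placeholders, since the accepted values for period, category, and min_volume are not documented here.

```python
from urllib.parse import urlencode


def calibration_url(base_url, period=None, category=None, min_volume=None):
    """Build a /api/calibration URL, including only the filters that are set."""
    params = {"source": "marketwide"}
    if period is not None:
        params["period"] = period
    if category is not None:
        params["category"] = category
    if min_volume is not None:
        params["min_volume"] = min_volume
    return f"{base_url.rstrip('/')}/api/calibration?{urlencode(params)}"


# Placeholder host and illustrative filter values (assumptions, not documented).
example_url = calibration_url("https://example.invalid", category="Sports", min_volume=1000)
```

The response format is not described on this page, so the sketch stops at URL construction.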
Live markets
Where the field sits today — /markets →
Question-level odds
Cross-venue probability map. /odds →