How Calibrated Are Prediction Markets?

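Brier score analysis of settled prediction markets from Polymarket and Kalshi, using each market's price 24 hours before resolution. The Brier score is the mean squared error between the predicted probability and the 0/1 outcome, so lower is better: a perfect forecast scores 0, and always predicting 50% (a coin flip) scores 0.25.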
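As a minimal sketch of the two headline metrics, the functions below compute a Brier score and a hit rate from paired prices and outcomes. The function names are illustrative, and the hit-rate definition (the side priced at 50% or higher is treated as the market's pick) is an assumption, since the page does not spell it out.

```python
from typing import Sequence


def brier_score(prices: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between YES prices (0-1) and outcomes (1 = YES, 0 = NO)."""
    if len(prices) != len(outcomes) or len(prices) == 0:
        raise ValueError("prices and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(prices, outcomes)) / len(prices)


def hit_rate(prices: Sequence[float], outcomes: Sequence[int]) -> float:
    """Share of markets whose favoured side resolved true.

    Assumed definition: a price >= 0.5 counts as a YES pick, otherwise a NO pick.
    """
    if len(prices) != len(outcomes) or len(prices) == 0:
        raise ValueError("prices and outcomes must be non-empty and of equal length")
    hits = sum((p >= 0.5) == bool(o) for p, o in zip(prices, outcomes))
    return hits / len(prices)


# Always quoting 50% scores 0.25 no matter how the markets resolve.
coin_flip = brier_score([0.5, 0.5], [1, 0])
```

Prices here are probabilities between 0 and 1, so a 37¢ YES price would be passed as 0.37.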

Calibration (t-24h price)

  • Brier score: 0.218 (mediocre)
  • Hit rate: 69.0%
  • t-24h sample: 5,778
  • t-24h coverage: 9.6%
Settlement-time comparison: Brier 0.141 / 85.7% hit rate / n=60,000. The settlement price is taken after the market has converged, so it measures whether markets reach the right answer, not whether they predicted it 24h out.

By venue (t-24h)

| Venue | Brier | Hit rate | Sample |
|---|---|---|---|
| kalshi | 0.226 (mediocre) | 68.2% | 5,175 |
| polymarket | 0.154 (fair) | 75.1% | 603 |
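The per-venue figures can be cross-checked against the overall numbers: because the Brier score and hit rate are per-market averages, the pooled value should equal the sample-weighted mean of the venue values. The sketch below uses only the figures reported on this page; the helper name `pooled_metric` is illustrative.

```python
# Per-venue figures as reported above (t-24h).
VENUES = [
    {"venue": "kalshi", "brier": 0.226, "hit_rate": 68.2, "n": 5175},
    {"venue": "polymarket", "brier": 0.154, "hit_rate": 75.1, "n": 603},
]


def pooled_metric(rows, metric):
    """Sample-weighted mean of a per-venue metric."""
    total_n = sum(row["n"] for row in rows)
    return sum(row[metric] * row["n"] for row in rows) / total_n


pooled_brier = pooled_metric(VENUES, "brier")        # rounds to 0.218
pooled_hit_rate = pooled_metric(VENUES, "hit_rate")  # about 68.9, given rounded inputs
```

The pooled Brier rounds to the reported 0.218, and the pooled hit rate lands within rounding error of the reported 69.0%. Kalshi contributes roughly 90% of the sample, so the overall score mostly reflects Kalshi.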

Calibration Curve (t-24h)

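If markets are perfectly calibrated, the predicted probability in each bucket should equal the actual resolution rate. Deviation is predicted minus actual, in percentage points: positive means overconfidence (YES priced above its actual rate), negative means underconfidence (YES priced below it).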

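| Bucket | n | Predicted | Actual YES% | Deviation | Brier |
|---|---|---|---|---|---|
| 0-10¢ | 1,855 | 3.1% | 20.4% | -17.3pp | 0.191 |
| 10-20¢ | 848 | 14.2% | 33.8% | -19.6pp | 0.262 |
| 20-30¢ | 686 | 24.1% | 35.0% | -10.9pp | 0.240 |
| 30-40¢ | 632 | 34.5% | 46.0% | -11.5pp | 0.261 |
| 40-50¢ | 505 | 44.1% | 53.1% | -9.0pp | 0.256 |
| 50-60¢ | 533 | 53.6% | 61.9% | -8.3pp | 0.238 |
| 60-70¢ | 237 | 63.9% | 71.7% | -7.8pp | 0.211 |
| 70-80¢ | 160 | 74.0% | 77.5% | -3.5pp | 0.176 |
| 80-90¢ | 134 | 84.0% | 86.6% | -2.6pp | 0.115 |
| 90-100¢ | 188 | 96.5% | 97.3% | -0.8pp | 0.026 |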
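A calibration curve like the one above can be produced by grouping markets into 10¢ price buckets and comparing each bucket's mean price with its YES resolution rate. The following is a sketch only: the function name, the output field names, and the lower-inclusive bucket edges are assumptions, not the site's actual implementation.

```python
from collections import defaultdict


def calibration_curve(prices_cents, outcomes):
    """Group (price, outcome) pairs into 10-cent buckets and summarise each.

    prices_cents: YES prices in cents (0-100); outcomes: 1 for YES, 0 for NO.
    Bucket edges are lower-inclusive (30.0 falls in 30-40), which is an assumption.
    """
    buckets = defaultdict(list)
    for price, outcome in zip(prices_cents, outcomes):
        idx = min(int(price // 10), 9)  # a price of exactly 100 joins the 90-100 bucket
        buckets[idx].append((price, outcome))

    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        n = len(pairs)
        predicted = sum(p for p, _ in pairs) / n      # mean price, in %
        actual = 100 * sum(o for _, o in pairs) / n   # YES resolution rate, in %
        brier = sum((p / 100 - o) ** 2 for p, o in pairs) / n
        rows.append({
            "bucket": f"{idx * 10}-{idx * 10 + 10}",
            "n": n,
            "predicted_pct": predicted,
            "actual_pct": actual,
            "deviation_pp": predicted - actual,  # predicted minus actual, as in the table
            "brier": brier,
        })
    return rows


# Toy input: two cheap markets (one resolved YES) and one expensive market.
example_rows = calibration_curve([5, 8, 95], [0, 1, 1])
```

With the toy input, the 0-10 bucket has a mean price of 6.5% against a 50% resolution rate, giving a deviation of -43.5pp, the same sign convention as the table.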

By Category (t-24h)

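| Category | Brier | Hit rate | Sample |
|---|---|---|---|
| Sports | 0.237 | 66.3% | 4,737 |
| Crypto | 0.097 | 85.0% | 260 |
| Mentions | 0.166 | 74.4% | 258 |
| Climate and Weather | 0.096 | 84.8% | 211 |
| Major Championships | 0.091 | 97.8% | 91 |
| Esports | 0.230 | 68.6% | 51 |
| Golf | 0.135 | 96.3% | 27 |
| Up or Down | 0.250 | 44.0% | 25 |
| Entertainment | 0.167 | 87.5% | 24 |
| Economics | 0.179 | 78.3% | 23 |
| Cycling | 0.226 | 26.3% | 19 |
| Elections | 0.085 | 90.9% | 11 |
| Games | 0.089 | 100.0% | 8 |
| Ethereum | 0.126 | 100.0% | 5 |
| Politics | 0.095 | 80.0% | 5 |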

SimpleFunctions thesis-driven

Currently in the collection phase: 1,396 open predictions are being tracked.

Calibration metrics will appear once enough thesis edges resolve against market outcomes.

Methodology

  • Predicted price: the market's YES-side price 24 hours before resolution, captured from our own price monitoring system (5-minute snapshots via market_indicator_history). For older markets, it falls back to the Polymarket CLOB prices-history API.
  • Settlement price (shown for comparison): last trade price at the time of settlement. This is not a prediction: markets converge to 0 or 100¢ before resolving, so settlement Brier scores are artificially low.
  • Venues: Polymarket (via Gamma API) and Kalshi (via Events API). Multi-leg exotic parlays (MVE) are excluded.
  • Resolution data: synced daily via /api/cron/sync-resolutions. t-24h prices filled hourly via /api/cron/fill-resolutions-t24h.
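The t-24h lookup described above can be sketched as picking the latest snapshot at or before `resolved_at - 24h`. Everything in this example is hypothetical: the snapshot tuple layout, the function name, and the `max_staleness` cutoff are illustrative assumptions, not the schema of market_indicator_history or the logic of the cron jobs.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Tuple

Snapshot = Tuple[datetime, float]  # (snapshot time, YES price in cents); assumed layout


def price_at_t_minus_24h(
    snapshots: Sequence[Snapshot],
    resolved_at: datetime,
    max_staleness: timedelta = timedelta(minutes=5),  # assumed: one snapshot interval
) -> Optional[float]:
    """Return the latest snapshot price at or before resolved_at - 24h.

    snapshots must be sorted by time, ascending. Returns None when no snapshot
    is close enough to the target, which is where a fallback source would be tried.
    """
    target = resolved_at - timedelta(hours=24)
    times = [ts for ts, _ in snapshots]
    i = bisect_right(times, target)  # index of the first snapshot after the target
    if i == 0:
        return None  # monitoring started after the target time
    ts, price = snapshots[i - 1]
    if target - ts > max_staleness:
        return None  # nearest snapshot is too old to stand in for t-24h
    return price


# Four snapshots at 12:00, 12:05, 12:10 and 12:15 UTC on 2026-04-10.
t0 = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
snaps = [(t0 + timedelta(minutes=5 * i), 40.0 + i) for i in range(4)]
resolved = datetime(2026, 4, 11, 12, 7, tzinfo=timezone.utc)
picked = price_at_t_minus_24h(snaps, resolved)  # uses the 12:05 snapshot
```

Returning `None` rather than the nearest available price keeps a market out of the t-24h sample when no timely snapshot exists, which is one way to end up with partial coverage like the figure reported above.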

Data quality caveats

  1. Price monitoring (indicator history) started 2026-04-08. Markets resolved before this date rely on venue API fallbacks for t-24h data. Kalshi has no historical price API, so Kalshi markets resolved before 2026-04-08 lack t-24h data entirely.
  2. Sample sizes are uneven across price buckets. In the t-24h sample the 0-10¢ bucket (n=1,855) is by far the largest, while every bucket from 60¢ upward holds fewer than 250 markets. Interpret calibration in the thinner buckets with caution.
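  3. Kalshi data is dominated by high-frequency crypto and sports markets with many strike-price brackets per event. Each bracket is counted as an independent prediction, even though brackets from the same event are correlated, so the effective sample size is smaller than the raw count suggests.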

API

Raw data is queryable at /api/calibration?source=marketwide. Params: period (7d/30d/90d/all), category (filter), min_volume (minimum volume threshold).
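A request URL for this endpoint can be assembled from the documented parameters as in the sketch below. The host in `BASE_URL` is a placeholder (the page gives only the path), the helper name is illustrative, and the response format is not documented here, so the example stops at building the URL. Whether `category` expects the display names from the category table is also an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://example.com"  # placeholder host; only the path is documented
VALID_PERIODS = {"7d", "30d", "90d", "all"}


def calibration_url(
    period: str = "all",
    category: Optional[str] = None,
    min_volume: Optional[int] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build a /api/calibration query URL from the documented parameters."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    params = {"source": "marketwide", "period": period}
    if category is not None:
        params["category"] = category
    if min_volume is not None:
        params["min_volume"] = min_volume
    return f"{base_url}/api/calibration?{urlencode(params)}"


url = calibration_url("30d", category="Sports", min_volume=1000)
```

`urlencode` handles escaping, so a category containing spaces such as "Climate and Weather" is encoded as `Climate+and+Weather`.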