How Calibrated Are Prediction Markets?
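Brier score analysis of settled prediction markets on Polymarket and Kalshi, using the YES price 24 hours before resolution (t-24h). The Brier score is the mean squared error between the predicted probability and the 0/1 outcome: lower is better, 0 is a perfect forecast, and always predicting 50% scores 0.25.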
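As a minimal sketch of the two headline metrics, the snippet below computes a Brier score and a hit rate from parallel lists of t-24h prices and outcomes. The function names are illustrative, and the hit-rate rule (a price above 50¢ counts as a YES call, anything else as a NO call) is an assumption; the page does not state how ties at exactly 50¢ are handled.

```python
def brier_score(prices, outcomes):
    """Mean squared error between YES probability (0..1) and the 0/1 outcome."""
    if not prices or len(prices) != len(outcomes):
        raise ValueError("prices and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(prices, outcomes)) / len(prices)


def hit_rate(prices, outcomes, threshold=0.5):
    """Share of markets where the favored side resolved true.

    Assumption: price > threshold is read as a YES call, otherwise a NO call.
    """
    if not prices or len(prices) != len(outcomes):
        raise ValueError("prices and outcomes must be non-empty and equal length")
    hits = sum((p > threshold) == bool(o) for p, o in zip(prices, outcomes))
    return hits / len(prices)


# A forecaster that always says 50% scores 0.25 regardless of outcomes.
coin_flip = brier_score([0.5, 0.5], [1, 0])
```

The two metrics can disagree: a market priced at 51¢ that resolves YES counts as a full hit but still contributes 0.24 to the Brier score.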
Calibration (t-24h price)
| Metric | Value |
|---|---|
| Brier score | 0.218 (mediocre) |
| Hit rate | 69.0% |
| t-24h sample | 5,778 |
| t-24h coverage | 9.6% |
Settlement-time comparison: Brier 0.141 / 85.7% hit rate / n=60,000. The settlement price is taken after the market has converged, so it measures whether markets reach the right answer, not whether they predicted it 24 hours out.
By venue (t-24h)
| Venue | Brier | Hit rate | Sample |
|---|---|---|---|
| Kalshi | 0.226 (mediocre) | 68.2% | 5,175 |
| Polymarket | 0.154 (fair) | 75.1% | 603 |
Calibration Curve (t-24h)
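If markets are perfectly calibrated, the predicted probability equals the actual resolution rate in every bucket. Deviation is predicted minus actual, in percentage points: positive means the market overstated the YES probability (overconfidence), negative means it understated it (underconfidence). Every bucket below is negative, so YES resolved more often than the t-24h price implied.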
| Bucket | n | Predicted | Actual YES% | Deviation | Brier |
|---|---|---|---|---|---|
| 0-10¢ | 1855 | 3.1% | 20.4% | -17.3pp | 0.191 |
| 10-20¢ | 848 | 14.2% | 33.8% | -19.6pp | 0.262 |
| 20-30¢ | 686 | 24.1% | 35.0% | -10.9pp | 0.240 |
| 30-40¢ | 632 | 34.5% | 46.0% | -11.5pp | 0.261 |
| 40-50¢ | 505 | 44.1% | 53.1% | -9.0pp | 0.256 |
| 50-60¢ | 533 | 53.6% | 61.9% | -8.3pp | 0.238 |
| 60-70¢ | 237 | 63.9% | 71.7% | -7.8pp | 0.211 |
| 70-80¢ | 160 | 74.0% | 77.5% | -3.5pp | 0.176 |
| 80-90¢ | 134 | 84.0% | 86.6% | -2.6pp | 0.115 |
| 90-100¢ | 188 | 96.5% | 97.3% | -0.8pp | 0.026 |
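The bucketing behind a table like this can be sketched as follows. This is an illustration under assumptions, not the code that produced the numbers above: the function name, the dictionary keys, and the bucket-edge rule are all hypothetical. The deviation sign follows the table (predicted minus actual).

```python
from collections import defaultdict


def calibration_curve(prices, outcomes, n_buckets=10):
    """Group (price, outcome) pairs into equal-width price buckets.

    prices are YES probabilities in [0, 1]; outcomes are 1 (YES) or 0 (NO).
    """
    buckets = defaultdict(list)
    for p, o in zip(prices, outcomes):
        # Assumed edge rule: a boundary price lands in the upper bucket
        # (up to floating-point rounding); 1.0 is clamped into the last one.
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, o))

    width = 100 / n_buckets
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        n = len(pairs)
        predicted = sum(p for p, _ in pairs) / n
        actual = sum(o for _, o in pairs) / n
        rows.append({
            "bucket": f"{idx * width:.0f}-{(idx + 1) * width:.0f}¢",
            "n": n,
            "predicted": predicted,
            "actual": actual,
            # Same sign convention as the table: predicted minus actual.
            "deviation_pp": (predicted - actual) * 100,
            "brier": sum((p - o) ** 2 for p, o in pairs) / n,
        })
    return rows


rows = calibration_curve([0.05, 0.05, 0.95, 1.0], [0, 1, 1, 1])
```

For the 0-10¢ row above, the same arithmetic gives 3.1% minus 20.4%, or -17.3pp.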
By Category (t-24h)
| Category | Brier | Hit rate | Sample |
|---|---|---|---|
| Sports | 0.237 | 66.3% | 4,737 |
| Crypto | 0.097 | 85.0% | 260 |
| Mentions | 0.166 | 74.4% | 258 |
| Climate and Weather | 0.096 | 84.8% | 211 |
| Major Championships | 0.091 | 97.8% | 91 |
| Esports | 0.230 | 68.6% | 51 |
| Golf | 0.135 | 96.3% | 27 |
| Up or Down | 0.250 | 44.0% | 25 |
| Entertainment | 0.167 | 87.5% | 24 |
| Economics | 0.179 | 78.3% | 23 |
| Cycling | 0.226 | 26.3% | 19 |
| Elections | 0.085 | 90.9% | 11 |
| Games | 0.089 | 100.0% | 8 |
| Ethereum | 0.126 | 100.0% | 5 |
| Politics | 0.095 | 80.0% | 5 |
SimpleFunctions thesis-driven
Collection phase. 1,396 open predictions are currently being tracked.
Calibration metrics will appear once enough thesis edges resolve against market outcomes.
Methodology
- Predicted price: the market's YES-side price 24 hours before resolution, captured from our own price monitoring system (5-minute snapshots via market_indicator_history). Falls back to the Polymarket CLOB prices-history API for older markets.
- Settlement price (shown for comparison): last trade price at the time of settlement. This is not a prediction: markets converge to 0 or 100¢ before resolving, so settlement Brier scores are artificially low.
- Venues: Polymarket (via Gamma API) and Kalshi (via Events API). Multi-leg exotic parlays (MVE) are excluded.
- Resolution data: synced daily via /api/cron/sync-resolutions. t-24h prices are filled hourly via /api/cron/fill-resolutions-t24h.
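One plausible way to pick the t-24h price out of 5-minute snapshots is sketched below. The selection rule (latest snapshot at or before resolution minus 24 hours), the staleness limit, and the function name are assumptions made for illustration; the actual fill job may work differently.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def price_at_t_minus_24h(snapshots, resolved_at,
                         max_staleness=timedelta(minutes=10)):
    """Return the YES price closest before (resolved_at - 24h), or None.

    snapshots: list of (timestamp, yes_price) tuples sorted by timestamp.
    max_staleness is a hypothetical tolerance: if the nearest earlier
    snapshot is older than this, treat the t-24h price as missing.
    """
    target = resolved_at - timedelta(hours=24)
    times = [ts for ts, _ in snapshots]
    i = bisect_right(times, target)  # snapshots[:i] are at or before target
    if i == 0:
        return None  # monitoring started after the target time
    ts, price = snapshots[i - 1]
    if target - ts > max_staleness:
        return None  # gap too large to call this a t-24h price
    return price


resolved = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
snaps = [
    (datetime(2026, 4, 9, 11, 55, tzinfo=timezone.utc), 0.40),
    (datetime(2026, 4, 9, 12, 0, tzinfo=timezone.utc), 0.42),
    (datetime(2026, 4, 9, 12, 5, tzinfo=timezone.utc), 0.45),
]
t24h_price = price_at_t_minus_24h(snaps, resolved)
```

A None result corresponds to the case where a venue API fallback would be needed, or where the market is left out of the t-24h sample.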
Data quality caveats
- Price monitoring (indicator history) started 2026-04-08. Markets resolved before this date rely on venue API fallbacks for t-24h data. Kalshi has no historical price API, so Kalshi markets resolved before 2026-04-08 lack t-24h data entirely.
- Sample sizes vary widely by bucket. In the t-24h calibration table the 0-10¢ bucket has 1,855 markets, while each bucket from 60¢ up has fewer than 250. Interpret calibration in the thinner buckets with caution.
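- Kalshi data is dominated by high-frequency crypto and sports markets with many strike-price brackets per event. Each bracket is counted as an independent prediction, even though brackets on the same event are not statistically independent, so the effective sample size is smaller than the raw count.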
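To gauge how much the bracket counting matters, one option is to average squared errors within each event first, so an event with twenty brackets weighs the same as an event with one. This is a sketch of that idea only; it is not a metric reported on this page, and the event_id grouping key is a hypothetical field.

```python
from collections import defaultdict


def event_weighted_brier(rows):
    """Brier score where each event, not each bracket, gets equal weight.

    rows: iterable of (event_id, yes_price, outcome) with price in [0, 1]
    and outcome 0 or 1. event_id is a hypothetical grouping key.
    """
    per_event = defaultdict(list)
    for event_id, p, o in rows:
        per_event[event_id].append((p - o) ** 2)
    if not per_event:
        raise ValueError("no rows")
    event_means = [sum(errs) / len(errs) for errs in per_event.values()]
    return sum(event_means) / len(event_means)


sample = [
    ("A", 0.1, 0), ("A", 0.1, 0), ("A", 0.8, 1),  # three brackets, one event
    ("B", 0.5, 1),                                 # single-market event
]
clustered = event_weighted_brier(sample)
```

In this toy sample the per-bracket Brier is 0.0775 while the event-weighted figure is 0.135, because the three easy brackets of event A no longer outvote event B.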
API
Raw data is queryable at /api/calibration?source=marketwide. Parameters: period (7d, 30d, 90d, or all), category (filter), and min_volume (minimum volume threshold).
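A query URL for this endpoint could be assembled as in the sketch below. The host is a placeholder, the helper name is hypothetical, and the response format is not documented here, so the example stops at building the URL. Whether category values match the names in the category table exactly is also an assumption.

```python
from urllib.parse import urlencode

ALLOWED_PERIODS = {"7d", "30d", "90d", "all"}  # values listed on this page


def calibration_url(base_url, period="all", category=None, min_volume=None):
    """Build a /api/calibration query URL for market-wide data.

    base_url is a placeholder host; only the path and parameter names
    come from the page.
    """
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"period must be one of {sorted(ALLOWED_PERIODS)}")
    params = {"source": "marketwide", "period": period}
    if category is not None:
        params["category"] = category
    if min_volume is not None:
        params["min_volume"] = min_volume
    return f"{base_url.rstrip('/')}/api/calibration?{urlencode(params)}"


url = calibration_url("https://example.com", period="30d",
                      category="Sports", min_volume=1000)
```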