SimpleFunctions

How wrong is the crowd?

On 928 resolved Kalshi and Polymarket markets, the price 24 hours before settlement scored a Brier of 0.213 (mediocre). A perfect forecast scores 0.000; always quoting 50% (a coin flip) scores 0.250. On average, the predicted probability sits 7.3 percentage points (pp) from the realized rate.

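The diagonal in the curve below is perfect calibration. Dots above the line mean YES resolved more often than the price implied (the market underpriced YES); dots below mean YES resolved less often than the price implied (the market overpriced YES). Bigger dot = more resolutions in that bucket.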

Brier (t-24h): 0.213
Hit rate: 68.2%
Sample: 928
t-24h coverage: 1.6%

Settlement-time Brier 0.187 · hit rate 81.2% · n=60,000. Settlement prices are taken after convergence: markets move toward 0¢ or 100¢ before resolving, so the settlement Brier is artificially low. That is why we use t-24h.

Calibration curve · t-24h price → realized YES rate

[Figure: calibration curve. x-axis: Predicted (t-24h price), 0% to 100%. y-axis: Realized YES rate, 0% to 100%. The diagonal marks perfect calibration.]
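Bucket · n · predicted → realized YES rate · deviation (predicted minus realized)
0-10¢ · n=223 · 3.5% → 22.4% · -18.9pp
10-20¢ · n=119 · 14.8% → 20.2% · -5.4pp
20-30¢ · n=135 · 24.5% → 36.3% · -11.8pp
30-40¢ · n=97 · 34.2% → 48.4% · -14.2pp
40-50¢ · n=101 · 44.8% → 47.5% · -2.7pp
50-60¢ · n=97 · 53.0% → 54.6% · -1.6pp
60-70¢ · n=49 · 63.9% → 61.2% · +2.7pp
70-80¢ · n=42 · 73.4% → 81.0% · -7.5pp
80-90¢ · n=33 · 84.1% → 81.8% · +2.3pp
90-100¢ · n=32 · 94.1% → 100.0% · -5.9pp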
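The bucket rows above are enough to recompute the deviations and the 7.3pp headline. The sketch below is a minimal check, not the page's actual pipeline. It assumes the deviation is predicted minus realized (which matches the signs shown) and that the headline is the unweighted mean of the absolute bucket deviations. The page does not state the weighting; the unweighted mean is used here only because it reproduces 7.3pp. The names `BUCKETS` and `mean_abs_deviation` are illustrative.

```python
# (bucket, n, mean predicted %, realized YES %), copied from the rows above.
BUCKETS = [
    ("0-10¢", 223, 3.5, 22.4),
    ("10-20¢", 119, 14.8, 20.2),
    ("20-30¢", 135, 24.5, 36.3),
    ("30-40¢", 97, 34.2, 48.4),
    ("40-50¢", 101, 44.8, 47.5),
    ("50-60¢", 97, 53.0, 54.6),
    ("60-70¢", 49, 63.9, 61.2),
    ("70-80¢", 42, 73.4, 81.0),
    ("80-90¢", 33, 84.1, 81.8),
    ("90-100¢", 32, 94.1, 100.0),
]


def deviation_pp(predicted, realized):
    """Signed gap in percentage points: predicted minus realized."""
    return predicted - realized


def mean_abs_deviation(rows, weighted=False):
    """Mean absolute bucket deviation, optionally weighted by bucket size n."""
    gaps = [(n, abs(deviation_pp(pred, real))) for _, n, pred, real in rows]
    if weighted:
        total_n = sum(n for n, _ in gaps)
        return sum(n * gap for n, gap in gaps) / total_n
    return sum(gap for _, gap in gaps) / len(gaps)


if __name__ == "__main__":
    for name, n, pred, real in BUCKETS:
        print(f"{name:>8}  n={n:<4} {deviation_pp(pred, real):+.1f}pp")
    print(f"unweighted mean |gap|: {mean_abs_deviation(BUCKETS):.1f}pp")
    print(f"n-weighted mean |gap|: {mean_abs_deviation(BUCKETS, weighted=True):.1f}pp")
```

Because the inputs are already rounded to one decimal, the 70-80¢ row computes to -7.6pp rather than the displayed -7.5pp. Weighting by n gives a larger figure than the unweighted mean, since the most populated bucket (0-10¢) also has the largest gap.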

By venue (t-24h)

Kalshi: Brier 0.212 · hit rate 69.2% · n=902
Polymarket: Brier 0.250 · hit rate 34.6% · n=26

Top categories · 6 of 6

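  • Sports: Brier 0.211 · hit rate 68.2% · n=790
  • Economics: Brier 0.269 · hit rate 69.8% · n=63
  • Mentions: Brier 0.168 · hit rate 81.1% · n=37
  • Elections: Brier 0.112 · hit rate 88.9% · n=9
  • Up or Down: Brier 0.249 · hit rate 11.1% · n=9
  • Crypto: Brier 0.251 · hit rate 62.5% · n=8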

SimpleFunctions thesis-driven

In collection phase — 1936 open predictions tracked. Calibration metrics will appear here once enough thesis edges resolve against market outcomes.

Methodology

  • Predicted price. The market's YES-side price 24 hours before resolution, captured from our 5-minute price snapshots in market_indicator_history. Falls back to Polymarket CLOB prices-history for older markets.
  • Settlement price. The last trade at the moment of settlement. Shown as a comparison only — every market converges to 0¢ or 100¢ before resolving, so settlement Brier scores are artificially low.
  • Venues. Polymarket via Gamma API and Kalshi via Events API. Multi-leg exotic parlays are excluded.
  • Resolution sync. Daily via /api/cron/sync-resolutions; t-24h prices filled hourly via /api/cron/fill-resolutions-t24h.
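The predicted-price step above can be sketched as a lookup over timestamped snapshots. This is an illustration under stated assumptions, not the site's implementation. The page does not say how a snapshot is matched to the t-24h mark, so the sketch takes the latest snapshot at or before resolution minus 24 hours and rejects it if it is older than a tolerance. The function name, the `(timestamp, yes_price)` tuple layout, and the 10-minute tolerance are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def price_at_t_minus_24h(snapshots, resolved_at, max_staleness=timedelta(minutes=10)):
    """Return the YES price from the latest snapshot at or before resolved_at - 24h.

    snapshots: list of (timestamp, yes_price) tuples sorted ascending by timestamp.
    Returns None when no snapshot is close enough to the target time, in which
    case a caller would need a fallback source (or would skip the market).
    """
    target = resolved_at - timedelta(hours=24)
    times = [ts for ts, _ in snapshots]
    idx = bisect_right(times, target)  # number of snapshots with ts <= target
    if idx == 0:
        return None  # history starts after the target time
    ts, price = snapshots[idx - 1]
    if target - ts > max_staleness:
        return None  # nearest earlier snapshot is too old to trust
    return price


if __name__ == "__main__":
    resolved = datetime(2026, 5, 2, 12, 0, tzinfo=timezone.utc)
    first = resolved - timedelta(hours=25)
    # Two hours of made-up 5-minute snapshots straddling the t-24h mark.
    history = [(first + timedelta(minutes=5 * k), 0.40 + 0.001 * k) for k in range(24)]
    print(price_at_t_minus_24h(history, resolved))
```

Returning None rather than the nearest available price keeps markets without usable history out of the t-24h sample, which is consistent with the low t-24h coverage reported on this page.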

Data quality caveats

  1. Indicator history started 2026-04-08. Markets resolved before this date rely on venue API fallbacks. Kalshi has no historical price API, so Kalshi markets resolved before 2026-04-08 lack t-24h data.
  2. Mid-bucket samples (30–70¢) are small relative to extremes. Most prediction markets settle near-certain or near-impossible, not coin-flip — interpret mid-range calibration with caution.
  3. Kalshi data is dominated by high-frequency crypto and sports markets with many strike-price brackets per event; each bracket is counted as an independent prediction.

Frequently asked

Are prediction markets accurate?
On 928 resolved Kalshi and Polymarket markets, the price 24 hours before settlement scored a Brier of 0.213 (mediocre). A perfect forecast scores 0.000 and a coin flip scores 0.250, so 0.213 means the crowd beats chance, but only modestly. The largest calibration gaps are in the low-price buckets (0-10¢ and 20-40¢), where YES resolved more often than the price implied.
What is a Brier score for prediction markets?
The Brier score is the mean squared error between predicted probability and the realized binary outcome (1 for YES resolved, 0 for NO). Lower is better. A 70% YES forecast on a market that resolves YES contributes (1 - 0.7)² = 0.09; the same forecast on a market that resolves NO contributes (0 - 0.7)² = 0.49. Averaging over many resolutions gives a single number that rewards both correct direction and right-sized uncertainty.
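The definition above translates directly into a few lines of code. This is a minimal sketch of the standard Brier score for binary outcomes; the function name and the input layout are illustrative, not taken from this site's code.

```python
def brier_score(forecasts):
    """Mean squared error between predicted YES probability and outcome.

    forecasts: iterable of (predicted_yes_probability, outcome) pairs,
    where outcome is 1 if the market resolved YES and 0 if it resolved NO.
    """
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("need at least one resolved forecast")
    return sum((outcome - p) ** 2 for p, outcome in pairs) / len(pairs)


if __name__ == "__main__":
    print(brier_score([(0.7, 1)]))            # single 70% forecast, resolved YES
    print(brier_score([(0.7, 0)]))            # single 70% forecast, resolved NO
    print(brier_score([(0.5, 1), (0.5, 0)]))  # always quoting 50%
```

Always quoting 50% gives 0.25 regardless of outcomes, which is where the coin-flip baseline comes from.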
Why measure prices 24 hours before resolution instead of at settlement?
Every prediction market converges to 0¢ or 100¢ as the outcome becomes obvious in the final hours, so settlement-time Brier scores are artificially low and tell you nothing about how well the market actually predicted. The t-24h price is the last meaningful forecast — it is the number a trader would have acted on the day before, before late-breaking information swamps the signal.
How does Kalshi calibration compare to Polymarket?
See the by-venue section above. On the resolved sample, Kalshi scores a Brier of 0.212 (n=902) and Polymarket 0.250 (n=26); the Polymarket sample is too small to support a firm comparison. The venues also differ in mix: Polymarket has more depth on politics, Kalshi has more brackets in crypto and sports. The breakdowns by venue and category make the differences explicit.
Is the dataset behind this calibration analysis available?
Yes. The /api/calibration?source=marketwide endpoint returns the full breakdown by venue, category, and price bucket, plus settlement-time baselines. Period, category filter, and minimum-volume floor are query parameters. Resolution data syncs daily; t-24h prices fill hourly.
Can prediction markets predict events that have not happened yet?
Calibration on resolved markets tells you how closely implied probabilities have tracked outcomes on average; it does not promise that any individual prediction is right. Treat the curve on this page as a track record: market prices are a reasonable default prior, but for high-stakes questions always read the underlying market depth, news flow, and volatility (see /markets and /screen).

Raw data

/api/calibration?source=marketwide. Params: period, category, min_volume.
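A request URL for this endpoint can be assembled as in the sketch below. Only the path, `source=marketwide`, and the three parameter names come from this page. The host is a placeholder, and the accepted values and formats for `period`, `category`, and `min_volume` are not documented here, so the values in the example are guesses for illustration only.

```python
from urllib.parse import urlencode

# Placeholder host: substitute the site's real origin.
BASE_URL = "https://example.invalid"


def calibration_url(base_url, period=None, category=None, min_volume=None):
    """Build the market-wide calibration URL, skipping parameters left as None."""
    params = {"source": "marketwide"}
    optional = {"period": period, "category": category, "min_volume": min_volume}
    params.update({key: value for key, value in optional.items() if value is not None})
    return f"{base_url}/api/calibration?{urlencode(params)}"


if __name__ == "__main__":
    # "Sports" and 1000 are illustrative values, not documented ones.
    print(calibration_url(BASE_URL, category="Sports", min_volume=1000))
```

The function only builds the URL and makes no request, so the response shape is left unspecified.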
