Prediction market data, in bulk.

Name: SimpleFunctions
Author: SimpleFunctions

Five datasets covering live world state, 1.7M+ settled outcomes, monthly Brier calibration scorecards, the four-component SimpleFunctions Index, and the World Awareness benchmark. All are CC-BY-4.0, mirrored on HuggingFace, and refreshed on a known cadence.


from datasets import load_dataset

ds = load_dataset(
  "SimpleFunctions/settled-markets"
)

# 1.7M outcomes · Kalshi + Polymarket
# Brier-eligible: 326K (t-24h price)
print(ds["train"].to_pandas().head())

1.7M+ settled outcomes · 180+ days of history · CC-BY-4.0 license

Datasets

The five datasets.

World State Daily

Compressed snapshot of the prediction-market world model (top movers, divergences, consensus breaks, and SimpleFunctions Index components) in roughly 800 tokens. It is the same payload that powers /api/agent/world, with 180+ days of history.

Format: JSON
Cadence: Daily
Rows: one snapshot per day (180+ days)
Size: ~150 KB / day

API twin: /api/agent/world (live world state)
Best for: Agent context, daily archive, WAB-style eval traces
Schema: date · regime · sfIndex{disagree,geoRisk,breadth,activity} · movers[] · divergences[] · consensusBreaks[]

huggingface.co/datasets/SimpleFunctions/world-state-daily
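A minimal sketch of reading one World State Daily file with the standard library, assuming each file parses to a single JSON object whose top-level keys match the schema line above. The inner fields of movers[], divergences[], and consensusBreaks[] are not documented here, so the sketch only counts them; summarize_snapshot and the toy snapshot are hypothetical, not real data.

```python
import json


def summarize_snapshot(raw: str) -> dict:
    """Pull the headline fields out of one World State Daily JSON document.

    Assumes top-level keys follow the published schema: date, regime,
    sfIndex{disagree,geoRisk,breadth,activity}, movers[], divergences[],
    consensusBreaks[].
    """
    snap = json.loads(raw)
    index = snap.get("sfIndex", {})
    return {
        "date": snap.get("date"),
        "regime": snap.get("regime"),
        # The four SimpleFunctions Index components, keyed as in the schema
        "sf_index": {k: index.get(k) for k in ("disagree", "geoRisk", "breadth", "activity")},
        # Only count list entries; their inner fields are not documented here
        "n_movers": len(snap.get("movers", [])),
        "n_divergences": len(snap.get("divergences", [])),
        "n_consensus_breaks": len(snap.get("consensusBreaks", [])),
    }


# Made-up snapshot for illustration only, not real data
example = json.dumps({
    "date": "2026-05-03",
    "regime": "example-regime",
    "sfIndex": {"disagree": 0.1, "geoRisk": 0.2, "breadth": 0.3, "activity": 0.4},
    "movers": [{}, {}],
    "divergences": [],
    "consensusBreaks": [{}],
})
print(summarize_snapshot(example))
```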

Settled Markets

Monthly JSONL partitions of every prediction-market contract that resolved on Kalshi or Polymarket with meaningful volume (>$10K). The current and previous month refresh daily; older months are immutable.

Format: JSONL
Cadence: Daily
Rows: 1.7M+ outcomes (326K Brier-eligible)
Size: ~15-20 MB / active month

API twin: /calibration (live calibration view)
Best for: Brier scoring, event studies, forecasting research
Schema: venue · ticker · title · category · predicted_price · predicted_price_t24h · resolved_outcome · resolved_at · volume

huggingface.co/datasets/SimpleFunctions/settled-markets
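Brier scoring is the headline use for this file. A minimal standard-library sketch, assuming predicted_price_t24h is a probability in [0, 1] (not cents), resolved_outcome is encoded 1 for YES and 0 for NO, and a missing t-24h price marks a row as not Brier-eligible. The brier_score helper and the three JSONL lines are hypothetical.

```python
import json


def brier_score(rows) -> float:
    """Mean squared error between the t-24h price and the 0/1 outcome.

    Assumptions (not confirmed by the dataset card): predicted_price_t24h is a
    probability in [0, 1] and resolved_outcome is 1 for YES, 0 for NO.
    Rows without a t-24h price are skipped as not Brier-eligible.
    """
    errors = [
        (row["predicted_price_t24h"] - row["resolved_outcome"]) ** 2
        for row in rows
        if row.get("predicted_price_t24h") is not None
    ]
    if not errors:
        raise ValueError("no Brier-eligible rows")
    return sum(errors) / len(errors)


# Made-up JSONL lines for illustration only
lines = [
    '{"ticker": "A", "predicted_price_t24h": 0.9, "resolved_outcome": 1}',
    '{"ticker": "B", "predicted_price_t24h": 0.2, "resolved_outcome": 1}',
    '{"ticker": "C", "predicted_price_t24h": null, "resolved_outcome": 0}',
]
rows = [json.loads(line) for line in lines]
print(brier_score(rows))
```

Row C is dropped because its t-24h price is null, so the score averages only rows A and B.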

Calibration Scorecards

Monthly Brier-score and log-loss calibration breakdowns for Kalshi + Polymarket, published with a 14-day delay after month end to capture late resolutions.

Format: JSON
Cadence: Monthly
Rows: Monthly scorecards
Size: ~2.5 KB / month

API twin: /api/calibration (latest scorecard API)
Best for: Model benchmarking, venue/category calibration checks
Schema: period · venue · category · brier · log_loss · hit_rate · price_bucket

huggingface.co/datasets/SimpleFunctions/calibration-scorecards
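To reproduce the log_loss and hit_rate-by-price_bucket columns on your own predictions, a sketch like the following may help. Equal-width deciles and the clipping epsilon are assumptions; the published scorecards may use different bucket edges, and both helper names are hypothetical.

```python
import math
from collections import defaultdict


def log_loss(preds, outcomes, eps=1e-15) -> float:
    """Mean negative log-likelihood of 0/1 outcomes; clips p away from 0 and 1."""
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)


def bucket_hit_rates(preds, outcomes, n_buckets=10) -> dict:
    """Group predictions into equal-width price buckets and report hit rates.

    Returns {bucket_index: (hit_rate, count)}. Equal-width buckets are an
    assumption; the published price_bucket field may use other edges.
    """
    buckets = defaultdict(list)
    for p, y in zip(preds, outcomes):
        # p == 1.0 lands in the top bucket instead of an extra one
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append(y)
    return {idx: (sum(ys) / len(ys), len(ys)) for idx, ys in sorted(buckets.items())}
```

A well-calibrated source should show hit rates close to each bucket's price range, for example roughly 0.9 to 1.0 in the top decile.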

SimpleFunctions Index History

Flat JSONL time series of the SimpleFunctions Index: disagreement, geo-risk, breadth, and activity. Computed every 15 minutes from the live market universe and exported daily.

Format: JSONL
Cadence: Daily
Rows: one per 15-minute interval
Size: ~1.2 MB

API twin: /api/public/index (live index API)
Best for: Regime detection, macro overlay, feature engineering
Schema: timestamp · disagreement · geo_risk · breadth · activity · market_count

huggingface.co/datasets/SimpleFunctions/sf-index-history
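For feature engineering, the 15-minute rows often need collapsing to a coarser grid. A sketch that averages each component per calendar day with the standard library, assuming timestamp is an ISO 8601 string (a trailing Z is accepted) and every row carries all four components; daily_means is a hypothetical helper.

```python
import json
from collections import defaultdict
from datetime import datetime

COMPONENTS = ("disagreement", "geo_risk", "breadth", "activity")


def daily_means(jsonl_lines) -> dict:
    """Collapse 15-minute SimpleFunctions Index rows into per-day means.

    Assumes timestamp is ISO 8601 (a trailing "Z" is accepted) and that
    days are bucketed in the timestamp's own timezone.
    """
    sums = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0.0))
    counts = defaultdict(int)
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines in the export
        row = json.loads(line)
        ts = datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00"))
        day = ts.date().isoformat()
        for c in COMPONENTS:
            sums[day][c] += row[c]
        counts[day] += 1
    return {day: {c: sums[day][c] / counts[day] for c in COMPONENTS} for day in sorted(sums)}
```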

World Awareness Bench

Monthly 100-question benchmark designed to measure how well an LLM reflects current-event probabilities. Each question pairs a held-out ground truth with a market-derived reference probability.

Format: JSON
Cadence: Monthly
Rows: 100 questions / month
Size: ~15-30 KB / version

API twin: /papers/world-model (WAB methodology)
Best for: LLM eval, world-model probes, reasoning benchmark
Schema: question · ground_truth · market_probability · category · resolved_at

huggingface.co/datasets/SimpleFunctions/world-awareness-bench
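One way to use the market-derived reference is as a baseline: score the model and the market with the same Brier formula and report a Brier skill score. This is a sketch, not the WAB methodology; it assumes ground_truth is encoded 0/1 and market_probability lies in [0, 1], and wab_scores is a hypothetical name.

```python
def brier(probs, truths) -> float:
    """Mean squared difference between probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for p, t in zip(probs, truths)) / len(probs)


def wab_scores(questions, model_probs) -> dict:
    """Score a model's probabilities against WAB ground truth.

    questions: dicts with ground_truth (assumed 0/1) and market_probability
    (assumed in [0, 1]). model_probs: one probability per question, same order.
    skill_vs_market > 0 means the model beat the market reference.
    """
    truths = [q["ground_truth"] for q in questions]
    market = [q["market_probability"] for q in questions]
    bs_model = brier(model_probs, truths)
    bs_market = brier(market, truths)
    skill = 1 - bs_model / bs_market if bs_market > 0 else float("nan")
    return {"model": bs_model, "market": bs_market, "skill_vs_market": skill}
```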

Quick start

Three ways to load a dataset.

HuggingFace · datasets

Streaming and partition selection without an API key.

pip install datasets
python -c "from datasets import load_dataset; print(load_dataset('SimpleFunctions/settled-markets', data_files='2026-05.jsonl', split='train'))"
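To pull a range of months rather than one file, data_files also accepts a list, and streaming=True iterates rows without downloading the partitions first. A sketch assuming every partition follows the YYYY-MM.jsonl naming of the example above and sits at the repository root; month_partitions and stream_months are hypothetical helper names.

```python
from datetime import date


def month_partitions(start: date, end: date) -> list:
    """Return YYYY-MM.jsonl names from start's month through end's month, inclusive.

    Assumes the 2026-05.jsonl naming pattern holds for every partition.
    """
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year:04d}-{month:02d}.jsonl")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


def stream_months(start: date, end: date):
    """Stream several monthly partitions of settled-markets as one iterable."""
    from datasets import load_dataset  # lazy import so month_partitions works without it

    return load_dataset(
        "SimpleFunctions/settled-markets",
        data_files=month_partitions(start, end),
        split="train",
        streaming=True,
    )
```

For example, `for row in stream_months(date(2026, 3, 1), date(2026, 5, 31)): ...` would iterate March through May without holding all three months in memory.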

Direct JSONL

Pull the monthly partition directly from HuggingFace.

curl -L https://huggingface.co/datasets/SimpleFunctions/settled-markets/resolve/main/2026-05.jsonl | head
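The same request from Python without extra dependencies, as a sketch: urllib follows redirects by default (like curl -L), and the response is read line by line so only the first rows are parsed. The URL mirrors the curl example; iter_jsonl and head_partition are hypothetical names.

```python
import json
import urllib.request
from itertools import islice

BASE = "https://huggingface.co/datasets/SimpleFunctions/settled-markets/resolve/main"


def iter_jsonl(stream):
    """Yield one dict per non-empty line of a binary JSONL stream."""
    for raw in stream:
        line = raw.decode("utf-8").strip()
        if line:
            yield json.loads(line)


def head_partition(month: str, n: int = 10) -> list:
    """Python counterpart of the curl | head example; month is like "2026-05"."""
    # urlopen follows HTTP redirects by default, as curl -L does
    with urllib.request.urlopen(f"{BASE}/{month}.jsonl") as resp:
        return list(islice(iter_jsonl(resp), n))
```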

Live API twin

Use the API when you want the current value, not a file snapshot.

curl https://simplefunctions.dev/api/agent/world
curl https://simplefunctions.dev/api/public/index
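The same two calls from Python, as a sketch using only urllib. It assumes both endpoints return a JSON body; their response fields are not documented here, so the decoded object is returned as-is. get_json and its opener parameter are hypothetical, with opener included so the function can be exercised without network access.

```python
import json
import urllib.request

API = "https://simplefunctions.dev"


def get_json(path: str, opener=urllib.request.urlopen, timeout: float = 10.0):
    """GET a public SimpleFunctions endpoint and decode the body as JSON.

    Assumes the endpoint responds with JSON; the result is returned unparsed
    beyond json.loads because the response schema is not documented here.
    """
    with opener(f"{API}{path}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Live usage (requires network access):
# world = get_json("/api/agent/world")
# index = get_json("/api/public/index")
```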

Daily archive

Daily archive for research access.

Beyond the public HuggingFace datasets, a daily archive ships the internal compute tables — orderbook snapshots, indicator history, regime classifications, agent traces. Available via research collaboration.


Archive themes

Markets & orderbook
Indicators & regime
Thesis & agent
Macro & SimpleFunctions Index

FAQ

Frequently asked.

How often are the prediction-market datasets updated?

World State and SimpleFunctions Index export daily. Settled Markets uses monthly JSONL partitions, with the current and previous month refreshed daily. Calibration Scorecards publish monthly with a 14-day delay after month end. World Awareness Bench publishes on the first day of each month.
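The Calibration Scorecards delay is simple date arithmetic. A sketch that interprets "14-day delay after month end" as the month's last calendar day plus 14 days; treat the result as an earliest expected date rather than a guarantee. scorecard_publish_date is a hypothetical name.

```python
import calendar
from datetime import date, timedelta


def scorecard_publish_date(year: int, month: int, delay_days: int = 14) -> date:
    """Earliest expected publication date of a month's Calibration Scorecard.

    Interprets the 14-day delay as: last calendar day of the month + 14 days.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, last_day) + timedelta(days=delay_days)
```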

What license covers the SimpleFunctions datasets?

All public datasets ship under CC-BY-4.0 — free to use commercially or in research with attribution. See /data-license for the full text and citation guidance.

Which file formats are available?

JSONL for Settled Markets and SimpleFunctions Index History; JSON for World State Daily, Calibration Scorecards, and World Awareness Bench. All datasets are mirrored on HuggingFace for direct streaming via the datasets library, pandas, or polars.

How do I download a single day instead of the whole archive?

For dated JSON snapshots, fetch the file directly from HuggingFace, for example world-state-daily/resolve/main/2026-05-03.json. For live state, hit the matching public API endpoint: /api/agent/world for world state, /api/calibration for the latest scorecard, and /api/public/index for the live index.
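A small helper for building the single-day URL, assuming dated World State files sit at the root of the world-state-daily repository and follow the same https://huggingface.co/datasets/SimpleFunctions/&lt;slug&gt;/resolve/main/ pattern as the Direct JSONL example. daily_snapshot_url is a hypothetical name.

```python
from datetime import date

HF_ROOT = "https://huggingface.co/datasets/SimpleFunctions"


def daily_snapshot_url(day: date) -> str:
    """URL of one World State Daily file, following the YYYY-MM-DD.json example."""
    return f"{HF_ROOT}/world-state-daily/resolve/main/{day.isoformat()}.json"
```

The resulting URL can be passed to curl -L or urllib.request.urlopen to fetch just that day.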

Is the schema stable enough to build on?

Schemas are versioned. Additive changes (new optional fields) ship without notice. Breaking changes ship via a new dataset slug — the older slug stays readable for ≥6 months. Subscribe to /changelog for the announcement queue.
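Because additive fields can appear without notice, a client that maps rows onto fixed types should ignore unknown keys rather than fail. A minimal sketch for Calibration Scorecards rows: field names come from the published schema, but the dataclass, which fields are optional, and their types (price_bucket as a string, for instance) are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScorecardRow:
    """One Calibration Scorecards row; optionality and types are assumptions."""
    period: str
    venue: str
    category: str
    brier: float
    log_loss: float
    hit_rate: Optional[float] = None
    price_bucket: Optional[str] = None


def parse_row(raw: dict) -> ScorecardRow:
    """Build a ScorecardRow, dropping keys this client does not know about.

    Unknown keys would otherwise raise TypeError in the dataclass constructor,
    so a new optional field added upstream does not break parsing.
    """
    known = {f.name for f in fields(ScorecardRow)}
    return ScorecardRow(**{k: v for k, v in raw.items() if k in known})
```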

Adjacent surfaces

Historical data →

Per-market price history and Brier scorecards.

Real-time feeds →

World state, RSS, MCP — live, no signup.

Indicators →

Twelve quantitative metrics on every market.

Live API →

REST endpoints for the same content.

Calibration →

Brier scores rendered live across categories.

SimpleFunctions Index →

Daily disagreement / geo-risk / breadth time series.
