SimpleFunctions


Prediction market data, in bulk.

Five datasets covering live world state, 1.7M+ settled outcomes, monthly Brier calibration scorecards, the four-component SimpleFunctions Index, and the World Awareness Bench. All are CC-BY-4.0, mirrored on HuggingFace, and refreshed on a known cadence.

  • 5 datasets
  • 1.7M+ settled outcomes
  • 180+ days of history
  • CC-BY-4.0 license

Datasets

The five datasets.

01

World State Daily

Compressed snapshot of the prediction-market world model — top movers, divergences, consensus breaks, SimpleFunctions Index components — in roughly 800 tokens. The same payload that powers /api/agent/world. 180+ days of history.

Format: JSON · Cadence: Daily · Rows: 180+ days · Size: ~150 KB / day

API twin: /api/agent/world (live world state)

Best for: Agent context, daily archive, WAB-style eval traces

Schema: date · regime · sfIndex{disagree,geoRisk,breadth,activity} · movers[] · divergences[] · consensusBreaks[]
huggingface.co/datasets/SimpleFunctions/world-state-daily
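
A minimal sketch of reading one World State Daily snapshot in Python, assuming the top-level keys match the schema line above. This page does not document the shape of individual movers[], divergences[], or consensusBreaks[] entries, so the hypothetical helper summarize_world_state only counts them instead of reading their fields.

```python
import json
from typing import Any

# Key names follow the schema line above; inner entry shapes are unknown,
# so list fields are only counted here.
SF_INDEX_KEYS = ("disagree", "geoRisk", "breadth", "activity")


def summarize_world_state(raw: str) -> dict[str, Any]:
    """Flatten one World State Daily JSON snapshot into a summary dict."""
    snap = json.loads(raw)
    index = snap.get("sfIndex", {})
    return {
        "date": snap.get("date"),
        "regime": snap.get("regime"),
        # Missing index components come back as None instead of raising.
        **{k: index.get(k) for k in SF_INDEX_KEYS},
        "n_movers": len(snap.get("movers", [])),
        "n_divergences": len(snap.get("divergences", [])),
        "n_consensus_breaks": len(snap.get("consensusBreaks", [])),
    }
```

One summary per day stacks neatly into a table for the daily archive or eval-trace use cases.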
02

Settled Markets

Monthly JSONL partitions of every prediction-market contract that resolved on Kalshi or Polymarket with meaningful volume (>$10K). The current and previous month refresh daily; older months are immutable.

Format: JSONL · Cadence: Daily · Rows: 1.7M+ outcomes (326K Brier-eligible) · Size: ~15-20 MB / active month

API twin: /calibration (live calibration view)

Best for: Brier scoring, event studies, forecasting research

Schema: venue · ticker · title · category · predicted_price · predicted_price_t24h · resolved_outcome · resolved_at · volume
huggingface.co/datasets/SimpleFunctions/settled-markets
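
Brier scoring is the headline use for this dataset. The sketch below computes mean Brier score, mean((p - y)^2), and log loss, -mean(y ln p + (1 - y) ln(1 - p)), over settled rows. It assumes predicted_price is the YES probability on a 0 to 1 scale and resolved_outcome is 1 for YES and 0 for NO; check both against the dataset card first. It does not reproduce whatever criteria define the 326K Brier-eligible subset.

```python
import math
from typing import Iterable

# Assumptions (not confirmed on this page): predicted_price is a 0-1 YES
# probability and resolved_outcome is 1 (YES) or 0 (NO). Rows missing
# either value are skipped.


def brier_and_log_loss(rows: Iterable[dict], eps: float = 1e-6) -> tuple[float, float, int]:
    """Return (mean Brier score, mean log loss, rows scored)."""
    brier_sum = 0.0
    ll_sum = 0.0
    n = 0
    for row in rows:
        p = row.get("predicted_price")
        y = row.get("resolved_outcome")
        if p is None or y not in (0, 1):
            continue
        p = float(p)
        brier_sum += (p - y) ** 2
        # Clip so a price of exactly 0 or 1 cannot produce log(0).
        q = min(max(p, eps), 1 - eps)
        ll_sum -= y * math.log(q) + (1 - y) * math.log(1 - q)
        n += 1
    if n == 0:
        raise ValueError("no scorable rows")
    return brier_sum / n, ll_sum / n, n
```

Swapping predicted_price for predicted_price_t24h scores the market a day before resolution instead.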
03

Calibration Scorecards

Monthly Brier-score and log-loss calibration breakdowns for Kalshi and Polymarket, published 14 days after month end to capture late resolutions.

Format: JSON · Cadence: Monthly · Rows: Monthly scorecards · Size: ~2.5 KB / month

API twin: /api/calibration (latest scorecard API)

Best for: Model benchmarking, venue/category calibration checks

Schema: period · venue · category · brier · log_loss · hit_rate · price_bucket
huggingface.co/datasets/SimpleFunctions/calibration-scorecards
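
A price_bucket breakdown is a reliability table: within each price range, compare the average price with how often YES actually resolved. This hypothetical sketch rebuilds one from (price, outcome) pairs. The ten equal-width buckets are an assumption, and the published bucket edges and hit_rate definition may differ, so the realized frequency is called yes_rate here.

```python
from collections import defaultdict
from typing import Iterable


def reliability_table(pairs: Iterable[tuple[float, int]], n_buckets: int = 10) -> list[dict]:
    """Group (price, outcome) pairs into equal-width price buckets."""
    acc = defaultdict(lambda: [0.0, 0, 0])  # bucket -> [price_sum, yes_count, n]
    for p, y in pairs:
        # Clamp p == 1.0 into the top bucket instead of opening an extra one.
        b = min(int(p * n_buckets), n_buckets - 1)
        acc[b][0] += p
        acc[b][1] += y
        acc[b][2] += 1
    table = []
    for b in sorted(acc):
        price_sum, yes, n = acc[b]
        table.append({
            "price_bucket": f"{b / n_buckets:.2f}-{(b + 1) / n_buckets:.2f}",
            "mean_price": price_sum / n,
            "yes_rate": yes / n,  # realized YES frequency in this bucket
            "n": n,
        })
    return table
```

A well-calibrated venue shows mean_price close to yes_rate in every bucket; persistent gaps at the extremes are the usual thing to look for.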
04

SimpleFunctions Index History

Flat JSONL time series of the SimpleFunctions Index: disagreement, geo-risk, breadth, and activity. Computed every 15 minutes from the live market universe and exported daily.

Format: JSONL · Cadence: Daily · Rows: one row per 15 minutes · Size: ~1.2 MB

API twin: /api/public/index (live index API)

Best for: Regime detection, macro overlay, feature engineering

Schema: timestamp · disagreement · geo_risk · breadth · activity · market_count
huggingface.co/datasets/SimpleFunctions/sf-index-history
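
For regime detection, a common first feature is a rolling z-score: how far the current value sits from the recent mean, in standard deviations. This sketch assumes each row has an ISO-8601 timestamp string that sorts chronologically and a numeric geo_risk. The 96-row window (one day at the 15-minute cadence) is an illustrative choice, not a SimpleFunctions default.

```python
import json
import statistics
from collections import deque
from typing import Iterable, Iterator


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text (one object per line), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rolling_zscores(rows: Iterable[dict], field: str = "geo_risk",
                    window: int = 96) -> Iterator[tuple[str, float]]:
    """Yield (timestamp, z) comparing each value with the preceding window."""
    history: deque = deque(maxlen=window)
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        x = float(row[field])
        if len(history) >= 2:
            mu = statistics.fmean(history)
            sd = statistics.stdev(history)
            if sd > 0:  # a flat window has no meaningful z-score
                yield row["timestamp"], (x - mu) / sd
        history.append(x)
```

Rows with |z| above a chosen threshold (2.0, say) are candidate regime shifts; the threshold is a tuning decision, not a published rule.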
05

World Awareness Bench

Monthly 100-question benchmark designed to measure how well an LLM reflects current-event probabilities. Each question pairs a held-out ground truth with a market-derived reference probability.

Format: JSON · Cadence: Monthly · Rows: 100 questions / month · Size: ~15-30 KB / version

API twin: /papers/world-model (WAB methodology)

Best for: LLM eval, world-model probes, reasoning benchmark

Schema: question · ground_truth · market_probability · category · resolved_at
huggingface.co/datasets/SimpleFunctions/world-awareness-bench
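
A hypothetical scoring harness: compare an LLM's stated probabilities with the market reference on the same questions. It assumes ground_truth is 0/1 and market_probability is in [0, 1]; model_probs (a name introduced here) maps each question string to the model's answer. The official WAB metric is described at /papers/world-model and may differ from this plain Brier comparison.

```python
from typing import Mapping, Sequence


def score_wab(items: Sequence[dict], model_probs: Mapping[str, float]) -> dict:
    """Brier scores for model and market, plus skill of the model vs. market."""
    model_se, market_se, n = 0.0, 0.0, 0
    for item in items:
        q = item["question"]
        if q not in model_probs:
            continue  # unanswered questions are skipped, not penalized
        y = float(item["ground_truth"])
        model_se += (model_probs[q] - y) ** 2
        market_se += (float(item["market_probability"]) - y) ** 2
        n += 1
    if n == 0:
        raise ValueError("no answered questions")
    model_brier, market_brier = model_se / n, market_se / n
    return {
        "n": n,
        "model_brier": model_brier,
        "market_brier": market_brier,
        # Brier skill score: > 0 means the model beat the market reference.
        "skill_vs_market": 1 - model_brier / market_brier if market_brier > 0 else float("nan"),
    }
```

Skipping unanswered questions keeps the sketch simple; a stricter harness would count a refusal as 0.5 or as a failure.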

Quick start

Three ways to load a dataset.

01

HuggingFace · datasets

Streaming and partition selection without an API key.

pip install datasets
python -c "from datasets import load_dataset; print(load_dataset('SimpleFunctions/settled-markets', data_files='2026-05.jsonl', split='train'))"
02

Direct JSONL

Pull the monthly partition directly from HuggingFace.

curl -L https://huggingface.co/datasets/SimpleFunctions/settled-markets/resolve/main/2026-05.jsonl | head
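
The same download in Python, streamed line by line with only the standard library. This is a sketch with hypothetical helper names; urllib.request.urlopen follows HTTP redirects by default, which plays the role of curl -L.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE = "https://huggingface.co/datasets/SimpleFunctions/settled-markets/resolve/main"


def partition_url(month: str) -> str:
    """URL of one monthly partition, e.g. month='2026-05'."""
    return f"{BASE}/{month}.jsonl"


def parse_jsonl_lines(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode raw JSONL lines, skipping blank ones."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if line:
            yield json.loads(line)


def iter_partition(month: str) -> Iterator[dict]:
    """Stream rows of one partition without writing it to disk first."""
    with urllib.request.urlopen(partition_url(month)) as resp:
        # The HTTP response iterates line by line, yielding bytes.
        yield from parse_jsonl_lines(resp)
```

Usage, with network access: for row in iter_partition("2026-05"): ...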
03

Live API twin

Use the API when you want the current value, not a file snapshot.

curl https://simplefunctions.dev/api/agent/world
curl https://simplefunctions.dev/api/public/index

Daily archive

Daily archive for research access.

Beyond the public HuggingFace datasets, a daily archive ships the internal compute tables — orderbook snapshots, indicator history, regime classifications, agent traces. Available via research collaboration.

Archive themes

  • Markets & orderbook
  • Indicators & regime
  • Thesis & agent
  • Macro & SimpleFunctions Index

FAQ

Frequently asked.

01

How often are the prediction-market datasets updated?

World State Daily and SimpleFunctions Index History export daily. Settled Markets uses monthly JSONL partitions, with the current and previous month refreshed daily. Calibration Scorecards publish monthly with a 14-day delay. World Awareness Bench publishes monthly on the first day of the month.

02

What license covers the SimpleFunctions datasets?

All public datasets ship under CC-BY-4.0 — free to use commercially or in research with attribution. See /data-license for the full text and citation guidance.

03

Which file formats are available?

JSONL for Settled Markets and SimpleFunctions Index History; JSON for World State Daily, Calibration Scorecards, and World Awareness Bench. All datasets are mirrored on HuggingFace for direct streaming via the HuggingFace datasets library, pandas, or polars.

04

How do I download a single day instead of the whole archive?

For dated JSON snapshots, fetch the file directly from HuggingFace, for example world-state-daily/resolve/main/2026-05-03.json. For live state, hit the matching public API endpoint: /api/agent/world for world state, /api/calibration for the latest scorecard, and /api/public/index for the live index.
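
To fetch a range of days rather than one, the file-name pattern above can be generated per date. This sketch assumes every day is stored as YYYY-MM-DD.json at the repository root, as in the example path; a day without a snapshot would return 404.

```python
from datetime import date, timedelta

BASE = "https://huggingface.co/datasets/SimpleFunctions/world-state-daily/resolve/main"


def daily_snapshot_urls(start: date, end: date) -> list[str]:
    """One resolve URL per calendar day, start and end inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    days = (end - start).days + 1
    return [f"{BASE}/{(start + timedelta(days=d)).isoformat()}.json" for d in range(days)]
```

Each URL can then be fetched with curl -L or any HTTP client that follows redirects.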

05

Is the schema stable enough to build on?

Schemas are versioned. Additive changes (new optional fields) ship without notice. Breaking changes ship under a new dataset slug, and the older slug stays readable for at least 6 months. Follow /changelog for breaking-change announcements.
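
Because additive changes ship without notice, a reader should ignore fields it does not recognize. A minimal sketch for Settled Markets rows: the field types are assumptions (the schema line lists names only), unknown keys are dropped, absent optional fields default to None, and a row missing venue or ticker still fails loudly.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SettledMarket:
    # Identifying fields are required: a row without them raises TypeError.
    venue: str
    ticker: str
    # Types below are assumptions; only the field names come from the schema.
    title: str = ""
    category: str = ""
    predicted_price: Optional[float] = None
    predicted_price_t24h: Optional[float] = None
    resolved_outcome: Optional[int] = None
    resolved_at: Optional[str] = None
    volume: Optional[float] = None

    @classmethod
    def from_row(cls, row: dict[str, Any]) -> "SettledMarket":
        known = {f.name for f in fields(cls)}
        # Drop keys added by later additive schema changes.
        return cls(**{k: v for k, v in row.items() if k in known})
```

A breaking change arrives under a new slug, so it shows up as a deliberate migration rather than a silent parse failure.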

Adjacent surfaces