DATASET·Hugging Face·CC-BY-4.0
World Awareness Bench (Hugging Face)
Monthly 100-question AI-agent benchmark graded against market consensus
Most LLM benchmarks measure knowledge of the past. World Awareness Bench measures whether an LLM correctly reflects what the present world believes — as priced in by prediction markets right now.
Fresh questions each month. Every question has a market-derived ground-truth probability. Grading is continuous (Brier + log-loss), so partial credit rewards calibrated uncertainty.
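The continuous grading described above can be sketched as follows. This is a minimal illustration only: it assumes the Brier component is squared error against the market probability and the log-loss component treats the market probability as a soft label; the benchmark's exact scoring formulas are not specified on this card.

```python
import math

def grade(p_model: float, p_market: float, eps: float = 1e-12) -> dict:
    """Grade a model's probability against a market-derived ground truth.

    Assumed scoring (illustrative, not the card's verbatim spec):
    - Brier: squared distance from the market probability.
    - Log-loss: cross-entropy using the market probability as a soft label.
    Lower is better for both, so calibrated uncertainty earns partial credit.
    """
    p = min(max(p_model, eps), 1 - eps)  # clamp to keep log() finite
    brier = (p - p_market) ** 2
    log_loss = -(p_market * math.log(p) + (1 - p_market) * math.log(1 - p))
    return {"brier": brier, "log_loss": log_loss}

# A model that says 0.7 when the market prices the event at 0.8
# is penalized mildly, not scored as a binary miss.
scores = grade(0.7, 0.8)
```

Under this scheme a model matching the market exactly gets a Brier score of 0, and hedged answers near the market price score far better than confident wrong ones.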
Use cases:
- Eval harness for any LLM agent claiming world-awareness
- Fine-tuning target for domain-specific models
- Research on calibration drift between model-released-in-month-N and world-state-in-month-N
Licensed CC-BY-4.0.
Tags
dataset · huggingface · benchmark · eval · llm
Related
- DATASET·CC-BY-4.0 — Settled Markets (Hugging Face): Monthly partitions of every settled Kalshi + Polymarket market with outcome + predicted price
- DATASET·CC-BY-4.0 — World State Daily (Hugging Face): Daily end-of-day prediction-market world state from Kalshi + Polymarket
- DATASET·CC-BY-4.0 — Calibration Scorecards (Hugging Face): Monthly Brier + log-loss breakdowns for Kalshi + Polymarket