DATASET·Hugging Face·CC-BY-4.0

World Awareness Bench (Hugging Face)

Monthly 100-question AI-agent benchmark graded against market consensus

Most LLM benchmarks measure knowledge of the past. World Awareness Bench measures whether an LLM correctly reflects what the present world believes — as priced in by prediction markets right now.

Fresh questions each month. Every question has a market-derived ground-truth probability. Grading is continuous (Brier + log-loss), so partial credit rewards calibrated uncertainty.
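The continuous grading described above can be sketched as follows. The card does not publish the exact scoring code, so this is a minimal illustration under one assumption: log-loss is computed as cross-entropy against the market probability, and the function names here are hypothetical.

```python
import math

def brier(p_model: float, p_market: float) -> float:
    # Squared error between the model's stated probability and the
    # market-derived ground-truth probability (0 = perfect).
    return (p_model - p_market) ** 2

def log_loss(p_model: float, p_market: float, eps: float = 1e-12) -> float:
    # Cross-entropy of the model's probability against the market
    # probability; clipping avoids log(0) at the extremes.
    p = min(max(p_model, eps), 1 - eps)
    return -(p_market * math.log(p) + (1 - p_market) * math.log(1 - p))

# Calibrated uncertainty earns partial credit; a confident miss is
# penalized harder than a nearby hedge.
print(brier(0.6, 0.7))    # small penalty: close to the market
print(brier(0.95, 0.7))   # larger penalty: overconfident
```

Both scores are lower-is-better, which is why a model that hedges near the market consensus outperforms one that answers with false certainty.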

Use cases:

  • Eval harness for any LLM agent claiming world-awareness
  • Fine-tuning target for domain-specific models
  • Research on calibration drift between a model released in month N and the world state in month N

Licensed CC-BY-4.0.

Tags

dataset · huggingface · benchmark · eval · llm
