DATASET·Hugging Face·CC-BY-4.0

World Awareness Bench (Hugging Face)

Monthly 100-question AI-agent benchmark graded against market consensus

Most LLM benchmarks measure knowledge of the past. World Awareness Bench measures whether an LLM correctly reflects what the present world believes — as priced in by prediction markets right now.

Fresh questions each month. Every question has a market-derived ground-truth probability. Grading is continuous (Brier + log-loss), so partial credit rewards calibrated uncertainty.
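The continuous grading described above can be sketched as follows. The card does not publish the exact scoring code, so this is a minimal illustration under one assumption: log-loss is computed as cross-entropy against the market probability, and the function names here are hypothetical.

```python
import math

def brier(p_model: float, p_market: float) -> float:
    # Squared error between the model's stated probability and the
    # market-derived ground-truth probability (0 = perfect).
    return (p_model - p_market) ** 2

def log_loss(p_model: float, p_market: float, eps: float = 1e-12) -> float:
    # Cross-entropy of the model's probability against the market
    # probability; clipping avoids log(0) at the extremes.
    p = min(max(p_model, eps), 1 - eps)
    return -(p_market * math.log(p) + (1 - p_market) * math.log(1 - p))

# Calibrated uncertainty earns partial credit; a confident miss is
# penalized harder than a nearby hedge.
print(brier(0.6, 0.7))    # small penalty: close to the market
print(brier(0.95, 0.7))   # larger penalty: overconfident
```

Both scores are lower-is-better, which is why a model that hedges near the market consensus outperforms one that answers with false certainty.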

Use cases:

  • Eval harness for any LLM agent claiming world-awareness
  • Fine-tuning target for domain-specific models
  • Research on calibration drift between a model released in month N and the world state in month N

Licensed CC-BY-4.0.

Tags

dataset · huggingface · benchmark · eval · llm
