SimpleFunctions Research · April 2026
Calibrated World Models for AI Agents
Prediction Market Data as Real-Time Context
Patrick Liu · SimpleFunctions · hello@simplefunctions.dev
Headline result: Baseline 2.3% → With world state 70.5% (31x improvement)
Abstract
Large language models have a knowledge cutoff that prevents them from reasoning accurately about current events. Existing mitigations — web search, news APIs, retrieval-augmented generation — return narrative text that requires parsing and provides no calibrated probabilities. We propose injecting prediction market data as a compact, structured world model into agent system prompts. Prediction markets aggregate the judgments of participants with real money at risk, producing calibrated probability estimates for geopolitical events, economic indicators, commodity prices, and elections. We introduce the World Awareness Benchmark (WAB), a 44-question evaluation testing whether AI agents can accurately report current world conditions. Ground truth is derived from live prediction market prices. On WAB, a baseline Claude Haiku 4.5 scores 2.3% while the same model augmented with an 800-token world state scores 70.5% — a 31x improvement. The world state injection requires no fine-tuning, no retrieval infrastructure, and adds only ~800 tokens to the system prompt.
The Problem
Ask an LLM "What is the probability of a US recession in 2026?" and it will either hallucinate a number, hedge with "I don't have access to real-time data," or give you a figure from its training data. Web search returns narratives. News APIs return headlines. Neither provides a number an agent can reason over.
Web search: "According to recent reports, tensions in the Middle East remain elevated..."
News API: {"title": "Iran tensions rise", "source": "Reuters"}
Prediction market: Iran invasion: 53% (+5pp, $225K volume)
The first two provide narrative. The third provides a calibrated probability — backed by people who lose money when they're wrong.
Method
Anchor Contract Selection
Naive selection by price delta picks noisy daily contracts. We instead score by volume x macroBoost: a 5x multiplier for recession, invasion, and rate-cut keywords, and a 0.1x penalty for daily closes. Critical contracts (Fed rate, recession probability) always appear regardless of movement.
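The selection rule above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the dict field names (`title`, `volume`), the exact keyword list, and the `k` cutoff are assumptions; only the 5x/0.1x multipliers and the always-include rule come from the text.

```python
# Sketch of volume x macroBoost anchor selection (assumed field names).
MACRO_KEYWORDS = ("recession", "invasion", "rate cut")
CRITICAL = ("fed rate", "recession probability")  # always included

def macro_boost(title: str) -> float:
    t = title.lower()
    if "daily" in t:
        return 0.1        # penalize noisy daily closes
    if any(k in t for k in MACRO_KEYWORDS):
        return 5.0        # boost macro-relevant contracts
    return 1.0

def select_anchors(contracts: list[dict], k: int = 10) -> list[dict]:
    """Rank by volume x macroBoost; critical contracts lead regardless of score."""
    scored = sorted(contracts,
                    key=lambda c: c["volume"] * macro_boost(c["title"]),
                    reverse=True)
    critical = [c for c in contracts
                if any(kw in c["title"].lower() for kw in CRITICAL)]
    out, seen = [], set()
    for c in critical + scored:   # critical first, then best-scored
        if c["title"] not in seen:
            seen.add(c["title"])
            out.append(c)
    return out[:k]
```

With this rule, a low-volume Fed-rate contract still outranks a high-volume daily close, which is the behavior the scoring is designed to produce.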
Title Deduplication
Strip prices, dates, and numbers from contract titles to find the semantic core. "Natural gas > $2.720" and "Natural gas > $2.725" collapse to one entry.
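The deduplication step can be approximated with simple regex normalization. The exact stripping rules are an assumption; the example reproduces the collapse described above.

```python
import re

def dedup_key(title: str) -> str:
    """Collapse near-duplicate contract titles to a semantic core by
    stripping prices, dates, and numbers (illustrative rules)."""
    core = title.lower()
    # Remove prices, counts, and percentages, e.g. "$2.720", "53%"
    core = re.sub(r"\$?\d[\d,]*(\.\d+)?%?", "", core)
    # Remove month names and abbreviations, e.g. "jan", "january"
    core = re.sub(r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\b", "", core)
    # Drop leftover symbols and collapse whitespace
    core = re.sub(r"[^a-z ]", " ", core)
    return " ".join(core.split())
```

For example, `dedup_key("Natural gas > $2.720")` and `dedup_key("Natural gas > $2.725")` both reduce to `"natural gas"`, so the two contracts collapse to one entry.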
Delta Updates
Full state: ~800 tokens. Delta since last check: ~30-50 tokens. For long-running agents, this is roughly a 16-27x reduction in per-cycle context overhead.
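A delta update can be computed by diffing two probability snapshots and emitting only contracts that moved. This is a sketch under assumed data shapes (title → probability dicts) and an assumed 2-point movement threshold; the library's actual delta format may differ.

```python
def world_delta(prev: dict, curr: dict, min_move: float = 0.02) -> dict:
    """Return only contracts whose probability moved >= min_move
    since the last check, or that are newly listed."""
    delta = {}
    for title, p in curr.items():
        p_prev = prev.get(title)
        if p_prev is None or abs(p - p_prev) >= min_move:
            delta[title] = p
    return delta
```

A contract that moved from 48% to 53% appears in the delta; one that sat at 30% does not, which is why per-cycle overhead drops from the full ~800 tokens to a few dozen.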
Token Efficiency
| Source | Tokens | Latency | Calibrated |
|---|---|---|---|
| Web search | 2,000-5,000 | 2-5s | No |
| News API | 500-1,000 | 500ms | No |
| RAG | 1,000-3,000 | 1-3s | No |
| World state | ~800 | 200ms | Yes |
| World delta | ~30-50 | 100ms | Yes |
Results: Per-Category Accuracy
- Economy: recession probability, Fed rate path, SPY/TLT prices
- Elections: presidential odds, Senate control
- Crypto and markets: Bitcoin, Ethereum, Gold, mispriced edges
- Energy: oil prices, OPEC, supply disruption
- Geopolitical: Iran invasion, Hormuz, nuclear test, Taiwan
Economy and Elections are strongest (80%) — these have the most liquid, well-calibrated prediction market contracts. Geopolitical is lower (50%) because some questions reference specific contracts not in the 800-token snapshot; tool-use would close this gap.
Key Insight
"The world awareness problem is not a model capability problem — it is a context problem. The same model that scores 2.3% without context scores 70.5% with 800 tokens of structured data. Investment in better world state construction may be more impactful than scaling model parameters for current-events reasoning."
Reproduce the Results
```python
# Install: pip install simplefunctions-ai

# Inject world state into any LLM
from simplefunctions import world

state = world()  # ~800 tokens, free, no auth
# → Inject into system prompt
# → Agent now scores 70.5% on WAB instead of 2.3%
```
Citation
@article{liu2026calibrated,
title = {Calibrated World Models for AI Agents:
Prediction Market Data as Real-Time Context},
author = {Liu, Patrick},
year = {2026},
url = {https://simplefunctions.dev/papers/world-model},
note = {World Awareness Benchmark: 2.3% → 70.5% (31x)
with 800-token prediction market context injection}
}