Research
Working papers, preliminary notes, and technical reports on prediction-market forecasting, world models for AI agents, microstructure, and calibration.
LaTeX sources and BibTeX are provided with each paper. Code and data at /opensource.
Preliminary note
Feature-Based Prediction-Market Forecasting: Preliminary Observations
A gradient-boosted baseline on 11 days of SimpleFunctions microstructure data
We release sf-ml-baseline v0.1, the first publicly documented feature-based forecasting baseline for prediction markets. On 11 days of SimpleFunctions microstructure data (1.76M labelled rows for 24h direction, 14K for resolution) and five engineered features — mid price, 1h delta, implied yield, cancel-replace intensity, and cancel-versus-volume ratio — a three-seed LightGBM ensemble achieves Brier 0.2294 on a held-out 246K-row test set for the 24h direction task, compared to 0.2500 for a coinflip (CI non-overlap, improvement −0.0206). An XGBoost/CatBoost bake-off converges to the same Brier (classical saturation on a five-feature corpus). On the resolution task, per-category LightGBM models beat the price/100 baseline by 0.035–0.041 Brier on Crypto and Commodities. We frame this as a starting point for microstructure-based forecasting rather than a competitor to LLM+RAG systems such as Halawi et al. (2024) or AIA (2025); subsequent versions will be ensembled with those approaches, not substituted for them. Weights, code, and training scripts are released under CC-BY-4.0 with a SimpleFunctions attribution addendum.
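The evaluation setup described above can be sketched in a few lines: score a coinflip baseline and a probability-averaged multi-seed ensemble with the Brier score. The model predictions below are synthetic stand-ins (the released baseline uses LightGBM classifiers on the five engineered features); only the scoring and ensembling logic is illustrated.

```python
import numpy as np

def brier(p, y):
    """Brier score: mean squared error between predicted probabilities
    and binary 0/1 outcomes. Lower is better; 0.25 is a 0.5-coinflip."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)  # synthetic 24h-direction labels

# Predicting 0.5 for everything scores exactly 0.25.
coinflip = np.full(y.size, 0.5)
print(f"coinflip Brier: {brier(coinflip, y):.4f}")  # 0.2500

# Three "seeds": noisy synthetic probability estimates, each leaning
# toward the true label, averaged at the probability level.
seed_preds = [
    np.clip(0.5 + 0.4 * (y - 0.5) + rng.normal(0, 0.15, y.size), 0.01, 0.99)
    for _ in range(3)
]
ensemble = np.mean(seed_preds, axis=0)
print(f"ensemble Brier: {brier(ensemble, y):.4f}")  # below 0.2500
```

Averaging at the probability level (rather than majority-voting hard labels) preserves calibration information, which is what the Brier score rewards.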
Patrick Liu · prediction markets · LightGBM · Brier score · calibration

Working paper
Calibrated World Models for AI Agents
Prediction Markets as Real-Time Context for Language Models
Large language models have a knowledge cutoff that prevents them from reasoning accurately about current events. Existing mitigations — web search, news APIs, retrieval-augmented generation — return narrative text that requires parsing and provides no calibrated probabilities. We propose injecting prediction market data as a compact, structured world model into agent system prompts. On the World Awareness Benchmark (WAB), a baseline Claude Haiku 4.5 scores 2.3%, while the same model augmented with an 800-token prediction market world state scores 70.5% — a 31x improvement — with no fine-tuning and no retrieval infrastructure.
Patrick Liu · prediction markets · AI agents · world model · calibration
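The core mechanism in the abstract above — rendering market-implied probabilities as a compact, structured block for an agent's system prompt — can be sketched as follows. The market list, field names, and formatting here are illustrative assumptions, not the paper's actual world-state schema.

```python
def render_world_state(markets):
    """Format (question, implied probability) pairs as one compact line
    each, suitable for prepending to a system prompt. Hypothetical
    schema for illustration only."""
    lines = ["WORLD STATE (prediction-market implied probabilities):"]
    for question, prob in markets:
        lines.append(f"- {question}: {prob:.0%}")
    return "\n".join(lines)

# Illustrative markets, not real data.
markets = [
    ("Fed cuts rates at next meeting", 0.62),
    ("BTC above $100k at year end", 0.41),
]
world_state = render_world_state(markets)
print(world_state)

# The block is then prepended to the agent's system prompt:
system_prompt = "You are a forecasting assistant.\n\n" + world_state
```

Because the block is structured and already calibrated (probabilities, not narrative text), the model consumes it directly with no parsing step and no retrieval infrastructure.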