Research
Working papers, preliminary notes, and technical reports on prediction-market forecasting, world models for AI agents, microstructure, and calibration.
LaTeX sources and BibTeX are provided with each paper. Code and data at /opensource.
Preliminary note
Feature-Based Prediction-Market Forecasting: Preliminary Observations
A gradient-boosted baseline on 11 days of SimpleFunctions microstructure data
We release sf-ml-baseline v0.1, the first publicly documented feature-based forecasting baseline for prediction markets. On 11 days of SimpleFunctions microstructure data (1.76M labelled rows for 24h direction, 14K for resolution) and five engineered features — mid price, 1h delta, implied yield, cancel-replace intensity, and cancel-versus-volume ratio — a three-seed LightGBM ensemble achieves Brier 0.2294 on a held-out 246K-row test set for the 24h direction task, compared to 0.2500 for a coinflip (CI non-overlap, improvement −0.0206). An XGBoost/CatBoost bake-off converges to the same Brier (classical saturation on a five-feature corpus). On the resolution task, per-category LightGBM models beat the price/100 baseline by 0.035–0.041 Brier on Crypto and Commodities. We frame this as a starting point for microstructure-based forecasting rather than a competitor to LLM+RAG systems such as Halawi et al. (2024) or AIA (2025); subsequent versions will be ensembled with those approaches, not substituted for them. Weights, code, and training scripts are released under CC-BY-4.0 with a SimpleFunctions attribution addendum.
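The evaluation setup described above can be sketched in a few lines: score a coinflip baseline and a probability-averaged multi-seed ensemble with the Brier score. The model predictions below are synthetic stand-ins (the released baseline uses LightGBM classifiers on the five engineered features); only the scoring and ensembling logic is illustrated.

```python
import numpy as np

def brier(p, y):
    """Brier score: mean squared error between predicted probabilities
    and binary 0/1 outcomes. Lower is better; 0.25 is a 0.5-coinflip."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)  # synthetic 24h-direction labels

# Predicting 0.5 for everything scores exactly 0.25.
coinflip = np.full(y.size, 0.5)
print(f"coinflip Brier: {brier(coinflip, y):.4f}")  # 0.2500

# Three "seeds": noisy synthetic probability estimates, each leaning
# toward the true label, averaged at the probability level.
seed_preds = [
    np.clip(0.5 + 0.4 * (y - 0.5) + rng.normal(0, 0.15, y.size), 0.01, 0.99)
    for _ in range(3)
]
ensemble = np.mean(seed_preds, axis=0)
print(f"ensemble Brier: {brier(ensemble, y):.4f}")  # below 0.2500
```

Averaging at the probability level (rather than majority-voting hard labels) preserves calibration information, which is what the Brier score rewards.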
Patrick Liu · prediction markets · LightGBM · Brier score · calibration

Working paper
Calibrated World Models for AI Agents
Prediction Markets as Real-Time Context for Language Models
Large language models have a knowledge cutoff that prevents them from reasoning accurately about current events. Existing mitigations — web search, news APIs, retrieval-augmented generation — return narrative text that requires parsing and provides no calibrated probabilities. We propose injecting prediction market data as a compact, structured world model into agent system prompts. On the World Awareness Benchmark (WAB), a baseline Claude Haiku 4.5 scores 2.3%, while the same model augmented with an 800-token prediction market world state scores 70.5% — a 31x improvement — with no fine-tuning and no retrieval infrastructure.
Patrick Liu · prediction markets · AI agents · world model · calibration
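The core mechanism in the abstract above — rendering market-implied probabilities as a compact, structured block for an agent's system prompt — can be sketched as follows. The market list, field names, and formatting here are illustrative assumptions, not the paper's actual world-state schema.

```python
def render_world_state(markets):
    """Format (question, implied probability) pairs as one compact line
    each, suitable for prepending to a system prompt. Hypothetical
    schema for illustration only."""
    lines = ["WORLD STATE (prediction-market implied probabilities):"]
    for question, prob in markets:
        lines.append(f"- {question}: {prob:.0%}")
    return "\n".join(lines)

# Illustrative markets, not real data.
markets = [
    ("Fed cuts rates at next meeting", 0.62),
    ("BTC above $100k at year end", 0.41),
]
world_state = render_world_state(markets)
print(world_state)

# The block is then prepended to the agent's system prompt:
system_prompt = "You are a forecasting assistant.\n\n" + world_state
```

Because the block is structured and already calibrated (probabilities, not narrative text), the model consumes it directly with no parsing step and no retrieval infrastructure.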