Feature-Based Prediction-Market Forecasting: Preliminary Observations
A gradient-boosted baseline on 11 days of SimpleFunctions microstructure data
Patrick Liu/SimpleFunctions
We release sf-ml-baseline v0.1, the first publicly documented feature-based forecasting baseline for prediction markets. The corpus is 11 days of SimpleFunctions microstructure data (1.76M labelled rows for the 24h direction task, 14K for the resolution task), described by five engineered features: mid price, 1h delta, implied yield, cancel-replace intensity, and cancel-versus-volume ratio. On the 24h direction task, a three-seed LightGBM ensemble achieves a Brier score of 0.2294 on a held-out 246K-row test set, compared with 0.2500 for a coin flip, a reduction of 0.0206 with non-overlapping confidence intervals. In a bake-off, XGBoost and CatBoost converge to the same Brier score, which we read as classical models saturating on a five-feature corpus. On the resolution task, per-category LightGBM models beat the price/100 baseline by 0.035 to 0.041 Brier on Crypto and Commodities. We frame this as a starting point for microstructure-based forecasting rather than a competitor to LLM+RAG systems such as Halawi et al. (2024) or AIA (2025); subsequent versions will be ensembled with those approaches, not substituted for them. Weights, code, and training scripts are released under CC-BY-4.0 with a SimpleFunctions attribution addendum.
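For readers unfamiliar with the metric, the following is a minimal sketch of how the Brier comparisons above can be computed. It is not taken from the sf-ml-baseline code. The function names, the toy data, the assumption that prices are quoted on a 0 to 100 scale, and the use of a simple mean to combine seeds are all illustrative assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def average_seeds(per_seed_probs):
    """Combine per-seed predictions row by row with a simple mean.

    Assumption: the abstract does not say how the three seeds are combined;
    a plain average is used here purely for illustration.
    """
    return [sum(row) / len(row) for row in zip(*per_seed_probs)]


# Toy data, invented for illustration only (not from the released corpus).
outcomes = [1, 0, 1, 1, 0, 0]
prices = [70, 20, 55, 80, 40, 10]  # assumed to be quoted on a 0-100 scale

# Coin-flip baseline: always forecast 0.5. Its Brier score is 0.25
# regardless of the outcomes, since (0.5 - y)^2 = 0.25 for y in {0, 1}.
coinflip = [0.5] * len(outcomes)

# price/100 baseline: read the market price directly as a probability.
price_baseline = [p / 100 for p in prices]

# Hypothetical predictions from three differently seeded models.
seed_preds = [
    [0.80, 0.15, 0.60, 0.85, 0.30, 0.10],
    [0.75, 0.20, 0.65, 0.80, 0.35, 0.05],
    [0.85, 0.10, 0.55, 0.90, 0.25, 0.15],
]
ensemble = average_seeds(seed_preds)

print("coin flip :", brier_score(coinflip, outcomes))
print("price/100 :", brier_score(price_baseline, outcomes))
print("ensemble  :", brier_score(ensemble, outcomes))
```

Lower is better: a forecaster only adds information over the coin flip if its Brier score falls below 0.25, and it only adds information over the market if it falls below the price/100 score on the same rows.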