sf-ml-baseline v0.1
Gradient-boosted tree ensemble for 24h prediction-market forecasting — first feature-based OSS baseline
The first openly available calibrated baseline for prediction-market forecasting. All prior work (Halawi 2024, Schoenegger 2024, AIA Forecaster 2025) uses LLM + news retrieval; none ingests engineered microstructure features like implicit yield or calibration ratio. This repo is meant to be the feature-based reference that LLM systems should ensemble with.
Headline results
| Task | Model | Brier | vs baseline |
|---|---|---|---|
| Direction 24h (V1 × T1) | 9-model ensemble | 0.2294 | −0.0206 vs coinflip 0.2500 |
| Resolution 24h (V2 × T4) | XGBoost 3-seed | 0.1681 | −0.0086 vs market-price/100 |
On V1 × T1, the 95% confidence intervals of the ensemble and the baseline do not overlap (246,862 test samples).
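For readers unfamiliar with the metric, here is a minimal sketch of how a Brier score and the two baselines in the table can be computed. The toy data and the `brier_score` helper are illustrative assumptions, not the repo's evaluation code; the only facts taken from the table are that the coin-flip baseline predicts 0.5 and the market baseline predicts price/100.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy data, made up for illustration (not drawn from the test set).
outcomes = [1, 0, 1, 1, 0]
price_cents = [70, 40, 55, 80, 30]             # market prices in cents
model_probs = [0.80, 0.30, 0.60, 0.85, 0.20]   # hypothetical model forecasts

coinflip = brier_score([0.5] * len(outcomes), outcomes)         # always 0.25
market = brier_score([c / 100 for c in price_cents], outcomes)  # price/100 baseline
model = brier_score(model_probs, outcomes)

print(f"coinflip={coinflip:.4f} market={market:.4f} model={model:.4f}")
print(f"delta vs market = {model - market:+.4f}")  # negative means better
```

A constant 0.5 forecast scores exactly 0.25 on any binary outcome set, which is why the coin-flip baseline in the table is 0.2500.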
Features
Five indicators from SimpleFunctions' market_indicator_history table:
- price_cents: current market price (0-100 cents)
- delta_cents: 24h price change (signed cents)
- iy: implicit yield (% annualized)
- cri: calibration ratio index (unitless)
- cvr: calibration variability ratio (note: 0% feature importance for direction; under investigation)
Rolling statistics over 3/12/48-row windows extend this to 35 features for the V2 resolution model.
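The expansion from 5 to 35 features can be sketched as follows. Which statistics are used is not stated here; this sketch assumes a trailing mean and standard deviation per base feature per window, which happens to give 5 + 5 × 3 × 2 = 35 columns. The function name, column naming, and handling of short early windows are all hypothetical.

```python
from statistics import fmean, pstdev
from typing import Dict, List

BASE_FEATURES = ["price_cents", "delta_cents", "iy", "cri", "cvr"]
WINDOWS = [3, 12, 48]  # window sizes in rows, as stated above


def rolling_features(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Add trailing mean/std of every base feature for each window.

    Assumptions: windows are trailing and include the current row (no
    lookahead); early rows use however many rows are available so far.
    """
    out = []
    for i, row in enumerate(rows):
        feats = {name: row[name] for name in BASE_FEATURES}
        for w in WINDOWS:
            window = rows[max(0, i - w + 1): i + 1]
            for name in BASE_FEATURES:
                values = [r[name] for r in window]
                feats[f"{name}_mean_{w}"] = fmean(values)
                feats[f"{name}_std_{w}"] = pstdev(values)
        out.append(feats)
    return out


# Toy history for one market, oldest row first (values are made up).
history = [
    {"price_cents": 50 + i, "delta_cents": 1, "iy": 10.0, "cri": 0.5, "cvr": 0.8}
    for i in range(5)
]
features = rolling_features(history)
print(len(features[-1]))  # number of columns for the latest row
```

Keeping the windows strictly trailing matters for a forecasting model: a centered window would leak future rows into the features.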
Known limits
- Trained on 11 days of data (2026-04-08 → 2026-04-19). v0.2 is scheduled for ~2026-05-20, once the R2 dump accumulates 30 days of history.
- 5-base-feature corpus is saturated for classical GBMs — Phase B bake-off showed LGBM, XGBoost, CatBoost converge to identical Brier. Architecture gains (tabular DL, time-series FMs) are the next frontier.
- In per-category evaluations, the signal is strongest on Crypto (Δ=−0.041) and Commodities (Δ=−0.036). Sports and Financials are harder.
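A per-category breakdown like the one in the last bullet can be sketched as below. It assumes Δ means model Brier minus baseline Brier within each category (negative is better), matching the sign convention of the headline table; the row layout and the `per_category_delta` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (category, model_prob, baseline_prob, outcome); hypothetical row layout
Row = Tuple[str, float, float, int]


def per_category_delta(rows: Iterable[Row]) -> Dict[str, float]:
    """Return model Brier minus baseline Brier for each category."""
    # Per category: [model squared error, baseline squared error, count]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for category, p_model, p_base, y in rows:
        acc = sums[category]
        acc[0] += (p_model - y) ** 2
        acc[1] += (p_base - y) ** 2
        acc[2] += 1
    return {c: (m - b) / n for c, (m, b, n) in sums.items()}


# Toy rows, made up for illustration.
rows = [
    ("Crypto", 0.70, 0.5, 1),
    ("Crypto", 0.30, 0.5, 0),
    ("Sports", 0.50, 0.5, 1),
    ("Sports", 0.55, 0.5, 0),
]
deltas = per_category_delta(rows)
for category, delta in sorted(deltas.items()):
    print(f"{category}: {delta:+.4f}")
```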
License
CC-BY-4.0 with a SimpleFunctions Attribution Addendum. Commercial use is welcome provided:
(a) you credit "Powered by sf-ml-baseline (SimpleFunctions, simplefunctions.dev)",
(b) derivative weights use the sf-ml-baseline-* name prefix, and
(c) you apply CC-BY-4.0 to derivatives.
Not financial advice.
Install
pip install lightgbm xgboost catboost
# then clone or download weights from the HF repo
Quickstart
from sf_ml_baseline import SFBaseline
model = SFBaseline()
p_up = model.predict_direction(price_cents=55, delta_cents=3, iy=12.5, cri=0.6, cvr=0.8)
# → 0.437 (probability that the market's price will be higher in 24h)
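Since this baseline is meant to be ensembled with LLM forecasters, here is one simple way to combine the two probabilities: a weighted average in log-odds space. This is a generic blending technique, not something the repo ships; `blend`, `logit`, the 0.5 weight, and the LLM probability are all assumptions for illustration.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped away from 0 and 1 to keep the log finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def blend(p_features: float, p_llm: float, weight: float = 0.5) -> float:
    """Weighted average in log-odds space, mapped back to a probability.

    weight is the share given to the feature-based forecast (hypothetical
    default; in practice it would be tuned on held-out data).
    """
    z = weight * logit(p_features) + (1 - weight) * logit(p_llm)
    return 1 / (1 + math.exp(-z))


p_up = 0.437   # value from the quickstart above
p_llm = 0.60   # hypothetical LLM forecast for the same market
p_blend = blend(p_up, p_llm)
print(round(p_blend, 3))
```

Averaging in log-odds space rather than probability space keeps confident forecasts from being pulled toward 0.5 as strongly, though a plain arithmetic mean is also a reasonable starting point.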