
sf-ml-baseline v0.1

Gradient-boosted tree ensemble for 24h prediction-market forecasting — first feature-based OSS baseline

The first openly available calibrated baseline for prediction-market forecasting. All prior work (Halawi 2024, Schoenegger 2024, AIA Forecaster 2025) uses LLM + news retrieval; none of it ingests engineered microstructure features such as implicit yield or the calibration ratio. This repo is meant to be the feature-based reference that LLM systems should ensemble with.
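One common way to pool a feature-based probability with an LLM forecast is a weighted average in logit space. The sketch below is an assumption for illustration: the repo does not prescribe a combination rule, and the weight `w` would need tuning on held-out data.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def ensemble(p_features: float, p_llm: float, w: float = 0.5) -> float:
    """Weighted logit-space average of two probabilities.
    Assumed pooling scheme, not prescribed by sf-ml-baseline."""
    return sigmoid(w * logit(p_features) + (1 - w) * logit(p_llm))

# Pooling pulls the combined forecast between the two inputs.
print(ensemble(0.44, 0.60))
```

Logit-space averaging keeps the combined forecast inside (0, 1) and weights extreme probabilities more heavily than a plain arithmetic mean.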

Headline results

| Task | Model | Brier | vs baseline |
| --- | --- | --- | --- |
| Direction 24h (V1 × T1) | 9-model ensemble | 0.2294 | −0.0206 vs coinflip (0.2500) |
| Resolution 24h (V2 × T4) | XGBoost 3-seed | 0.1681 | −0.0086 vs market-price/100 |

The V1 × T1 improvement is statistically significant: the model's and baseline's 95% CIs do not overlap across 246,862 test samples.
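The Brier score in the table above is the mean squared error between predicted probabilities and binary outcomes; a minimal reference implementation:

```python
import numpy as np

def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

# A coinflip baseline always predicts 0.5, so its Brier score is
# exactly 0.25 regardless of outcomes -- the 0.2500 reference
# the direction model is compared against.
outcomes = [1, 0, 1, 1, 0]
print(brier([0.5] * 5, outcomes))                   # 0.25
print(brier([0.9, 0.2, 0.7, 0.8, 0.1], outcomes))  # lower is better
```

Lower is better; 0 is a perfect, fully confident forecaster.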

Features

Five indicators from SimpleFunctions' market_indicator_history table:

  • price_cents — current market price (0-100 cents)
  • delta_cents — 24h price change (signed cents)
  • iy — implicit yield (% annualized)
  • cri — calibration ratio index (unitless)
  • cvr — calibration variability ratio (note: 0% feature importance for direction — under investigation)

Rolling statistics over 3/12/48-row windows extend this to 35 features for the V2 resolution model.
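The README does not specify which rolling statistics are used, but 5 base features plus a mean and a standard deviation per feature per window (3 windows) reproduces the 35-feature count exactly. The construction below is therefore an assumption, sketched with pandas:

```python
import numpy as np
import pandas as pd

BASE = ["price_cents", "delta_cents", "iy", "cri", "cvr"]
WINDOWS = [3, 12, 48]  # row-based windows from the README

def expand_features(df: pd.DataFrame) -> pd.DataFrame:
    """5 base columns -> 35: base + rolling mean/std per window.
    The mean/std choice is an assumption; only the window sizes
    and the 35-feature total come from the README."""
    out = df[BASE].copy()
    for w in WINDOWS:
        roll = df[BASE].rolling(w, min_periods=1)
        out = out.join(roll.mean().add_suffix(f"_mean{w}"))
        out = out.join(roll.std().add_suffix(f"_std{w}"))
    return out

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=BASE)
feats = expand_features(df)
print(feats.shape)  # (100, 35)
```

With `min_periods=1` the first rows of each `_std` column are NaN (std of a single observation is undefined), which a downstream GBM handles natively.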

Known limits

  • Trained on 11 days (2026-04-08 → 2026-04-19). v0.2 scheduled for ~2026-05-20 once R2 dump accumulates 30d of history.
  • 5-base-feature corpus is saturated for classical GBMs — the Phase B bake-off showed LightGBM, XGBoost, and CatBoost converging to identical Brier scores. Architecture gains (tabular deep learning, time-series foundation models) are the next frontier.
  • Strongest signal on Crypto (Δ=−0.041) and Commodities (Δ=−0.036) per-category evaluations. Sports and Financials are harder.

License

CC-BY-4.0 with a SimpleFunctions Attribution Addendum. Commercial use is welcome provided: (a) you credit "Powered by sf-ml-baseline (SimpleFunctions, simplefunctions.dev)", (b) derivative weights use the sf-ml-baseline-* name prefix, and (c) you apply CC-BY-4.0 to derivatives.

Not financial advice.

Install

pip install lightgbm xgboost catboost
# then clone or download weights from the HF repo

Quickstart

from sf_ml_baseline import SFBaseline
model = SFBaseline()
p_up = model.predict_direction(price_cents=55, delta_cents=3, iy=12.5, cri=0.6, cvr=0.8)
# → 0.437 (probability that the market's price will be higher in 24h)

Tags

primitive · python · machine-learning · lightgbm · xgboost · catboost · prediction-market · brier · baseline
