Core Concepts

The mental model behind SimpleFunctions.


Causal Tree

Your thesis is decomposed into a tree of verifiable assumptions. Each node has a probability (0-1) and importance weight. The overall confidence is the weighted product.

Thesis: "Oil stays above $100 for 6 months"
├── n1: OPEC maintains production cuts (0.70, weight 0.30)
│   ├── n1.1: Saudi compliance remains high (0.80)
│   └── n1.2: Russia doesn't break quota (0.60)
├── n2: Demand stays strong (0.65, weight 0.25)
├── n3: Geopolitical risk premium persists (0.75, weight 0.25)
└── n4: No US SPR release (0.80, weight 0.20)
Confidence: 72%

Nodes can be mutated directly with sf whatif --set "n1=0.3" for instant scenario analysis (zero LLM cost). The tree grows over time via weekly augmentation.
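Under the stated rule that overall confidence is the weighted product, the rollup can be sketched in a few lines. One assumption here: "weighted product" is read as the weighted geometric mean, prod(p_i ** w_i), which reproduces the 72% shown above; the actual implementation may differ.

```python
import math

# Top-level nodes from the example tree: (probability, importance weight).
nodes = {
    "n1": (0.70, 0.30),  # OPEC maintains production cuts
    "n2": (0.65, 0.25),  # Demand stays strong
    "n3": (0.75, 0.25),  # Geopolitical risk premium persists
    "n4": (0.80, 0.20),  # No US SPR release
}

def confidence(tree: dict) -> float:
    """Weighted geometric mean: prod(p_i ** w_i)."""
    return math.exp(sum(w * math.log(p) for p, w in tree.values()))

print(round(confidence(nodes), 2))  # 0.72 -- matches the 72% above

# Instant what-if, mirroring `sf whatif --set "n1=0.3"`: mutate one node,
# recompute, no LLM call needed.
whatif = dict(nodes, n1=(0.30, 0.30))
print(round(confidence(whatif), 2))  # 0.56
```

The geometric form has a useful property for scenario analysis: a node's weight controls how hard a change in its probability drags the overall confidence.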

Edges

An edge is the difference between what the market prices and what your causal model implies.

Market price: 34c — what traders think
Thesis price: 55c — what your causal model implies
Edge: +21c
Executable edge: +18c — after half the spread
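The arithmetic above can be reproduced directly. The bid/ask values in this sketch are illustrative assumptions, chosen so that half the spread is 3c, matching the +18c executable figure:

```python
def edge_cents(market_price: float, thesis_price: float,
               bid: float, ask: float) -> tuple[float, float]:
    """Raw edge vs. executable edge (raw edge net of half the bid-ask spread)."""
    edge = thesis_price - market_price
    executable = edge - (ask - bid) / 2
    return edge, executable

# Numbers from the example: 34c market, 55c thesis, a hypothetical 6c-wide book.
raw, exe = edge_cents(34, 55, bid=31, ask=37)
print(raw, exe)  # 21 18.0
```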

The system classifies each edge by why the mispricing exists:

consensus_gap — market and thesis disagree on fundamental probability
attention_gap — market hasn't reacted to recent information
timing_gap — market prices short-term risk, thesis prices long-term outcome
risk_premium — market embeds fear/greed premium that thesis doesn't

Indicator Framework

The pricing layer between raw price scan and LLM thesis edges. Indicators are cheap math labels — pure functions over the latest price snapshot, no LLM round-trip required for the screening pass itself. Use them to bulk-discover candidates before paying any LLM cost, then hand the survivors to the causal-tree evaluator.

Three-layer architecture:

scan-prices cron     →   indicator screen       →   thesis evaluator
(50K row snapshot)       (pure compute, ~50ms)      (LLM, $$$)
   raw prices            IY/CRI/EE/LAS/OR/τ         causal tree + edges
   price history         RV/VR/IAR (from 48h hist)  full narrative
   event calendar        Adj IY / Residual VR        null-as-signal selectors

Eleven indicators (Tier A–E):

| Indicator | Formula | What it catches |
|---|---|---|
| IY (implied yield) | (1/p − 1) × (365/τ) | Long-tail annualized yield. Try iy_min=200 for the unloved tail. |
| CRI (cliff risk) | max(p, 1−p) / min(p, 1−p) | 1 = balanced, ∞ = cliff. High CRI → asymmetric payoff, fragile to small news. |
| EE (expected edge) | thesisPrice − marketPrice | Expected mispricing in cents. Requires a thesis or regime row attached. |
| LAS (liquidity-adjusted spread) | (ask − bid) / mid | Frictional cost. Try las_max=0.05 to drop wide-spread illiquid traps. |
| OR (overround) | Σ ask_i − 1 | Sum of YES asks across mutually exclusive event legs. 0.05 = a 105¢ field — book-maker margin or live arb. |
| τ (time to expiry) | closeTime − now | Days to settlement. Drives the IY denominator and Kelly horizon sizing. |
| RV (realized volatility) | σ(Δp/p) × √(obs/yr) | Annualized stddev of returns from 48h price history. How much the market is actually moving. |
| VR (vol ratio) | RV / √(p(1−p)/τ × 365) | Fraction of theoretical max vol consumed. >0.8 very active; <0.1 dead market or consensus. |
| IAR (info arrival rate) | count(abs(Δ) ≥ 1c) / hours | Meaningful price changes per hour. Direct proxy for information flow rate. |
| Adj IY (risk-adjusted yield) | IY × min(1, VR/0.3) × (1 − LAS) | IY penalized for dead markets (low VR) and high friction (high spread). Eliminates false positives. |
| Residual VR (unexplained volatility) | VR − exp((14 − d)/7) | VR minus expected VR from scheduled catalysts (FOMC, CPI, GDP, NFP, PCE). Positive = market knows something the public calendar doesn't. |
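A few of these indicators are simple enough to sketch as the pure functions the table describes. These are illustrative reimplementations of the listed formulas, not the project's actual code:

```python
def implied_yield(p: float, tau_days: float) -> float:
    """IY: annualized yield of buying YES at price p, settling in tau_days."""
    return (1 / p - 1) * (365 / tau_days)

def cliff_risk(p: float) -> float:
    """CRI: 1 = balanced, large = cliff-shaped asymmetric payoff."""
    return max(p, 1 - p) / min(p, 1 - p)

def las(bid: float, ask: float) -> float:
    """LAS: bid-ask spread as a fraction of the mid price."""
    mid = (bid + ask) / 2
    return (ask - bid) / mid

def adj_iy(iy: float, vr: float, las_value: float) -> float:
    """Adj IY: penalize dead markets (low VR) and frictional spreads."""
    return iy * min(1.0, vr / 0.3) * (1 - las_value)

# A 5c contract expiring in 7 days has a huge raw IY...
iy = implied_yield(0.05, 7)  # (1/0.05 - 1) * (365/7) ~= 990.7
# ...but a dead book (VR = 0.05) with a 10% spread crushes the adjusted figure.
print(round(adj_iy(iy, vr=0.05, las_value=0.10), 1))  # 148.6
```

This is the point of Adj IY in the table: the raw yield screen alone would surface the 5c contract as a monster, while the adjusted figure discounts it for being illiquid and inactive.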

Null is signal

The screen treats missing data as a positive selector, not as noise to filter. Two flags:

  • no_thesis=true — markets without any active thesis (the unloved long tail; strategy 2/3 entry condition)
  • no_orderbook=true — markets without recent orderbook attention (no maker has quoted, edge is one-sided)

The reverse flags has_thesis / has_orderbook are also positive selectors when you want covered universe only.

Three CLI recipes

# Long-tail yield: short-dated, high IY, no thesis covering it
sf screen --iy-min 200 --tau-max 7 --without-thesis

# Arb detection: event-leg overround above 5%
sf screen --or-min 0.05

# Unloved Polymarket: no thesis, no orderbook attention
sf screen --without-thesis --without-orderbook --venue polymarket

Same filters available via GET /api/public/screen for HTTP clients, and via the screen_markets tool for MCP / OpenAI function-calling / sf agent runtimes.

Regime — Adverse Selection

The regime score answers one question: if I post a quote in this market, what's the probability the next person to hit me knows something I don't? It is the adverse-selection prior — a structural property of the market, not a prediction about the current price.

Score range: 0 → 1
Label: maker < 0.3 · neutral 0.3–0.6 · taker > 0.6
Use: find markets safe for making (low score) or ripe for taking when you have edge (high score).

Weighted score (dynamic — missing inputs redistribute):

score = 0.30·micro + 0.25·calendar + 0.25·prior + 0.10·crossVenue + 0.10·edge

prior         — static LLM-classified asPrior [0.05, 0.80] per market
micro         — spread pct + depth change + volume z + flow (top-N overlay)
calendar      — exp(−hoursToCatalyst/24) × typeMultiplier
crossVenue    — abs(kalshi−polymarket) gap in cents
edge          — abs(sfEdgeCents) from thesis or screen
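A minimal sketch of the dynamic weighting follows. The note "missing inputs redistribute" is read here as proportional renormalization of the weights of the components that are present, which is an assumption about the mechanism:

```python
WEIGHTS = {"micro": 0.30, "calendar": 0.25, "prior": 0.25,
           "crossVenue": 0.10, "edge": 0.10}

def regime_score(components: dict) -> float:
    """Weighted score; the weight of any missing (None) component is
    redistributed proportionally across the components that are present."""
    present = {k: v for k, v in components.items() if v is not None}
    total_w = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] / total_w * v for k, v in present.items())

def label(score: float) -> str:
    return "maker" if score < 0.3 else "taker" if score > 0.6 else "neutral"

# Universe pass: prior only. Everything else is missing, so the prior
# carries all of the weight.
s = regime_score({"micro": None, "calendar": None, "prior": 0.15,
                  "crossVenue": None, "edge": None})
print(s, label(s))  # 0.15 maker
```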

The prior is the workhorse

The dominant signal for most markets is the static asPrior — an LLM classifies each market once based on observability of the outcome, then caches the result forever. Calibration:

| asPrior | Type | Examples |
|---|---|---|
| 0.05 | Truly unknowable | Banknote animal design, papal nationality |
| 0.15 | Noisy long-horizon | Far-future BTC, multi-year GDP |
| 0.30 | Expertise helps | Policy outcomes, geopolitics |
| 0.50 | Insider risk | Near-term policy, pre-announcement leaks |
| 0.65 | Significant asymmetry | Economic data releases, earnings |
| 0.80 | High informed flow | Day-of weather, live sports |

Micro overlay on the top slice

For the rows returned by a scan (at most the requested limit, capped at 200), the handler enriches the score with live microstructure signals from market_regime_snapshots: spread percentile, depth change, volume z-score, flow imbalance. Rows that received the overlay are tagged source: "classifier+micro"; rows that didn't stay at "classifier".

The universe pass uses the prior only — cheap and covers 100% of the ~50K market universe. The overlay uses the cache only for the sliced result set, so the cost is bounded at ≤200 row lookups per scan.

Endpoints

  • GET /api/public/regime/scan — scan the universe, filter by label / score range / venue / event_type, sort by score / edge / spread / volume.
  • GET /api/public/market-microstructure-history?ticker=&days=7 — spread + depth time series for one ticker from orderbook_snapshots. Replaces the old /regime/history endpoint (deprecated to 410 — the score itself is a flat line because the prior is static).

Coverage of the prior grows organically: a backfill classifier covers the top-5K active markets by volume, and the scan-regime cron (6h cadence) picks up new markets as they enter the universe. Unclassified markets return source: "neutral-default" — the scorer's no-data branch, score ≈ 0.35.

Signals

Events that feed into evaluations. Five types:

| Type | Source | Description |
|---|---|---|
| news | Heartbeat / manual | News articles, data releases |
| price_move | Heartbeat | Market price change ≥ 3 cents |
| user_note | Manual | Your analysis or observations |
| external | Manual | Signals from other systems |
| upcoming_event | Heartbeat | Kalshi milestone matching edges |

Kill Conditions

Before every evaluation, the system asks: "Does any event fundamentally break a core assumption of this thesis?" If yes, it flags the threat prominently before any other analysis. News scans include adversarial queries that actively seek contradictory evidence. The system tries to kill your thesis before you trade on it.

Track Record

Feedback loop that computes how well past edges predicted market movement:

  • Hit rate: % of edges where market moved toward the thesis-implied price
  • Average movement: mean price change in cents since edge detection
  • Track record is injected into evaluation prompts so the system learns from its accuracy
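The two statistics can be sketched as follows. The row fields (entry, thesis, current) and the exact "moved toward" test are illustrative assumptions about what an edge record contains:

```python
def track_record(edge_rows: list[dict]) -> dict:
    """Each row: market price at edge detection (entry), thesis-implied
    price (thesis), and the current market price (current), all in cents.
    A 'hit' means the market moved toward the thesis-implied price."""
    moves = [row["current"] - row["entry"] for row in edge_rows]
    hits = [
        (row["thesis"] - row["entry"]) * move > 0  # same sign => toward thesis
        for row, move in zip(edge_rows, moves)
    ]
    return {
        "hit_rate": sum(hits) / len(edge_rows),
        "avg_move_cents": sum(moves) / len(edge_rows),
    }

sample = [
    {"entry": 34, "thesis": 55, "current": 41},  # moved toward thesis: hit
    {"entry": 34, "thesis": 55, "current": 30},  # moved away: miss
]
print(track_record(sample))  # {'hit_rate': 0.5, 'avg_move_cents': 1.5}
```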

Tree Augmentation

The causal tree evolves over time:

  1. Each evaluation can suggest new causal factors
  2. Weekly, the augment agent reviews suggestions
  3. LLM decides which to accept (must be genuinely new, not duplicates)
  4. Accepted nodes are appended (never removed — append-only tree)
  5. Importance weights are rebalanced among siblings
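Step 5 can be illustrated with the simplest possible rebalancing rule, proportional renormalization so that sibling weights sum to 1 again after an append; the real weighting logic may be more involved:

```python
def rebalance(siblings: dict[str, float], new_node: str,
              new_weight: float) -> dict[str, float]:
    """Append-only augmentation: add a node with a proposed weight, then
    renormalize all sibling importance weights to sum to 1."""
    weights = dict(siblings, **{new_node: new_weight})
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Top-level weights from the oil example, plus a newly accepted factor n5.
tree = {"n1": 0.30, "n2": 0.25, "n3": 0.25, "n4": 0.20}
print(rebalance(tree, "n5", 0.15))
```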

Intent Lifecycle

Intents are declarative execution instructions. Instead of "buy now," you say "buy when price drops below 40c, but only if oil is above $95."

pending → armed → triggered → filled
                      ↓
              [soft condition?]
                ↓           ↓
               PASS        HOLD
                ↓           ↓
             execute       wait

Trigger types: immediate, price_below, price_above, time. Soft conditions are natural language, evaluated by LLM in --smart mode.
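The hard-trigger side of the lifecycle can be sketched as a small dispatcher. The Intent shape and field names here are hypothetical; the soft condition is represented only as data, since in --smart mode it is evaluated by an LLM rather than by code like this:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    trigger: str                 # "immediate" | "price_below" | "price_above" | "time"
    threshold: float = 0.0       # price in cents, or a unix timestamp for "time"
    soft_condition: Optional[str] = None  # natural language, LLM-checked in --smart

def is_triggered(intent: Intent, price_cents: float,
                 now: Optional[float] = None) -> bool:
    """Hard-trigger check only; the soft condition is a separate
    PASS/HOLD gate evaluated after the trigger fires."""
    now = time.time() if now is None else now
    return {
        "immediate": True,
        "price_below": price_cents < intent.threshold,
        "price_above": price_cents > intent.threshold,
        "time": now >= intent.threshold,
    }[intent.trigger]

# "Buy when price drops below 40c, but only if oil is above $95":
# the price check is the hard trigger; the oil clause is the soft condition.
intent = Intent(trigger="price_below", threshold=40,
                soft_condition="oil is above $95")
print(is_triggered(intent, price_cents=38))  # True
```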