SimpleFunctions

Prediction market agent — autonomous workflows.

End-to-end agent template: read state → reason → act, with calibrated views as the bridge.

A reference architecture for a self-hostable prediction market agent on Kalshi and Polymarket. Five stages — gather, reason, decide, execute, reconcile — wired around the platform's public surfaces. Bring your own model, bring your own risk policy; the platform supplies the read surfaces, normalized intents, audit log, and the calibration loop that closes Brier feedback into the next cycle. Hub view at /ai-agents; hosted variant at /portfolio-autopilot.

Five stages · four templates · calibration loop · BYOM · BYOR · self-host

Galileo at the telescope — the first quantitative forecaster, alone with calibrated instruments.

Five-stage reference architecture

Every production prediction market agent on the platform composes from these five stages. Stages 1, 4, and 5 use platform surfaces; stages 2 and 3 are operator-owned (BYOM and BYOR). The shape is intentionally minimal.

| # | Stage | What it does | Surface |
|---|-------|--------------|---------|
| 01 | Gather | Pull current world state, target-market detail, and any new signals (X chatter, news, gov / econ overlay) | /world · /event-probability-api · /realtime-data-api · /query-gov · /query-econ |
| 02 | Reason | The agent (Claude, GPT, custom) consumes the gathered context and applies the operator's thesis or model — produces calibrated views | Operator-owned model layer (BYOM) |
| 03 | Decide | Convert views into proposed actions — buy / sell / hedge / hold — with size and trigger conditions; runs the operator's risk policy | Operator-owned policy layer (BYOR) |
| 04 | Execute | Submit normalized intents through the platform execution layer; idempotency keys + dry-run + audit log | /prediction-market-execution |
| 05 | Reconcile | Pull fills + settlements back, mark to current price, update the agent's view of the world for the next cycle | Reconciliation feed (CSV / JSON / Parquet) |
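The five stages above can be sketched as one typed loop. This is a minimal, hypothetical skeleton: the `Stages` interface, the `View` and `Intent` shapes, and every function name are illustrative assumptions, not the platform's actual schema — real implementations of each stage would call the surfaces listed in the table.

```typescript
// Hypothetical sketch of the five-stage loop. All stage functions are
// placeholders; real ones call /world, the BYOM model layer, the BYOR
// policy layer, /prediction-market-execution, and the reconciliation feed.

type View = { ticker: string; probability: number; confidence: number };
type Intent = { ticker: string; side: "buy" | "sell"; size: number; idempotencyKey: string };

interface Stages {
  gather(): Promise<Record<string, unknown>>;            // 01: world state + signals
  reason(ctx: Record<string, unknown>): Promise<View[]>; // 02: BYOM -> calibrated views
  decide(views: View[]): Promise<Intent[]>;              // 03: BYOR risk policy -> intents
  execute(intents: Intent[]): Promise<string[]>;         // 04: submit intents, return ids
  reconcile(ids: string[]): Promise<void>;               // 05: mark book for next cycle
}

// One pass through the loop; returns how many intents this cycle proposed.
async function runCycle(s: Stages): Promise<number> {
  const ctx = await s.gather();
  const views = await s.reason(ctx);
  const intents = await s.decide(views);
  const ids = await s.execute(intents);
  await s.reconcile(ids);
  return intents.length;
}
```

Keeping stages 2 and 3 behind an interface is what makes BYOM and BYOR swappable without touching the rest of the loop.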

Four concrete workflow templates

Production patterns from the field. Each template runs the five-stage loop with different cadence, sizing, and exit conditions. Copy any of them as a starting point.

Event-trigger agent

Watch a basket of contracts and place a sized intent the moment a price condition fires

  1. Subscribe via WebSocket to ticker:* topics for the watch basket
  2. Maintain a thin in-memory model of price + spread + recent flow per ticker
  3. When a ticker crosses an operator-defined trigger (price below X, spread under Y), invoke the LLM with the ticker context to produce a sized view
  4. Submit a dry-run intent first; if the platform's risk-gate report is clean, submit live with an idempotency key
  5. Log the LLM rationale + intent record to a local audit file for post-trade review
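Steps 3 and 4 hinge on a trigger predicate and a deterministic idempotency key. A minimal sketch, assuming illustrative field names (`price`, `spread`, the threshold config) rather than a real platform schema:

```typescript
// Hypothetical trigger policy for the event-trigger template.
type Tick = { ticker: string; price: number; spread: number };
type Trigger = { maxPrice: number; maxSpread: number };

// Fires when price drops below X while the spread stays under Y (step 3).
function shouldFire(t: Tick, cfg: Trigger): boolean {
  return t.price < cfg.maxPrice && t.spread < cfg.maxSpread;
}

// Deterministic key: the same ticker + cycle always yields the same key,
// so a crashed-and-replayed submission dedupes safely (step 4).
function idempotencyKey(t: Tick, cycleId: string): string {
  return `evt-${t.ticker}-${cycleId}`;
}
```

The key deliberately excludes the observed price, so a retry after a small price move still maps to the same logical submission.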

Drawdown-guard agent

Continuously evaluate portfolio risk; reduce or halt on drawdown thresholds

  1. Pull positions + balance + recent fills via the agentic CLI on a 5-minute cadence
  2. Compute peak-to-current drawdown across the full book and per-strategy
  3. If drawdown exceeds the operator-set threshold, the LLM authors a reduction plan (which positions, in what order)
  4. Execute the reduction as a sequence of sell intents with conservative limit prices
  5. Halt new buy intents until drawdown recovers to a separate operator-set threshold
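The drawdown computation in step 2 is a single pass over the equity curve. A sketch, assuming equity is sampled as a simple array of positive portfolio values:

```typescript
// Peak-to-current drawdown over an equity curve (step 2 of the
// drawdown-guard template). Returns the worst fraction lost from any
// running peak, in [0, 1].
function drawdown(equity: number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const v of equity) {
    peak = Math.max(peak, v);
    worst = Math.max(worst, (peak - v) / peak);
  }
  return worst;
}
```

Running the same function per-strategy as well as on the full book, as step 2 suggests, is just a matter of slicing the fills before building each equity series.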

Daily-research agent

Author a daily research note + trade-idea list for the operator to review

  1. At a fixed local hour, pull the world snapshot + delta-since-yesterday + top movers
  2. Cross-reference with /query-gov + /query-econ for relevant policy and macro context
  3. The LLM produces a short markdown note with named theses, suggested trade ideas, and confidence levels
  4. Optionally render trade ideas as dry-run intent records the operator can flip live
  5. Email or post the note; the agent does not act until the operator approves
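The delta-since-yesterday and top-movers inputs in step 1 reduce to a diff over two snapshots. A sketch, assuming snapshots are hypothetical ticker-to-probability maps (the real /world payload is richer):

```typescript
// Top movers between two daily snapshots (step 1 of the daily-research
// template). Ranks shared tickers by absolute probability change.
function topMovers(
  yesterday: Record<string, number>,
  today: Record<string, number>,
  n: number,
): Array<{ ticker: string; delta: number }> {
  return Object.keys(today)
    .filter((t) => t in yesterday)            // skip newly listed markets
    .map((t) => ({ ticker: t, delta: today[t] - yesterday[t] }))
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta))
    .slice(0, n);
}
```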

Hedge-finder agent

Map a real-portfolio exposure to a basket of binary contracts with cost / coverage tradeoffs

  1. Read the portfolio exposure as input (position file, NAV report, or simple JSON)
  2. Search Kalshi + Polymarket for contracts that map to the named risk dimensions
  3. The LLM proposes a basket — leg sizes, total cost, and which exposure each leg covers
  4. Produce a one-page hedge proposal with the mapping table and dry-run intent records
  5. Operator reviews; on approval, the basket is submitted as a sequence of intents with shared idempotency prefix
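The cost / coverage arithmetic in step 3 is simple for binary contracts: each contract costs its price (in dollars between 0 and 1) and pays $1 if it resolves YES. A sketch with an illustrative `Leg` shape:

```typescript
// Basket arithmetic for the hedge-finder template (step 3). `covers`
// names the portfolio exposure each leg is meant to offset.
type Leg = { ticker: string; price: number; contracts: number; covers: string };

// Upfront premium: sum of price x contracts across legs.
function basketCost(legs: Leg[]): number {
  return legs.reduce((sum, l) => sum + l.price * l.contracts, 0);
}

// Maximum payout if every leg resolves in the hedge's favor ($1/contract).
function basketMaxPayout(legs: Leg[]): number {
  return legs.reduce((sum, l) => sum + l.contracts, 0);
}
```

The one-page proposal in step 4 is then a table of legs plus these two numbers: what the hedge costs versus what it pays out in the adverse scenario.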

Self-host vs hosted Portfolio Autopilot

Same architecture, two operating postures. Self-host when every layer needs to live in the operator's code. Hosted when the agent loop should run as a service. Both share the same intent + risk-gate stack underneath.

|  | Self-host (this template) | Hosted (/portfolio-autopilot) |
|---|---|---|
| Where it runs | Operator's machine, container, or cloud — wherever the agent loop fits | SimpleFunctions hosted runtime; LLM calls and orchestration handled by the platform |
| Model freedom | Bring any model — Claude, GPT, Gemini, open-weight, fine-tunes — operator owns the prompt | Curated model + prompt; updated as part of the platform |
| Risk policy authoring | Operator authors and version-controls every gate; full visibility | Default risk gates + per-fund overrides via configuration |
| Ops burden | Operator handles uptime, log rotation, alerting, model billing | Platform handles uptime + observability; operator gets reports |
| Cost shape | Direct LLM costs + ops time; no platform usage fee for the agent itself (just the API tier) | Subscription + usage; explicit budget via /portfolio-autopilot configuration |
| Audit | Local audit file + platform-side intent log; both available for compliance review | Platform audit log is the system of record; local export on demand |
| Best for | Quants and developers who want every layer in their own code | Operators who want the agent loop as a service — see /portfolio-autopilot |

Calibration loop — how the agent learns

Agents that act on probability must close the feedback loop on their own calibration. Five-stage chain — view, outcome, score, update, audit — designed so every step is queryable.

1. View: The agent emits a calibrated view (probability + confidence) every time it acts. Views are stored alongside the intent record.
2. Outcome: Every prediction market contract resolves to a binary outcome. The platform records the resolution against each linked view.
3. Brier feedback: A Brier-style score (or its multi-class generalization) is computed for the agent's views over a rolling window — by topic, by horizon, by venue.
4. Update: The next cycle's prompt sees the agent's recent calibration. Persistently overconfident topics get downweighted; persistently well-calibrated topics get more aggressive sizing.
5. Audit: The full chain — view → outcome → Brier — is queryable per agent, per topic, per period. The calibration loop is auditable, not implicit.
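The Brier score in the feedback stage is the mean squared difference between forecast probability and resolved outcome. A minimal sketch, assuming a view is reduced to a `(probability, outcome)` pair once its contract resolves:

```typescript
// Brier feedback over a window of resolved views. outcome: 1 = YES, 0 = NO.
// Lower is better: 0 is perfect, 0.25 is a constant uninformative 0.5
// forecast, and scores above 0.25 mean the agent is worse than a coin flip.
type ScoredView = { p: number; outcome: 0 | 1 };

function brier(views: ScoredView[]): number {
  if (views.length === 0) return NaN;
  const sum = views.reduce((s, v) => s + (v.p - v.outcome) ** 2, 0);
  return sum / views.length;
}
```

Grouping views by topic, horizon, or venue before calling this is what produces the per-dimension calibration the update stage feeds back into the next prompt.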

Methodology + working notes at /papers.

Read next from the library

Matched from SimpleFunctions blog, opinions, technical guides, concepts, and learn pages.

Technical · architecture

Automated Prediction Market Trading: Architecture and Cost Breakdown

Detailed cost breakdown for automated prediction market trading. LLM evaluation costs, Tavily API spend, Kalshi fees, and total monthly cost per thesis. Compare DIY agent vs SimpleFunctions.

Blog · tech

How to Build a Prediction Market Trading Bot in 2026

Technical guide for TypeScript developers building prediction market trading bots. Thesis-driven architecture, Kelly criterion sizing, SimpleFunctions CLI/MCP/REST integration, and cost breakdown.

Technical guide

Build a Prediction Market Agent with LangChain + SimpleFunctions

Step-by-step guide to building an autonomous prediction market trading agent using LangChain and SimpleFunctions. Python code examples for Kalshi and Polymarket.

Technical guide

Setting Up Your First Prediction Market Agent with SimpleFunctions

Step-by-step guide to setting up a prediction market trading agent with SimpleFunctions CLI. From sf setup to your first scan, thesis, and edge detection on Kalshi and Polymarket.

Technical guide

Piping prediction market signals into your existing trading system

Three integration patterns for piping Kalshi and Polymarket data into existing trading infrastructure: cron polling, agent middleware, and thesis-as-filter.

Blog · tech

How to Build an OpenClaw Prediction Market Bot with SimpleFunctions

Technical tutorial for building an OpenClaw prediction market trading bot with SimpleFunctions. Three tool endpoints, structured JSON responses, thesis-driven strategies.

FAQ

What is a prediction market agent?

An autonomous program that reads the prediction market world (Kalshi + Polymarket), reasons about it with an LLM or other model, and optionally acts through normalized execution intents. The agent runs in a loop — gather → reason → decide → execute → reconcile — usually until an operator-defined exit condition or schedule.

How is this different from /portfolio-autopilot?

This page is the reference architecture for a self-hostable agent — the template you copy and adapt. /portfolio-autopilot is the hosted variant: the same architecture run as a service, with curated model and prompts, default risk gates, and platform-side audit. Same intent + risk-gate stack underneath; different operating posture.

Can I bring my own model?

Yes. BYOM is first-class. The reference architecture treats the model layer as opaque — the agent calls a function that takes context and returns a view. Implementations exist for Claude, GPT, Gemini, open-weight (Llama, Qwen, Mistral), fine-tunes, and ensembles. The operator owns the prompt, the temperature, and the routing.

How does the agent learn?

Through the calibration loop. Every emitted view is paired with the eventual binary outcome; a Brier-style feedback score is computed over rolling windows by topic, horizon, and venue. The next cycle's prompt receives the agent's recent calibration as context, so persistently overconfident areas get downweighted. Learning is auditable — the chain from view to outcome to score is queryable.

Where does the agent run?

Anywhere. Local machine, container, cron job, Lambda, Cloudflare Worker, Trigger.dev task, GitHub Action — the reference architecture is transport-agnostic. The platform is reachable via HTTPS REST + WebSocket; the agentic CLI runs anywhere Node 18+ runs. Self-host wherever the operator's ops posture lives.

Self-host or hosted?

Self-host when the operator wants every layer in their own code — model choice, prompt versioning, risk policy authoring, log retention. Hosted (/portfolio-autopilot) when the operator wants the agent loop as a service and is happy to configure rather than author. Many desks start self-hosted to learn the surface, then migrate hosted strategies they want supervised.

What tools does the agent use?

The same tools published on /ai-agents and exposed via /api/tools — world snapshot, probability, search, screen, indicators, gov + econ overlay, intent submit, intent watch, reconciliation. The exact catalog is published live; this page deliberately does not hardcode a count.

How is the agent evaluated?

Three layers. (1) Calibration: rolling Brier-style score on emitted views, by topic and horizon. (2) P&L: standard portfolio-level — realized + unrealized, against operator-set benchmarks. (3) Behavior: did the agent respect risk gates, did it dry-run before live, did it use idempotency keys. The /calibration surface and the agent's local audit log together cover all three.

How is the agent audited?

Two-sided audit. Locally, the agent writes a structured log of every prompt, response, and resulting intent. Server-side, the platform writes an immutable record of every intent, risk-gate evaluation, and venue submission. Both sides can be exported for compliance review; the platform side is the system of record for capital movement.

How does safety work?

Risk gates run before every intent — size cap, exposure cap, drawdown ceiling, regime gate, daily-loss cap, dry-run toggle. Operators author and version-control these in code; the platform evaluates them at execution time and rejects intents that fail. Idempotency keys make replays safe. Dry-run validates the full pipeline without capital at risk.
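One way to picture the gates named above is as a list of named predicates evaluated against every proposed intent. This is a hypothetical sketch, not the platform's gate API: the intent fields and gate names are illustrative, and the real evaluation happens server-side at execution time.

```typescript
// Risk gates as composable predicates. The intent shape is illustrative.
type ProposedIntent = { size: number; exposureAfter: number; dailyLoss: number };
type Gate = { name: string; ok: (i: ProposedIntent) => boolean };

function gates(cfg: { maxSize: number; maxExposure: number; dailyLossCap: number }): Gate[] {
  return [
    { name: "size-cap",       ok: (i) => i.size <= cfg.maxSize },
    { name: "exposure-cap",   ok: (i) => i.exposureAfter <= cfg.maxExposure },
    { name: "daily-loss-cap", ok: (i) => i.dailyLoss < cfg.dailyLossCap },
  ];
}

// Returns the names of failed gates; an empty list means the intent may proceed.
function evaluate(i: ProposedIntent, gs: Gate[]): string[] {
  return gs.filter((g) => !g.ok(i)).map((g) => g.name);
}
```

Authoring gates as plain code like this is what makes them version-controllable in the self-host posture; the platform then re-evaluates its own copy before any capital moves.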

What if the LLM hallucinates a market?

The agent should not act on a market it has not verified against /api/public/scan or /api/public/market/:ticker. The reference architecture treats the model output as a proposed market id: the agent looks it up and rejects the action if the id does not resolve. The intent submit endpoint also rejects unknown tickers — defense in depth.
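The verify-then-act pattern can be sketched with the lookup injected as a function, so the same logic works against the live /api/public/market/:ticker endpoint or a test stub. The `Lookup` signature and the "FED-DEC" ticker below are illustrative assumptions:

```typescript
// Defense-in-depth for hallucinated markets: the model only *proposes* a
// ticker; the agent resolves it before any intent is built. `lookup` stands
// in for a GET against /api/public/market/:ticker (null = not found).
type Lookup = (ticker: string) => Promise<{ ticker: string } | null>;

async function verifyProposedMarket(ticker: string, lookup: Lookup): Promise<boolean> {
  const market = await lookup(ticker);
  // Reject both unknown ids and lookups that resolve to a different ticker.
  return market !== null && market.ticker === ticker;
}
```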

Related surfaces