tech · Apr 16, 2026 · 8 min read

We Let an LLM Agent Autonomously Manage Kalshi Positions — Here's Its Architecture

Portfolio Autopilot runs on a configurable schedule, reads live market data, evaluates positions against operator-defined convictions, and trades within mechanically enforced risk limits. It's been running in production on real capital.

Patrick Liu
#portfolio #automation #prediction-markets #kalshi #llm #risk-management

Portfolio Autopilot is a system that lets an LLM agent manage a portfolio of prediction market positions on Kalshi. It runs on a configurable schedule, reads live market data, evaluates open positions against a set of operator-defined convictions, and places or exits trades within mechanically enforced risk limits. It's been running in production on real capital — small scale, deliberately.

This is what we built and how it works.


The Core Loop

Every tick follows the same four phases: collect, evaluate, execute, persist.

Collect. The system assembles a market briefing from multiple data sources: current positions with live P&L, account balance, a macro-level market overview (prediction market indices, traditional market moves, topic-level summaries), and detailed per-ticker analysis for each open position. This includes price regime classification, liquidity indicators, orderbook microstructure, thesis evaluations from our heartbeat engine, and cross-venue price comparisons. All of this is gathered in parallel.
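A minimal sketch of the parallel gather step, assuming async data-source calls; the function names and returned shapes here are illustrative stand-ins, not the real internal API:

```python
import asyncio

# Illustrative stand-ins for the real data sources; in production each
# would call Kalshi or an internal service.
async def fetch_positions():
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"ticker": "EXAMPLE-24", "qty": 10, "pnl": 1.25}]

async def fetch_balance():
    await asyncio.sleep(0.01)
    return {"cash": 500.00}

async def fetch_market_overview():
    await asyncio.sleep(0.01)
    return {"indices": {}, "topics": []}

async def fetch_ticker_analysis(ticker):
    await asyncio.sleep(0.01)
    return {"ticker": ticker, "regime": "range-bound", "spread": 0.02}

async def collect_briefing():
    """Gather all briefing inputs concurrently rather than sequentially."""
    positions, balance, overview = await asyncio.gather(
        fetch_positions(), fetch_balance(), fetch_market_overview()
    )
    # Per-ticker analysis fans out over whatever positions are open.
    analyses = await asyncio.gather(
        *(fetch_ticker_analysis(p["ticker"]) for p in positions)
    )
    return {
        "positions": positions,
        "balance": balance,
        "overview": overview,
        "per_ticker": list(analyses),
    }

briefing = asyncio.run(collect_briefing())
```

Gathering in parallel keeps tick latency bounded by the slowest source rather than the sum of all of them.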

Evaluate. A single LLM call receives the full context — market briefing, open positions, active signals, and crucially, the agent's own notes from previous ticks — and produces a set of decisions: hold, enter, exit, or adjust. The model sees the complete landscape in one pass rather than evaluating positions individually. This is intentional: portfolio-level thinking (concentration risk, correlation, capital allocation) requires seeing everything at once.
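The decisions coming back from that single call can be typed and validated before anything downstream touches them. A hypothetical schema (field names are illustrative, not the system's actual format):

```python
from dataclasses import dataclass
from typing import Literal, Optional

Action = Literal["hold", "enter", "exit", "adjust"]

@dataclass
class Decision:
    action: Action
    ticker: str
    side: Optional[Literal["yes", "no"]] = None  # required for enter
    contracts: Optional[int] = None              # required for enter/adjust
    rationale: str = ""

def parse_decisions(raw: list[dict]) -> list[Decision]:
    """Parse the LLM's JSON output into typed decisions, rejecting
    malformed ones before they reach the risk gates."""
    decisions = []
    for item in raw:
        d = Decision(**item)
        if d.action in ("enter", "adjust") and d.contracts is None:
            raise ValueError(f"{d.action} decision for {d.ticker} lacks size")
        decisions.append(d)
    return decisions
```

Parsing to a strict schema means a hallucinated or incomplete decision fails loudly here instead of producing a malformed order later.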

Execute. Decisions are validated against a set of hard risk gates before any order is placed. These gates are mechanical — they run outside the LLM and cannot be bypassed by the agent's reasoning. If a decision passes the gates, the order is sent to Kalshi. If it doesn't, the decision is logged but not executed, and the agent is informed why.

Persist. After execution, the system writes a tick record (what was checked, what was decided, what was executed) and the agent produces a handoff note — a structured summary of observations, rationale, and things to watch next time. This note becomes the agent's memory.
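The four phases above can be sketched as one function. This is a simplified skeleton, with the real subsystems injected as callables:

```python
def run_tick(collect, evaluate, gates, execute, persist):
    """One collect -> evaluate -> execute -> persist cycle.

    All five callables are stand-ins for the subsystems described above.
    """
    briefing = collect()
    decisions, handoff_note = evaluate(briefing)
    executed, blocked = [], []
    for decision in decisions:
        allowed, reason = gates(decision, briefing)
        if allowed:
            executed.append(execute(decision))
        else:
            # Blocked decisions are logged, and the reason reaches the
            # agent via the next tick's context.
            blocked.append((decision, reason))
    persist(briefing, decisions, executed, blocked, handoff_note)
    return executed, blocked
```

Keeping the loop this small makes the boundary explicit: everything creative lives inside `evaluate`, everything mechanical lives in `gates`.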


Handoff Notes: Memory Without Sessions

LLM agents don't have persistent sessions. Each tick is a fresh invocation — no weights are updated, no hidden state carries over. But portfolio management is inherently temporal: you need to know why you entered a position three days ago, whether the thesis you were watching has developed, whether the pattern you noticed is continuing.

Handoff notes solve this. At the end of every tick, the agent writes a structured note covering: key observations about each position, decisions made and why, things to watch next tick, emerging trends or concerns, and whether the operator's stated views are playing out or being challenged.

The next tick receives the last several handoff notes in chronological order. The agent reads its own prior reasoning before acting. This creates a chain of reasoning: not a knowledge base, but a running sequence of observations the agent can reference and build on.

The mechanism is simple (append a note, read recent notes), but the effect is significant. Without it, the agent would re-discover the same market conditions every tick and potentially reverse decisions it made hours ago for reasons it can no longer recall. With it, the agent develops something that functions like working memory across invocations.
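The mechanism really is just append-and-read. A sketch using an append-only JSONL file (the storage path and window size are illustrative choices, not the production values):

```python
import json
from collections import deque
from pathlib import Path

NOTES_FILE = Path("handoff_notes.jsonl")  # illustrative storage location
WINDOW = 5  # how many recent notes the next tick's prompt receives

def append_note(note: dict) -> None:
    """Persist one structured handoff note at the end of a tick."""
    with NOTES_FILE.open("a") as f:
        f.write(json.dumps(note) + "\n")

def recent_notes(n: int = WINDOW) -> list[dict]:
    """Return the last n notes in chronological order for the next prompt."""
    if not NOTES_FILE.exists():
        return []
    with NOTES_FILE.open() as f:
        tail = deque(f, maxlen=n)  # keep only the last n lines
    return [json.loads(line) for line in tail]
```

The fixed window is what makes the memory lossy: anything older than the last few notes survives only if a later note repeated it.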


Views vs. Strategies: Separating Conviction from Rules

The system separates two kinds of input that the operator provides:

Views are convictions about the world. They're directional beliefs about markets or themes, ranked by conviction strength. Example: "A specific risk is being systematically underpriced because markets overreact to surface-level signals." A view tells the agent what to believe but not how to trade.

Strategies are mechanical rules and constraints. They define the operating envelope: which categories of markets to consider, what quantitative filters to apply (liquidity thresholds, yield requirements, spread limits), position sizing guidelines, and tactical preferences. A strategy tells the agent how to operate but not what to believe.

Why separate them? Because convictions and rules change at different rates and for different reasons. A view might shift overnight because of a news event. A strategy changes when the operator's risk tolerance or capital allocation changes. Coupling them would mean every conviction update requires re-specifying operational rules, and every rule change requires restating beliefs. Decoupling lets the operator update either independently.

There's a hard constraint in the system: the agent must never take positions that contradict the operator's active views. If the agent's analysis disagrees with a view, it notes the disagreement but does not act against it. The agent is an executor with judgment, not an autonomous decision-maker. The operator's conviction is the authority.
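The separation, and the hard constraint, can be sketched as data plus one check. The field names and the underpriced-means-buy-YES mapping are assumptions for illustration, not the system's actual representation:

```python
from dataclasses import dataclass

@dataclass
class View:
    """A directional conviction, ranked by strength."""
    thesis: str
    direction: str        # e.g. "underpriced" or "overpriced"
    tickers: set[str]
    conviction: int       # 1 = strongest

@dataclass
class Strategy:
    """Mechanical operating envelope: rules, no beliefs."""
    categories: set[str]
    min_liquidity: int = 1000        # resting contracts required
    max_spread: float = 0.05         # dollars
    max_position_frac: float = 0.10  # share of portfolio per market

def contradicts_view(order_side: str, ticker: str, views: list[View]) -> bool:
    """Hard constraint: never trade against an active operator view.

    Assumes "underpriced" views permit only YES buys and vice versa.
    """
    for v in views:
        if ticker in v.tickers:
            allowed = "yes" if v.direction == "underpriced" else "no"
            if order_side != allowed:
                return True
    return False
```

Views and strategies live in separate structures precisely so the operator can edit one without touching the other.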


Risk Gates: Mechanical Safety Outside the LLM

The most important architectural decision in Portfolio Autopilot is that risk management does not depend on LLM judgment.

The system enforces a layered set of hard constraints: total portfolio exposure cap, per-market position limit, daily loss ceiling, maximum number of orders per tick, cooldown periods after loss streaks, minimum account balance floor, and single-order size limits. Every order passes through these gates before reaching the exchange. The LLM cannot argue its way past them.

Why not let the LLM manage its own risk? Because LLMs are unreliable at self-constraint under pressure. An LLM that has been losing money and sees what looks like a recovery opportunity will construct a compelling narrative for why this time the position size should be larger. Narrative reasoning is what LLMs are good at. Mechanical discipline is what they're bad at. So we put discipline outside the model.

One asymmetry is worth noting: exits are always allowed. The gates constrain entries aggressively but never prevent the agent from closing a position. This reflects the asymmetric nature of risk. Missing an entry costs potential upside, which the next opportunity can replace. Missing an exit costs realized capital, which is gone. The system is biased toward letting the agent protect capital.
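A sketch of the gate layer, including the exit asymmetry. The limit values and state fields are illustrative; the real limits are operator configuration:

```python
from dataclasses import dataclass

@dataclass
class Limits:
    # Illustrative values, not the production configuration.
    max_exposure: float = 500.0
    max_per_market: float = 100.0
    daily_loss_cap: float = 50.0
    max_orders_per_tick: int = 3
    min_balance: float = 50.0
    max_order_size: float = 75.0

def check_gates(order: dict, state: dict, limits: Limits):
    """Return (allowed, reason). Runs outside the LLM.

    Exits always pass: the gates constrain entries, never capital
    protection.
    """
    if order["action"] == "exit":
        return True, "exits are always allowed"
    checks = [
        (state["exposure"] + order["cost"] > limits.max_exposure,
         "portfolio exposure cap"),
        (state["per_market"].get(order["ticker"], 0.0) + order["cost"]
         > limits.max_per_market, "per-market limit"),
        (state["daily_loss"] >= limits.daily_loss_cap, "daily loss ceiling"),
        (state["orders_this_tick"] >= limits.max_orders_per_tick,
         "order count per tick"),
        (state["balance"] - order["cost"] < limits.min_balance,
         "balance floor"),
        (order["cost"] > limits.max_order_size, "single-order size limit"),
    ]
    for tripped, reason in checks:
        if tripped:
            return False, f"blocked: {reason}"
    return True, "ok"
```

Note that the function takes only numbers and returns only a verdict: there is no text channel through which a persuasive rationale could influence the outcome.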


Current State and Limitations

Portfolio Autopilot is running in production on real capital. The scale is small — this is intentional. We built this system to test whether an LLM agent can make coherent, temporally consistent portfolio decisions in a constrained financial environment, not to maximize short-term P&L.

Several characteristics of the current system are worth stating clearly:

The system operates on prediction markets, where outcomes are bounded and event-driven. Approaches that work here may not transfer directly to continuous-price markets.

Handoff notes are lossy. The agent summarizes its reasoning into a fixed-length note. Nuances are lost. Over long horizons, the chain of notes drifts — early observations fade as new ones accumulate. This is a known tradeoff between memory fidelity and context efficiency.

The agent's convictions are derivative. The operator provides the views; the agent executes within them. This is a feature (human authority over directional bets) but also a limitation (the system doesn't generate its own macro views). The agent can surface observations and flag when views appear challenged, but it doesn't autonomously update its belief set.

Performance data is being collected and will be published once the dataset is meaningful. Early observations focus on decision coherence and risk gate behavior, not P&L.


Where This Fits

Portfolio Autopilot is one component of a broader system — SimpleFunctions — that provides infrastructure for autonomous agents operating in prediction markets. The heartbeat engine evaluates theses. The edge discovery system finds mispricings. The indicator layer quantifies market microstructure. Portfolio Autopilot sits on top of all of these, consuming their outputs to make allocation decisions.

The broader question this system touches — what does it look like when an LLM agent has real economic agency, operates on real capital, and must live with the consequences of its decisions over time? — is one we think will matter increasingly as agent systems move from demos to production. We don't claim to have answered it. We're building the answer in production.

For early data on the system's information efficiency, see Compute ROI in Agent Economies.