This is the post nobody writes: what it actually costs and looks like to run a 24/7 prediction market agent in production. Not the theory. Not the architecture diagram. The real numbers, the real bugs, and the real operational lessons.
Architecture Overview
The system has six moving parts:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Vercel Cron   │────▶│     Monitor     │────▶│     Tavily      │
│ (every 15 min)  │     │     Service     │     │  (news search)  │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                     ┌───────────┼───────────┐
                     ▼           ▼           ▼
             ┌────────────┐ ┌─────────┐ ┌──────────┐
             │ OpenRouter │ │ Kalshi  │ │ Supabase │
             │ (LLM eval) │ │ (prices)│ │ (state)  │
             └────────────┘ └─────────┘ └──────────┘
```
Vercel Cron triggers the heartbeat every 15 minutes. The monitor service fetches prices, searches for news, evaluates the causal tree, checks strategy conditions, and writes results to the database. If a signal fires, it sends a notification.
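The heartbeat pipeline above can be sketched in a few lines. This is an illustrative skeleton, not the project's real code: the `Services` interface and every method name here are stand-ins for the actual Kalshi, Tavily, OpenRouter, and Supabase clients.

```typescript
// One heartbeat cycle: prices -> news -> evaluation -> persist -> notify.
type Prices = Record<string, number | null>;

interface EvalResult {
  signal: boolean;
  summary: string;
}

// Hypothetical service facade; real client APIs will differ.
interface Services {
  fetchPrices(): Promise<Prices>;                // Kalshi
  searchNews(query: string): Promise<string[]>;  // Tavily
  evaluateTree(news: string[], prices: Prices): Promise<EvalResult>; // OpenRouter
  saveState(result: EvalResult): Promise<void>;  // Supabase
  notify(message: string): Promise<void>;
}

async function heartbeat(svc: Services, nodeQueries: string[]): Promise<boolean> {
  const prices = await svc.fetchPrices();
  const newsBatches = await Promise.all(nodeQueries.map((q) => svc.searchNews(q)));
  const result = await svc.evaluateTree(newsBatches.flat(), prices);
  await svc.saveState(result);
  if (result.signal) await svc.notify(result.summary); // only when a signal fires
  return result.signal;
}
```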
Real Monthly Costs
Here's what we actually pay running 3 active theses with continuous monitoring:
| Service | What It Does | Monthly Cost |
|---|---|---|
| Vercel Pro | Hosting + cron jobs (96/day) | $20 |
| Supabase Pro | Postgres database + auth | $25 |
| Tavily | News search API (~2,000 searches/mo) | $30 |
| OpenRouter | LLM calls (~500 evaluations/mo) | $15-25 |
| Kalshi API | Market data + orderbook | Free |
| Total | | ~$90-100/mo |
For context: a quantitative trading fund pays $50,000-200,000/month for data feeds alone. Bloomberg Terminal is $24,000/year per seat. We're running a causal model + 24/7 monitoring + real-time edge detection for the price of a nice dinner.
Cost Breakdown Details
Tavily ($30/month): Each full evaluation triggers 1-3 Tavily searches per active thesis. Searching on every 15-minute heartbeat would mean ~288-864 searches/day with 3 theses, or 8,640-25,920/month, which is exactly the rate-limit mistake described in Lesson 1 below. At the actual cadence of a full news search every 2 hours, 3 theses produce roughly 36-108 searches/day, about 1,100-3,200/month, in line with the ~2,000/month the plan covers. The searches are targeted at causal tree nodes rather than broad market news, which keeps them focused and cheap.
OpenRouter ($15-25/month): Each evaluation sends the causal tree + news context + market data to an LLM for node confidence updates. We use Claude Haiku for routine evaluations (fast, cheap) and Claude Sonnet for significant events (more accurate). Average cost per evaluation: ~3-5¢. With ~500 evaluations/month: $15-25.
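The per-evaluation arithmetic works out as follows. A trivial sketch; the cent figures are the post's own estimates, not quoted OpenRouter model rates.

```typescript
// Monthly LLM spend from evaluation count and estimated cost per call.
function monthlyLlmCostUsd(evaluationsPerMonth: number, centsPerEval: number): number {
  return (evaluationsPerMonth * centsPerEval) / 100;
}

// ~500 evaluations/month at 3-5 cents each:
const low = monthlyLlmCostUsd(500, 3);  // 15
const high = monthlyLlmCostUsd(500, 5); // 25
```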
Kalshi API (free): Kalshi provides free API access for market data and orderbook. No websocket costs, no data fees. This is a massive advantage over traditional markets where data feeds cost thousands.
Operational Lessons (Things That Broke)
Lesson 1: Tavily Rate Limits Are Real
We hit Tavily rate limits twice in the first month. The heartbeat was running searches too aggressively — every 15 minutes for every node. The fix: batch node searches and cache results for 2 hours unless a significant price movement triggers a re-search.
Now the cadence is:
- Every 15 min: Price check only (free, Kalshi API)
- Every 2 hours: Full news search + LLM evaluation
- On significant price move (>3%): Immediate full evaluation regardless of schedule
This cut our Tavily usage by 60% without meaningfully reducing signal quality.
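The tiered cadence reduces to one small decision per heartbeat. A minimal sketch with the thresholds from above (2-hour full cycle, >3% move override); the function and parameter names are illustrative.

```typescript
type Tier = "price_check" | "full_evaluation";

// Decide how much work this heartbeat does.
function chooseTier(
  minutesSinceLastFullEval: number,
  priceMovePct: number, // absolute % move since the last evaluation
): Tier {
  if (Math.abs(priceMovePct) > 3) return "full_evaluation";       // significant move: evaluate now
  if (minutesSinceLastFullEval >= 120) return "full_evaluation";  // scheduled 2-hour cycle
  return "price_check";                                           // default 15-minute work
}
```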
Lesson 2: Kalshi Price Fields Can Be Null
We discovered this the hard way: the Kalshi API sometimes returns null for the yes_price field on low-liquidity contracts. Our evaluation code assumed a number and crashed.
The fix was simple — default to the last known price and flag the contract as "stale data" — but it took down monitoring for 6 hours before we caught it. Now we have explicit null checks on every price field.
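The null guard amounts to a few lines. A sketch of the pattern, not the real code; the field and type names are illustrative, so check the actual Kalshi market payload.

```typescript
interface PriceReading {
  price: number;
  stale: boolean; // true when we fell back to the last known price
}

// Defend against null/undefined price fields on low-liquidity contracts.
function readYesPrice(
  apiPrice: number | null | undefined,
  lastKnownPrice: number,
): PriceReading {
  if (typeof apiPrice === "number" && Number.isFinite(apiPrice)) {
    return { price: apiPrice, stale: false };
  }
  // Fall back and flag the contract as stale data.
  return { price: lastKnownPrice, stale: true };
}
```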
Lesson 3: LLM Hallucination in Soft Conditions
The LLM evaluates "soft conditions" — qualitative assessments like "has there been a significant diplomatic development?" Sometimes it hallucinates. Once it claimed there was a ceasefire announcement when there wasn't — based on a headline about "diplomatic channels remaining open" which it over-interpreted.
The fix: every soft condition evaluation now includes a verification step. The LLM sees the raw news text and must cite the specific sentence that supports its conclusion. If it can't cite, the condition isn't triggered. This reduced false positives from ~8% to <1%.
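The cite-or-reject rule can be sketched as a plain substring check: the condition only fires if the sentence the LLM cites actually appears in the raw news text. The normalization here is deliberately crude (case and whitespace only) and the function names are illustrative.

```typescript
// Does the cited sentence actually appear in the raw news text?
function citationSupported(rawNewsText: string, citedSentence: string): boolean {
  const normalize = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const cited = normalize(citedSentence);
  return cited.length > 0 && normalize(rawNewsText).includes(cited);
}

// No verifiable citation, no trigger -- regardless of the LLM's claim.
function conditionTriggered(
  llmSaysTrue: boolean,
  rawNewsText: string,
  citedSentence: string,
): boolean {
  return llmSaysTrue && citationSupported(rawNewsText, citedSentence);
}
```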
Lesson 4: Vercel Cron Cold Starts
Vercel serverless functions have cold starts. The first heartbeat after an idle period takes 3-5 seconds to initialize the database connection. This isn't a correctness issue — it's a latency issue. If a major news event breaks and the cron triggers during a cold start, you lose a few seconds.
The practical impact: negligible. Prediction markets move slowly. A 5-second delay on a 15-minute heartbeat cycle doesn't matter. But it's worth knowing.
Lesson 5: Signature Issues with Kalshi API
Kalshi uses HMAC-SHA256 signature authentication. The timestamp must be within 5 seconds of server time. On Vercel serverless functions, the system clock is occasionally a few seconds off after a cold start.
We added a clock sanity check against NTP and a single retry with a fresh timestamp on 401 errors. The retry fires maybe once a week.
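The retry path is simple. In this sketch, `signAndSend` is a stand-in for the real signed Kalshi request (the HMAC construction itself is omitted); only the retry-on-401 logic is the point.

```typescript
// Re-sign with a freshly read timestamp and try once more on 401.
async function requestWithClockRetry(
  signAndSend: (timestampMs: number) => Promise<{ status: number }>,
  now: () => number = Date.now,
): Promise<{ status: number }> {
  const first = await signAndSend(now());
  if (first.status !== 401) return first;
  // Cold-start clock skew can invalidate the signature; retry once
  // with a fresh timestamp before treating it as a real auth failure.
  return signAndSend(now());
}
```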
What to Monitor
If you're running this system, here's your dashboard:
1. Tavily Budget Consumption
Track daily search count vs. plan limit. Set an alert at 80% of monthly limit. If you're burning through searches, your evaluation cadence is too aggressive.
2. LLM Evaluation Cost Per Thesis
Track the average cost per evaluation and compare across theses. If one thesis is significantly more expensive, it probably has too many causal nodes or the context window is too large. Consider pruning.
3. Signal-to-Noise Ratio
Track how many heartbeats result in an actual signal vs. "no action needed." A healthy ratio is 1-5% — you want the agent watching quietly 95%+ of the time and only signaling when something actually changes. If you're getting signals on >10% of heartbeats, your conditions are too loose.
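The metric itself is one division. A minimal sketch with the thresholds from above; the record shape is illustrative.

```typescript
// Fraction of heartbeats that produced an actual signal.
function signalRate(heartbeats: { signaled: boolean }[]): number {
  if (heartbeats.length === 0) return 0;
  const signals = heartbeats.filter((h) => h.signaled).length;
  return signals / heartbeats.length;
}

// Healthy is roughly 1-5%; above 10% the conditions are too loose.
function conditionsTooLoose(rate: number): boolean {
  return rate > 0.10;
}
```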
4. Price Staleness
Track the age of the latest price for each monitored contract. If a price is more than 30 minutes old, something is wrong with the Kalshi API connection. Alert on this.
5. Causal Tree Stability
Track how much each node's confidence changes per evaluation cycle. Nodes should be relatively stable (±2-3%) with occasional jumps when real news hits. If a node is swinging ±10% on every evaluation, the LLM is being inconsistent — consider pinning that node or using a more deterministic evaluation.
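A stability check can be as simple as scanning per-node confidence deltas between consecutive evaluations. The 10-point default mirrors the ±10% figure above; the data shape and names are illustrative.

```typescript
// Flag nodes whose confidence swings >= maxSwingPct between evaluations.
function unstableNodes(
  history: Record<string, number[]>, // nodeId -> confidence per evaluation, in %
  maxSwingPct = 10,
): string[] {
  const flagged: string[] = [];
  for (const [nodeId, series] of Object.entries(history)) {
    for (let i = 1; i < series.length; i++) {
      if (Math.abs(series[i] - series[i - 1]) >= maxSwingPct) {
        flagged.push(nodeId); // candidate for pinning or deterministic eval
        break;
      }
    }
  }
  return flagged;
}
```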
Scaling Considerations
The current architecture handles 3-5 active theses comfortably. At 10+ theses, you'd want to:
- Parallelize evaluations — evaluate theses concurrently instead of sequentially
- Shared news cache — multiple theses often share upstream nodes; cache news searches across theses
- Tiered evaluation — not every thesis needs evaluation every 2 hours; low-conviction theses can run on 6-hour cycles
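The first two scaling steps combine naturally: run evaluations concurrently and let theses that share upstream nodes reuse the same in-flight news search. A sketch under the assumption that a thesis maps to a search query; all names are illustrative.

```typescript
// Evaluate theses concurrently with a shared, deduplicating news cache.
async function evaluateAll<T>(
  theses: string[],
  searchNews: (query: string) => Promise<string[]>,
  evaluate: (thesis: string, news: string[]) => Promise<T>,
): Promise<T[]> {
  const newsCache = new Map<string, Promise<string[]>>();
  const cachedSearch = (query: string) => {
    // Duplicate queries reuse the same in-flight search promise.
    if (!newsCache.has(query)) newsCache.set(query, searchNews(query));
    return newsCache.get(query)!;
  };
  // Concurrent, not sequential: one slow thesis no longer blocks the rest.
  return Promise.all(
    theses.map(async (thesis) => evaluate(thesis, await cachedSearch(thesis))),
  );
}
```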
At 50+ theses, you'd probably want to move off Vercel cron to a dedicated worker process. But honestly, if you have 50 active theses, you have bigger problems than infrastructure.
The Bottom Line
Running a 24/7 prediction market agent costs ~$100/month and takes about 2 hours to set up. The ongoing operational work is minimal — mostly watching for the anomalies described above and occasionally adjusting evaluation cadence.
The value isn't in the infrastructure. It's in the discipline. The agent watches when you sleep. It evaluates without emotion. It signals based on conditions, not feelings. For the price of a streaming subscription, you have a system that a quantitative fund would charge 2-and-20 to provide.
The only expensive component is your thesis. The causal model. The thinking. And that's exactly where your time should go — not staring at price charts at 3am.