SimpleFunctions

Pairs Trading on Polymarket and Kalshi: From Tartaglia's APT Group to Cross-Venue Spread Bots

Bamberger originated single-stock pairs trading at Morgan Stanley in 1982. Tartaglia's APT group reportedly produced $50M in 1987. Three modern prediction-market (PM) analogs work mechanically: within-platform structural pairs, cross-venue spreads, and mean reversion. Settlement-spec divergence is the killer, because binary payoffs leave no spread to wait out.

By SimpleFunctions Engine · April 27, 2026

In the summer of 1982, a Morgan Stanley software developer named Gerry Bamberger was working on the firm's block trading desk when he noticed something nobody else cared about: when an institutional client dumped a large block of one stock, its price would briefly dislocate from that of its natural pair, driven by order-flow impact rather than any fundamental news. Bamberger built a system to detect these dislocations and bet on the relationship snapping back. The early prototypes ran on chemical-industry pairs and other tightly grouped industry baskets, where similar business mixes made the cointegration assumption defensible. Morgan Stanley initially housed the work inside the block-trading group, then moved it under a freshly recruited Jesuit-trained physicist with a sailor's vocabulary, Nunzio Tartaglia, who built the Automated Proprietary Trading (APT) group around it.

Tartaglia's APT group included a handful of people who would, within a decade, define the entire quant-finance industry. The group reportedly produced $50 million in profits in 1987, the year of Black Monday, using what was then a strange and proprietary toolkit: pairs trades, statistical arbitrage on dollar-neutral baskets, and short-horizon mean reversion. Among the technologists Tartaglia hired was a Columbia computer-science professor named David Shaw, who joined as VP of Technology in 1986. Shaw left in 1988 with a $28M seed from Donald Sussman to found D.E. Shaw, the firm whose alumni later founded Two Sigma and which seeded the early careers of, among others, Jeff Bezos. Peter Muller joined Morgan Stanley in 1992 and built Process Driven Trading inside the bank; it spun out as PDT Partners in 2012 and compiled a remarkable record of no down years through its first two decades. Renaissance Technologies, founded by Jim Simons in 1982, ran in parallel rather than emerging from the Morgan Stanley diaspora, but it drew on the same pool of physicists and mathematicians, recruiting Henry Laufer, Robert Mercer, and Peter Brown into what became the Medallion Fund's 66% gross annualized engine. Together, D.E. Shaw, Two Sigma, PDT, and Renaissance institutionalized the cointegration trade as a strategy class.

The arc that followed is instructive for anyone now looking at prediction markets. Through the 1990s, single-stock pairs based on simple cointegration were a dependable Sharpe-2 to Sharpe-3 strategy. By the 2000s, the easy pairs had been arbitraged out, and the August 2007 quant quake hit the entire stat-arb cohort simultaneously precisely because everyone had converged on the same factor structure. Survivors migrated to higher-dimensional factor models (PCA, asynchronous lead-lag, alternative-data overlays), then into intraday and news-driven signals as transaction costs collapsed. The strategy class did not die; the alpha shrank, fragmented, and required orders of magnitude more capital and infrastructure to extract.

Polymarket and Kalshi today look uncannily like the equity market circa 1985 — which is to say, the conditions under which Bamberger's idea actually worked. Three distinct pair-trading analogs are documented and operational.

Type 1: within-platform structural pairs. This is the cleanest possible cointegration case because mutual exclusivity is enforced by contract specification rather than estimated econometrically. If a market resolves YES on exactly one of N outcomes, no-arbitrage requires the N YES prices to sum to $1.00 (before fees). The Vanderbilt 2025 study by Joshua Clinton and TzuFeng Huang documented "mutual exclusivity errors" on Polymarket, where the probabilities of competing outcomes routinely summed to more than 100% during the 2024 cycle, including the now-canonical case of the Polymarket markets for "Dem wins by 6–7%" and "GOP wins by 6–7%" trading up at the same time on the same news. The Saguillo et al. (arXiv 2508.03474, August 2025) Polymarket overround paper estimates that arbitrageurs extracted roughly $40M from these structural mispricings between April 2024 and April 2025. The trade is mechanical. When the basket sums above $1, sell YES on every leg (equivalently, buy NO on every leg): exactly one leg resolves YES, so you owe $1 against the larger amount you collected. When it sums below $1, buy YES on every leg: you pay less than $1 for a set that is guaranteed to pay $1. The risk is execution: by the time you have crossed all spreads and paid all fees, an apparent 3-cent edge can compress to zero. What makes this work structurally is that there is no settlement risk inside a single market; the mutual-exclusivity constraint binds at resolution, full stop.
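A minimal sketch of the Type 1 check, assuming you already have the best YES bid and ask for every outcome of one multi-outcome market. The `Leg` schema, the function name, and the flat `fee_per_leg` are hypothetical placeholders, not either venue's API or fee schedule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Leg:
    """One outcome of a mutually exclusive, exhaustive market (hypothetical schema)."""
    name: str
    yes_bid: float  # best price a buyer will pay for YES, in dollars
    yes_ask: float  # best price a seller will accept for YES, in dollars


def complete_set_edge(legs: list[Leg], fee_per_leg: float = 0.0) -> tuple[str, float]:
    """Return the better complete-set trade and its locked-in edge per set.

    Exactly one leg resolves YES, so a full set of YES contracts is worth $1:
      - "buy_all": pay the YES ask on every leg, collect $1 at resolution;
      - "sell_all": receive the YES bid on every leg (equivalently, buy NO
        on every leg), owe $1 at resolution.
    fee_per_leg is an assumed flat cost per contract, not a real fee schedule.
    """
    costs = len(legs) * fee_per_leg
    buy_edge = 1.0 - sum(leg.yes_ask for leg in legs) - costs
    sell_edge = sum(leg.yes_bid for leg in legs) - 1.0 - costs
    best = max(buy_edge, sell_edge)
    if best <= 0:
        return "none", best
    return ("buy_all", buy_edge) if buy_edge > sell_edge else ("sell_all", sell_edge)


# Made-up quotes: YES bids sum to $1.03, so selling the set locks ~3 cents before fees.
legs = [Leg("A", 0.52, 0.53), Leg("B", 0.47, 0.48), Leg("C", 0.04, 0.05)]
side, edge = complete_set_edge(legs, fee_per_leg=0.005)  # ("sell_all", ~0.015)
```

Fees are charged on every leg, which is why a 3-cent gross overround on a three-outcome market shrinks to 1.5 cents here and disappears entirely on wider books.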

Type 2: cross-platform Polymarket-Kalshi spreads. This is the trade that has spawned the largest open-source cottage industry: at least four publicly available bots on GitHub (CarlosIbCu's BTC arb bot, WSOL12's hourly-Bitcoin scanner, ImMike's 10,000-market scanner, TopTrenDev's Rust implementation), Medium tutorials with working code, and reported aggregate arbitrage profits in the $40M range across the 2024–25 window. Mechanically, the bot identifies the same underlying event across venues (most commonly hourly Bitcoin price markets, where the contract specs are nearly identical and resolution is unambiguous), then prices the synthetic combination: YES on Kalshi plus NO on Polymarket at the same strike, or the mirror image. If the two prices plus fees sum to less than $1.00, the pair pays $1.00 whichever way the event goes, provided both venues resolve it the same way. Realized spreads on liquid markets sit persistently in the 1.5–4.5% range and do not compress further, and the reason is essential to understanding why this is a different beast from equity stat arb. On equity pairs, the residual spread risk is mostly about how long you have to hold and how much volatility you have to eat in the interim. On cross-venue PM pairs, the residual spread compensates for two specific frictions that have no equity analog. First, the platforms impose asymmetric capital frictions: Polymarket settles in USDC on Polygon, requiring USDC bridging, gas, and KYC-distinct wallets; Kalshi requires a US bank ACH or wire transfer, settles in USD, and imposes daily withdrawal limits and a 1–12-hour settlement window plus an additional ~3-hour resolution leg. Second, and far more importantly, the same nominal event can resolve in opposite directions on the two venues because the contract specifications and oracle methodologies differ.
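The pricing step can be sketched as follows, assuming the event has already been matched across venues and you hold the best YES and NO asks on each. `VenueQuote`, the flat per-contract fee, and the `friction` term are hypothetical placeholders rather than either platform's API or fee schedule. The returned edge is only "locked in" if both venues resolve identically, which is exactly the assumption the Cardi B case below breaks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VenueQuote:
    """Best asks on one venue for the same nominal contract (hypothetical schema)."""
    venue: str
    yes_ask: float
    no_ask: float
    fee_per_contract: float  # assumed flat fee; real fee schedules differ by venue


def best_pair(a: VenueQuote, b: VenueQuote, friction: float = 0.0) -> tuple[str, float]:
    """Choose the cheaper synthetic pair (YES on one venue, NO on the other)
    and return its edge per $1 of payoff.

    The pair pays $1 only if both venues resolve the event identically.
    `friction` is an assumed per-pair charge for bridging, gas, and capital lockup.
    """
    candidates = {
        f"YES@{a.venue} + NO@{b.venue}": a.yes_ask + b.no_ask,
        f"YES@{b.venue} + NO@{a.venue}": b.yes_ask + a.no_ask,
    }
    label, price = min(candidates.items(), key=lambda kv: kv[1])
    all_in_cost = price + a.fee_per_contract + b.fee_per_contract + friction
    return label, 1.0 - all_in_cost


# Made-up quotes on one hourly BTC strike.
kalshi = VenueQuote("kalshi", yes_ask=0.47, no_ask=0.55, fee_per_contract=0.01)
poly = VenueQuote("polymarket", yes_ask=0.52, no_ask=0.50, fee_per_contract=0.0)
label, edge = best_pair(kalshi, poly, friction=0.005)  # edge is about 0.015
```

Checking both directions matters because the cheap leg can sit on either venue from one minute to the next.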

That second risk is where the strategy's killer lives.

The Cardi B halftime case is the modern textbook. On February 8, 2026, Cardi B appeared on stage during Bad Bunny's Apple Music Super Bowl LX halftime show. She danced alongside Karol G, Young Miko, Jessica Alba, and Pedro Pascal on a pink porch set. She did not sing. Polymarket's "Will Cardi B perform at the Super Bowl halftime show?" market, with approximately $10M in volume, settled YES with full payout to YES holders, citing its rule of "consensus of credible reporting": most major media outlets had described her as having "performed" or "appeared." Kalshi paused its corresponding market, which had drawn roughly $47.3M in volume, and ultimately settled at the last traded price before the pause: 74¢ to NO holders and 26¢ to YES holders, citing internal rules that distinguish performing from dancing under ambiguity. At least one Kalshi YES trader has filed a CFTC complaint. Anyone running the cross-venue arb with long YES on Kalshi and long NO on Polymarket collected 26¢ on a pair bought for just under $1.00; the mirror-image position collected $1.74. Either way the hedge was gone: what was supposed to be a locked $1.00 payoff became a bet on whose rulebook would prevail. Because each leg settles at a fixed value instead of continuing to trade, the loss on the divergent leg approaches the full notional, not a partial mark-to-market hit that you can wait out.
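To make the failure concrete, here is a minimal sketch of what a split YES/NO pair is worth at settlement, using the Kalshi 26¢/74¢ split and Polymarket's full YES payout described above. The function name and the $0.97 entry cost are illustrative assumptions.

```python
from __future__ import annotations


def pair_settlement(yes_settle: dict[str, float], long_yes_on: str, long_no_on: str) -> float:
    """Settlement value of long YES on one venue plus long NO on another.

    yes_settle maps venue -> payout per YES contract; each NO contract is
    assumed to pay 1 minus that venue's YES payout (as in Kalshi's 26/74 split).
    """
    return yes_settle[long_yes_on] + (1.0 - yes_settle[long_no_on])


same_rules = {"kalshi": 1.0, "polymarket": 1.0}  # both venues resolve YES
cardi_b = {"kalshi": 0.26, "polymarket": 1.0}    # the divergent settlements
entry_cost = 0.97                                # hypothetical all-in cost of the pair

hedged = pair_settlement(same_rules, "kalshi", "polymarket") - entry_cost  # about +0.03
split = pair_settlement(cardi_b, "kalshi", "polymarket") - entry_cost      # about -0.71
mirror = pair_settlement(cardi_b, "polymarket", "kalshi") - entry_cost     # about +0.77
```

The 3-cent "arbitrage" turns into a 71-cent loss or a 77-cent windfall depending only on which venue held which leg, which is why no spread in the 1.5–4.5% range can price this risk away.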

The US government shutdown contracts in 2025 produced the same pathology with a different mechanism: Polymarket and Kalshi defined "shutdown" against different referee criteria — one tied to OPM announcement, the other to a broader operational definition of partial vs. full shutdown — and the two markets diverged in interpretation despite ostensibly tracking the same underlying event. CoinDesk explicitly framed the episode as a demonstration of contract-spec limits. The June 2025 Iran ceasefire markets, the Khamenei status markets, and several Russia-Ukraine framing markets have produced similar specification-driven splits.

The right equity analog is Royal Dutch / Shell, the canonical pairs trade that LTCM was running in size on the eve of the August 1998 Russian default. The two share classes were claims on the same cash flows in a fixed 60/40 split, so parity was well defined, yet they traded at a persistent ~10% gap, and the spread was modeled as a cointegrated, mean-reverting series. The 1998 dislocation widened the gap to ~25% as forced unwinds from LTCM and others moved correlated convergence trades against everyone simultaneously. The pair did eventually converge: the spread closed when Royal Dutch and Shell unified into Royal Dutch Shell in 2005. But to capture that, you needed to survive being marked-to-market 1.5x against you for 18 months. Now compare: on a Cardi B-style PM divergence, there is no mark-to-market drawdown to survive. Both sides resolve. One pays $0. The other pays $1. Your "spread risk" is a one-shot binary annihilation event, not a wide negative carry you can hold through. The absence of a continuous correlation breakdown, and the impossibility of waiting the position out until conditions normalize, is the structural feature that makes binary settlement-divergence risk categorically different from anything in equity stat arb.

Type 3: mean reversion in PM price dynamics. QuantPedia's April 17, 2026 study sampled three Polymarket binary contracts at 10-minute resolution over roughly one year: "No to Will Jesus Christ return in 2025" (a quasi-risk-free instrument that should sit at $0.999+), "No to Will China invade Taiwan in 2025," and "Will the US confirm aliens exist in 2025." The team ran twelve parameterized mean-reversion strategy variants under varying liquidity and transaction-cost assumptions. Their finding: substantial alpha under maker-only execution at the inside quote, but performance degrades sharply once realistic spreads and taker fees enter the cost model. This is the arc of late-1990s single-stock pairs trading recapitulated in compressed time: the same maker-only-versus-taker-inclusive drop that killed retail-accessible stat arb after US equity decimalization in 2001. Akey, Grégoire, Harvie & Martineau (Harvard Law Forum, March 2026) confirm the structural picture from the other direction: moving from pure taker to pure maker status reduces the probability of losing money on Polymarket by roughly 36 percentage points, the single largest predictor of profitability in their 50,000-wallet panel.
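The maker-versus-taker gap is easy to see in a toy backtest. The sketch below is a generic long-only z-score rule, not QuantPedia's twelve variants; the price path is synthetic and the 2¢-per-fill taker cost is an assumption. Every round trip pays the per-fill cost twice, so the same signal's P&L falls by twice the cost for each trade it takes.

```python
from __future__ import annotations

from statistics import mean, stdev


def mean_reversion_pnl(prices: list[float], window: int, entry_z: float,
                       cost_per_side: float) -> tuple[float, int]:
    """Long-only z-score mean reversion on one contract's price series.

    Buy one contract when the price sits entry_z standard deviations below its
    trailing mean; sell when it gets back to the mean. cost_per_side is an
    assumed per-fill cost: ~0 for a resting maker order, half-spread plus fee
    for a taker. Returns (total P&L in dollars, number of round trips).
    """
    pnl, trades = 0.0, 0
    entry_price = None
    for t in range(window, len(prices)):
        hist = prices[t - window:t]
        mu, sd = mean(hist), stdev(hist)
        if sd == 0:
            continue  # flat history: z-score undefined
        z = (prices[t] - mu) / sd
        if entry_price is None and z <= -entry_z:
            entry_price = prices[t]
        elif entry_price is not None and z >= 0:
            pnl += prices[t] - entry_price - 2 * cost_per_side
            trades += 1
            entry_price = None
    return pnl, trades


# Synthetic path: a contract oscillating around 90 cents with a periodic dip to 85.
pattern = [0.90, 0.91] * 4 + [0.85, 0.91]
prices = pattern * 5
maker_pnl, n_trades = mean_reversion_pnl(prices, window=8, entry_z=2.0, cost_per_side=0.0)
taker_pnl, _ = mean_reversion_pnl(prices, window=8, entry_z=2.0, cost_per_side=0.02)
```

On this path the rule takes five round trips worth 6 cents each; as a taker at 2 cents per fill it keeps only a third of that, and at 3 cents per fill it would keep nothing.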

A separate, less-explored calibration trade. Le (2026, arXiv 2602.19520, "Decomposing Crowd Wisdom: Domain-Specific Calibration Dynamics in Prediction Markets") finds that political markets exhibit persistent underconfidence at nearly all horizons (logistic recalibration slopes 0.93–1.83), meaning a 70-cent political contract one week before resolution corresponds to a true probability closer to 83%. Weather markets, by contrast, are overconfident at short horizons (slopes 0.69–0.97 within 48 hours). The implied calibration-spread trade is long political-favorite contracts above ~60¢ on long-dated markets (capturing the underconfidence) and short weather-favorite contracts at extreme prices on sub-48-hour horizons (harvesting the overconfidence). This is not a cointegration trade in the strict Engle-Granger sense; it is a domain-specific systematic-mispricing harvest, closer to risk-premium factor investing than to classical pairs trading. Critically, no published academic paper as of April 2026 formally tests Engle-Granger or Johansen cointegration on PM contract pairs. That is an open and meaningful research opportunity: the time series on liquid contracts are now long enough (12+ months of high-frequency data on the largest election and sports markets) that proper cointegration estimation is technically tractable. The reason the literature has not yet done this work is partly that the settlement-divergence risk discussed above breaks the classical stationarity assumption: under a regime change in oracle methodology, the cointegrating vector itself shifts.
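A minimal sketch of the logistic recalibration behind the 70¢-to-~83% claim, assuming the form p* = sigmoid(a + b · logit(p)) with intercept a = 0; Le's exact specification and fitted intercepts may differ. With b = 1.83, a 70¢ price maps to about 82.5%.

```python
import math


def recalibrate(price: float, slope: float, intercept: float = 0.0) -> float:
    """Map a market price to a recalibrated probability:
    p* = sigmoid(intercept + slope * logit(price)).

    slope > 1: market is underconfident (true odds further from 50c than the price);
    slope < 1: market is overconfident (true odds closer to 50c than the price).
    The zero default intercept is an assumption, not a fitted value.
    """
    logit = math.log(price / (1.0 - price))
    return 1.0 / (1.0 + math.exp(-(intercept + slope * logit)))


political = recalibrate(0.70, slope=1.83)  # about 0.825: long edge of ~12.5 cents
weather = recalibrate(0.90, slope=0.69)    # about 0.82: short edge of ~8 cents
```

The sign of `recalibrate(p, b) - p` tells you which side of the contract the calibration spread wants, which is why the political leg is long and the weather leg is short.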

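For anyone picking up the open research question, here is a bare-bones two-step Engle-Granger sketch in pure Python: OLS for the hedge ratio, then a Dickey-Fuller t-statistic on the residuals with no lag augmentation. The −3.34 threshold is the approximate asymptotic 5% MacKinnon critical value for two series with a constant; a real study would use an augmented test such as `statsmodels.tsa.stattools.coint`. The data are synthetic, and a pass on historical data says nothing about a future change in oracle methodology.

```python
from __future__ import annotations

import math
import random


def engle_granger_tstat(y: list[float], x: list[float]) -> tuple[float, float]:
    """Two-step Engle-Granger statistic (no lagged differences).

    Step 1: OLS  y_t = a + b * x_t + e_t.
    Step 2: Dickey-Fuller regression  (e_t - e_{t-1}) = g * e_{t-1} + u_t.
    Returns (hedge ratio b, t-statistic of g).
    """
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]  # spread residuals
    lagged = e[:-1]
    diffs = [e[t] - e[t - 1] for t in range(1, n)]
    s_lag = sum(v * v for v in lagged)
    g = sum(d * v for d, v in zip(diffs, lagged)) / s_lag
    resid = [d - g * v for d, v in zip(diffs, lagged)]
    s2 = sum(r * r for r in resid) / (len(diffs) - 1)
    return b, g / math.sqrt(s2 / s_lag)


# Approximate asymptotic 5% critical value (two series, constant, MacKinnon).
CRIT_5PCT = -3.34

# Synthetic cointegrated pair: x is a random walk, y tracks 0.5 + 1.2 * x plus noise.
rng = random.Random(7)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 0.01))
y = [0.5 + 1.2 * xi + rng.gauss(0.0, 0.01) for xi in x]
hedge_ratio, t_stat = engle_granger_tstat(y, x)
cointegrated = t_stat < CRIT_5PCT  # reject "no cointegration" at roughly 5%
```

Ordinary Dickey-Fuller tables do not apply here because the residuals come from an estimated regression; that is why the threshold is the more negative Engle-Granger/MacKinnon value.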
Concrete trades available today. First, run a basket scanner on every multi-outcome market where complete-set arbitrage exists; the Saguillo et al. methodology is the template. Realistic edge on liquid contracts is 0.5–2% per round-trip, executable hundreds of times per week on US election cycle markets and Fed-decision multi-outcome markets. Second, run cross-venue scanners on hourly Bitcoin price markets, hourly index settlement markets, and any contract whose resolution criterion is unambiguous numerical (price at time T) rather than judgment-laden (did X "perform"). Realistic floor is 1.5% net of frictions. Third, paper-trade the Le calibration spread for two cycles before sizing — 60-cent-plus political favorites on multi-week horizons, against 90%+ weather contracts in the 24–48 hour window. Fourth, never run a cross-venue spread on culture, sports judgment, or geopolitical interpretation contracts: Cardi B, Khamenei, government shutdown, Zelenskyy suit. The $0/$1 binary payoff plus oracle methodology divergence is the modern equivalent of a stat-arb book that loses 100% on every leg of a single broken pair.
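The second and fourth rules amount to a whitelist applied before any cross-venue pairing. A minimal sketch, assuming you have already normalized each venue's rules into a hypothetical `ContractSpec` record (neither venue publishes this schema): pair only numeric contracts whose resolution source, strike, and observation time match exactly, and reject anything judgment-laden.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContractSpec:
    """Hypothetical normalized resolution spec for one venue's contract."""
    venue: str
    resolution_kind: str  # "numeric" (price at time T) or "judgment"
    source: str           # data source or standard named in the rules
    strike: float | None
    resolves_at: str      # ISO-8601 observation timestamp


def safe_to_pair(a: ContractSpec, b: ContractSpec) -> bool:
    """Allow a cross-venue spread only for numeric contracts on different venues
    whose source, strike, and observation time all match exactly."""
    if a.venue == b.venue:
        return False
    if a.resolution_kind != "numeric" or b.resolution_kind != "numeric":
        return False
    return (a.source, a.strike, a.resolves_at) == (b.source, b.strike, b.resolves_at)


# Illustrative specs; "example-btc-index" is a placeholder, not a real index name.
k_btc = ContractSpec("kalshi", "numeric", "example-btc-index", 100_000.0, "2026-04-27T15:00:00Z")
p_btc = ContractSpec("polymarket", "numeric", "example-btc-index", 100_000.0, "2026-04-27T15:00:00Z")
halftime = ContractSpec("polymarket", "judgment", "credible reporting", None, "2026-02-08T23:59:00Z")
```

Exact equality is deliberately strict: a one-minute difference in observation time or a different reference index is enough to make two "identical" contracts settle apart.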

The strategy class has a structural ceiling, and it is the same ceiling that hit equity stat arb in the late 1990s. The Type 1 within-platform alpha is bounded by the size of the retail mispricing flow on a given platform, which is shrinking as Jump Trading, Susquehanna, and other professional market makers scale their participation. The Type 2 cross-venue spread band of 1.5–4.5% is set primarily by genuine settlement-divergence risk rather than by transaction costs, and it will not compress further until either oracle methodologies converge across platforms (politically unlikely, since Polymarket's UMA-based crowd oracle and Kalshi's CFTC-disclosed contract specs are designed differently for regulatory reasons) or one venue's market share collapses. The Type 3 mean-reversion alpha follows the QuantPedia trajectory: real for makers, crushed for takers, and the maker side is increasingly contested by professional liquidity providers competing for Polymarket's $5M/month LP incentive program.

The Bamberger trade lives. It just lives at a smaller, more specialized scale, and it lives alongside a qualitatively new risk that nobody on the 1980s Morgan Stanley desk would have priced — that the spec itself, not the market, is what breaks the pair.

Engine-written disclosure

This article was primarily written by the SimpleFunctions engine and does not represent the views of the company.