Automated Weather Prediction Markets
I identified systematic mispricings in weather prediction markets and built a fully automated system to exploit them — from data ingestion to live execution on Kalshi.
Context
Weather prediction markets on Kalshi let you trade on whether temperature in a city will exceed a threshold. Most participants use gut feel or basic weather apps. Public ensemble forecast data from NWS and ECMWF contains systematic calibration errors — creating exploitable mispricings for anyone willing to do the quantitative work.
System Architecture
Data Ingestion: NWS & ECMWF ensemble APIs
Forecast Calibration: seasonal bias correction + isotonic regression
Edge Detection: model probability vs. market price
Position Sizing: fractional Kelly criterion with caps
Execution: automated orders via Kalshi API
Risk Monitoring: 3σ filters & drawdown limits
DATA → CALIBRATE → DETECT → SIZE → EXECUTE → MONITOR
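One cycle of the stages above could be wired together roughly like this; every name, threshold, and formula here is an illustrative sketch under my own assumptions, not the production system:

```python
from dataclasses import dataclass

@dataclass
class Market:
    price: float  # current yes-price in [0, 1]

def run_cycle(market, calibrated_p, error_sigma, kelly_mult=0.25):
    """DETECT -> SIZE for one market; ingestion/calibration assumed done upstream.

    Returns the bankroll fraction to commit (0.0 means no trade).
    All parameters here are hypothetical placeholders.
    """
    edge = calibrated_p - market.price        # DETECT: model vs. market
    if abs(edge) <= 3 * error_sigma:          # 3-sigma entry filter
        return 0.0
    if edge <= 0:
        return 0.0                            # this sketch only sizes yes-side trades
    # SIZE: fractional Kelly for a binary contract at price q: f* = (p - q)/(1 - q)
    return kelly_mult * edge / (1.0 - market.price)

print(run_cycle(Market(0.50), calibrated_p=0.60, error_sigma=0.02))
```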
Key Decisions
The technical choices that shaped the system — and why I made them.
Why isotonic regression for calibration?
Raw weather forecasts have non-linear calibration errors — they're overconfident in some temperature ranges and underconfident in others. I chose isotonic regression over logistic recalibration because it handles this non-linearity without parametric assumptions. Result: 26% Brier score improvement.
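The core of isotonic regression is the pool-adjacent-violators algorithm, which can be sketched in pure Python; the per-bin hit rates below are invented for illustration:

```python
def pava(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to a sequence.

    Feeding it observed hit rates ordered by raw forecast probability yields
    an isotonic calibration curve (what sklearn's IsotonicRegression computes).
    """
    stack = []  # blocks of (mean, count)
    for v in y:
        stack.append((float(v), 1))
        # merge adjacent blocks whenever monotonicity is violated
        while len(stack) > 1 and stack[-2][0] > stack[-1][0]:
            m2, c2 = stack.pop()
            m1, c1 = stack.pop()
            stack.append(((m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2))
    out = []
    for m, c in stack:
        out.extend([m] * c)
    return out

# Hypothetical observed hit rates per raw-probability bin (not real data):
# the 0.4 bin over-performed the 0.5 bin, so PAVA pools them together.
hit_rates = [0.05, 0.20, 0.45, 0.40, 0.70, 0.90]
print([round(v, 3) for v in pava(hit_rates)])  # [0.05, 0.2, 0.425, 0.425, 0.7, 0.9]
```

Because the fit is just "monotone and least-squares", it tracks whatever overconfident/underconfident shape the raw forecasts have without committing to a sigmoid.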
Why fractional Kelly with caps?
Full Kelly criterion is mathematically optimal but assumes perfect edge estimation — which no model has. I used fractional Kelly (conservative multiplier) plus a hard 30% market divergence cap to protect against model overconfidence. The principle: start conservative, increase sizing only as evidence accumulates.
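For a binary contract priced at q that pays 1 if the event occurs, the Kelly fraction is f* = (p − q)/(1 − q). A sketch of the sizing rule follows; I read the 30% figure as a model-vs-market divergence filter (an assumption), and the multiplier and function names are illustrative:

```python
def position_size(model_p, market_q, bankroll,
                  kelly_mult=0.25, divergence_cap=0.30):
    """Fractional-Kelly stake for buying a yes-contract at price market_q.

    kelly_mult and divergence_cap are illustrative values, not the production
    parameters. Kelly for this payoff structure: f* = (p - q) / (1 - q).
    """
    edge = model_p - market_q
    if edge <= 0:
        return 0.0  # no edge on the yes side
    if edge > divergence_cap:
        return 0.0  # model diverges too far from the market: distrust the model
    f = kelly_mult * edge / (1.0 - market_q)  # shrink full Kelly toward zero
    return bankroll * f

print(position_size(0.62, 0.50, 1_000))  # modest edge -> small stake
print(position_size(0.95, 0.50, 1_000))  # extreme divergence -> no trade
```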
Why 3σ thresholds for trade entry?
The system only trades when the detected edge exceeds three standard deviations of historical model error. This dramatically reduces false positives at the cost of trade frequency — a deliberate tradeoff favoring precision over volume. In practice, it meant fewer but higher-conviction trades.
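A minimal version of such an entry filter, with the error history invented for illustration:

```python
import statistics

def passes_entry_filter(edge, historical_errors, k=3.0):
    """Enter only when |edge| exceeds k standard deviations of past model error."""
    sigma = statistics.pstdev(historical_errors)
    return abs(edge) > k * sigma

# Hypothetical history of (model probability - realized frequency) errors:
errors = [0.010, -0.020, 0.015, -0.005]
print(passes_entry_filter(0.05, errors))  # large edge clears the bar
print(passes_entry_filter(0.03, errors))  # smaller edge is filtered out
```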
Results
Performance Over Time
Reflection
The most important lesson: the model wasn’t the hard part — the execution layer was. I spent two weeks trying to improve forecast accuracy before realizing the edge was being lost to position sizing and entry timing. Rebuilding the execution layer without touching the model turned a losing system into a profitable one.