Methodology
How ForecastMind collects data, computes calibration metrics, and matches markets across venues. This page is intended for researchers and journalists who want to understand and cite our data.
1. Data Collection
ForecastMind pulls live market data from multiple venues (Polymarket every 5 minutes; the other venues on demand during divergence and consensus checks) and stores snapshots in a DuckDB database. The following venues are currently tracked:
| Venue | API | Auth | Polling |
|---|---|---|---|
| Polymarket | Gamma API + CLOB | None | 5 min |
| Kalshi | Elections API | None | On divergence check |
| Manifold | v0 REST API | None | On divergence check |
| PredictIt | Market data API | None | On divergence check |
| Metaculus | api2 | Token (free) | On consensus check |
Each snapshot stores: market_id, yes_price, no_price, volume_24h, liquidity, spread, best_bid, best_ask, and a Unix timestamp. Data is retained indefinitely.
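The snapshot record can be modeled as a simple typed row. The sketch below uses the field names listed above; the class name and field types are our assumptions, not ForecastMind's actual schema:

```python
import time
from dataclasses import dataclass

@dataclass
class Snapshot:
    # Field names follow the snapshot schema above; types are assumed.
    market_id: str
    yes_price: float
    no_price: float
    volume_24h: float
    liquidity: float
    spread: float
    best_bid: float
    best_ask: float
    timestamp: int  # Unix seconds

snap = Snapshot("example-market-slug", 0.62, 0.38, 15000.0, 42000.0,
                0.01, 0.615, 0.625, int(time.time()))
```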
Market resolutions are recorded when Polymarket marks a market as resolved via their Gamma API. The final yes_price snapshot before the resolution timestamp is used as the market's last price for calibration.
2. Calibration Metrics
Calibration measures how well a prediction market's probabilities correspond to actual outcome frequencies. A perfectly calibrated market would resolve YES exactly 70% of the time for markets priced at 70%.
Brier Score
BS = (1/N) × Σ (predicted_probability − actual_outcome)²
Where actual_outcome = 1.0 for YES, 0.0 for NO. Range: 0 (perfect) to 1 (perfectly wrong). A constant forecast of 0.5 yields BS = 0.25, the benchmark for an uninformative predictor on binary outcomes.
Mean Absolute Error (MAE)
MAE = (1/N) × Σ |predicted_probability − actual_outcome|
Simpler than Brier Score; less sensitive to extreme errors. Used for cross-venue comparisons where only one venue's price is available.
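Both metrics are straightforward to compute. A minimal sketch (function names are ours, not ForecastMind's):

```python
def brier_score(preds, outcomes):
    # Mean squared difference between forecast probability and outcome (1.0 or 0.0).
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

def mae(preds, outcomes):
    # Mean absolute difference; less sensitive to extreme errors than Brier.
    return sum(abs(p - o) for p, o in zip(preds, outcomes)) / len(preds)

preds = [0.9, 0.7, 0.2]
outcomes = [1.0, 1.0, 0.0]
print(round(brier_score(preds, outcomes), 4))  # 0.0467
print(round(mae(preds, outcomes), 4))          # 0.2
```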
Reliability diagrams (also called calibration curves) group markets into 10 probability buckets (0–10%, 10–20%, etc.) and plot the actual YES resolution rate against the bucket midpoint. A perfectly calibrated market would produce a diagonal line from (0, 0) to (1, 1).
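The bucketing described above can be sketched as follows (helper name is illustrative; equal-width buckets with a probability of exactly 1.0 assigned to the top bucket):

```python
def calibration_curve(preds, outcomes, n_buckets=10):
    # Group forecasts into equal-width probability buckets and compute
    # the actual YES rate per bucket, paired with the bucket midpoint.
    buckets = [[] for _ in range(n_buckets)]
    for p, o in zip(preds, outcomes):
        i = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 falls in the top bucket
        buckets[i].append(o)
    return [((i + 0.5) / n_buckets, sum(b) / len(b))
            for i, b in enumerate(buckets) if b]

points = calibration_curve([0.05, 0.08, 0.72, 0.78], [0.0, 0.0, 1.0, 1.0])
print(points)  # [(0.05, 0.0), (0.75, 1.0)]
```

A perfectly calibrated set of markets produces points that lie on the diagonal, i.e. the YES rate in each bucket equals the bucket midpoint.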
Category filters apply to all metrics: Brier scores, MAE, and calibration curves can all be filtered by the Polymarket category tag (politics, crypto, sports, etc.).
Cross-venue comparisons require that the same real-world event was tracked on both venues AND subsequently resolved. Only markets matched via divergence detection (see Venue Matching) are included. The last recorded price on each venue before the resolution timestamp is used.
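Selecting each venue's last pre-resolution price reduces to a filter-and-max over its snapshots. A sketch under the assumption that snapshots are (unix_timestamp, yes_price) pairs:

```python
def last_price_before(snapshots, resolution_ts):
    # snapshots: list of (unix_timestamp, yes_price) pairs for one venue.
    # Returns the yes_price of the latest snapshot strictly before resolution.
    prior = [(ts, p) for ts, p in snapshots if ts < resolution_ts]
    if not prior:
        return None  # no pre-resolution snapshot: excluded from calibration
    return max(prior)[1]

snaps = [(100, 0.40), (200, 0.55), (300, 0.90)]
print(last_price_before(snaps, 250))  # 0.55
```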
3. Venue Matching (Entity Resolution)
Matching the same event across venues is the hardest part of cross-venue analysis. ForecastMind uses a two-stage hybrid approach:
- Jaccard similarity on tokenized question text. Stop words, common prediction market phrasing ("will", "by", "in"), and magnitude tokens (bare numbers, bps, percentages) are stripped. Similarity is computed as |A ∩ B| / |A ∪ B|.
- Embedding similarity (tiebreaker for borderline cases). For pairs scoring 0.15–0.55 on Jaccard, a cosine similarity check using all-MiniLM-L6-v2 (384-dimensional sentence embeddings) is applied. Pairs must score ≥ 0.76 cosine to be accepted by the embedding path.
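The Jaccard stage can be sketched in a few lines. The stop-word and magnitude-token lists below are illustrative subsets, and the embedding tiebreak (sentence embeddings plus the ≥ 0.76 cosine gate) is omitted for brevity:

```python
import re

# Illustrative subset of stripped stop words / common prediction market phrasing.
STOP = {"will", "by", "in", "the", "a", "an", "of", "to", "on", "at", "be"}

def tokens(question):
    # Lowercase, split on non-alphanumerics, drop stop words and
    # magnitude tokens (bare numbers, percentages, "bps").
    words = re.findall(r"[a-z0-9%]+", question.lower())
    return {w for w in words if w not in STOP and not re.fullmatch(r"\d+%?|bps", w)}

def jaccard(q1, q2):
    a, b = tokens(q1), tokens(q2)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

s = jaccard("Will the Fed cut rates by 50 bps in March?",
            "Fed to cut rates 50 bps at March meeting?")
print(round(s, 2))  # 0.8
```

Pairs scoring above the relevant threshold in the table below are accepted directly; borderline pairs fall through to the embedding check.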
| Use case | Min score | Notes |
|---|---|---|
| Divergence alerts | 0.50 | Raised 2026-03-12 after false-match audit |
| Cross-venue signals | 0.55 | Per-market panel |
| Consensus (Metaculus) | 0.57 | Established standard |
| Canonical entity layer | 0.57 | Persistent cross-venue links |
Matched pairs are stored as canonical entities with stable UUIDs in the canonical_events table. Each entity accumulates venue aliases over time as new matches are confirmed.
4. Coverage & Limitations
Resolution coverage: ForecastMind records resolutions as they are reported by Polymarket's Gamma API. Markets that resolve without a price snapshot in the database (e.g. markets not seen during polling intervals) will have no calibration entry. Coverage is typically >95% for markets active within 7 days of resolution.
Cross-venue coverage: Limited to events that triggered a divergence alert (i.e. where the price gap exceeded the threshold at any point). Events that never diverged are not in the cross-venue comparison set. This introduces selection bias — divergence events may be systematically harder to predict.
Kalshi access: Only api.elections.kalshi.com is accessible from ForecastMind's servers. This limits Kalshi coverage to long-dated political markets (US/UK elections, leadership races). Kalshi sports and economics markets are not tracked.
Start date: Continuous data collection began in early 2026. Markets that resolved before the collection start date are not in the calibration database.
5. Citation Format
For academic papers, reports, or journalism citing ForecastMind data:
APA
ForecastMind. (2026). ForecastMind prediction market calibration database. Retrieved [date], from https://forecastmind.org/calibration
For a specific market
ForecastMind. (2026). [Market question]. Retrieved [date], from https://forecastmind.org/markets/[slug]
For a monthly report
ForecastMind. (2026). State of prediction market accuracy — [Month Year]. Retrieved [date], from https://forecastmind.org/reports/[YYYY]/[MM]
If you use ForecastMind data in published research, we'd appreciate a brief note at research@forecastmind.org. We track citations to improve data quality for the research community.