LLMs predict the 2026 World Cup

Pre-tournament forecasts in
All four models have submitted a full 48-team forecast from the same context pack. The next update lands after the group stage. Last updated: 8 June 2026.

The 2026 World Cup kicks off this week. Forty-eight teams across the United States, Canada, and Mexico, an expanded group stage, and a knockout bracket that runs through 19 July.

Before a ball is kicked, I want to know what frontier language models think will happen, and I want a record of those predictions that I can hold up against reality once the tournament is over.

This post is that record. The first version goes up now with the pre-tournament forecasts. After each round I will add a dated section and score the models against what actually happened. Once the tournament is over I will write a post-mortem.

Table of contents

What I gave them
Where they agreed
Where they disagreed
How I will score them
Live updates

What I gave them

Four models, each scored on its own:

Claude Opus 4.7 (Anthropic)
Claude Sonnet 4.6 (Anthropic)
GPT-4o (OpenAI)
Gemini 2.5 Pro (Google)

Each model got the same context pack: 48 team dossiers, the official Group A to L draw, squad ratings, manager bios, recent qualifying form, injury status, betting-market odds, and head-to-head history. No predictions were written into the pack.

The rule was strict: no web, no tools, no live data. Each model got the same instruction and returned a structured forecast. That forecast covered a champion probability for all 48 teams, a per-stage probability for each team (group winner, round of 16, quarter, semi, final, champion), and one most-likely winner per group.
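The post does not show the output format itself. As a rough sketch, a forecast of this shape could be represented and sanity-checked as below. The `TeamForecast` class, its field names, and the `check_forecast` helper are all hypothetical, not the schema actually used.

```python
from dataclasses import dataclass

# Hypothetical schema: stage and field names are illustrative only.
STAGES = ("group_winner", "round_of_16", "quarter", "semi", "final", "champion")


@dataclass
class TeamForecast:
    team: str
    group: str          # "A" .. "L"
    stage_probs: dict   # stage name -> probability in [0, 1]


def check_forecast(forecasts, tol=0.01):
    """Return a list of problems found in one model's forecast (empty if none)."""
    problems = []
    for f in forecasts:
        for stage in STAGES:
            p = f.stage_probs.get(stage)
            if p is None or not 0.0 <= p <= 1.0:
                problems.append(f"{f.team}: bad or missing {stage}")
        # A later knockout stage can never be more likely than an earlier one.
        knockout = [f.stage_probs.get(s, 0.0) for s in STAGES[1:]]
        if any(later > earlier + 1e-9 for earlier, later in zip(knockout, knockout[1:])):
            problems.append(f"{f.team}: knockout probabilities increase")
    # Exactly one team wins the title, so champion probabilities should sum to 1.
    total = sum(f.stage_probs.get("champion", 0.0) for f in forecasts)
    if abs(total - 1.0) > tol:
        problems.append(f"champion probabilities sum to {total:.3f}")
    return problems
```

A check like this catches the most common structural failures (probabilities that do not sum to one, or a team more likely to reach the final than the semi-final) before any scoring happens.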

I also ran a separate pass where each model predicted the exact scoreline of all 72 group-stage matches, which gives a full set of standings and a game-by-game card for every group.

Full predictions
Every group's predicted standings and all 72 match scorelines from the four models, side by side: see the full group-stage predictions →

Where they agreed

Across all four models, Spain is the favourite. Its mean champion probability is 15.2%, and no model puts it above 16.0%. For an open 48-team tournament that is sensible: a well-calibrated forecast should not put any single team much above 18 to 20%, and all four models stay under that ceiling.

Here is the top 12 by mean champion probability, with each model’s number alongside:

Rank	Team	Opus 4.7	Sonnet 4.6	GPT-4o	Gemini 2.5 Pro	Mean
1	Spain	15.5%	16.0%	13.8%	15.5%	15.2%
2	France	14.5%	14.9%	12.1%	14.5%	14.0%
3	England	11.5%	11.9%	11.6%	13.4%	12.1%
4	Argentina	10.5%	10.8%	11.2%	10.3%	10.7%
5	Brazil	9.5%	8.8%	9.9%	7.2%	8.9%
6	Portugal	7.0%	6.7%	7.3%	8.2%	7.3%
7	Germany	6.0%	9.3%	6.0%	6.2%	6.9%
8	Netherlands	5.0%	4.6%	3.9%	5.1%	4.7%
9	Belgium	3.0%	2.6%	3.0%	2.1%	2.7%
10	Colombia	1.8%	1.9%	1.7%	2.6%	2.0%
11	Uruguay	1.5%	1.5%	1.3%	1.5%	1.5%
12	Croatia	1.5%	0.9%	1.7%	1.5%	1.4%

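The same six teams lead the mean ranking: Spain, France, England, Argentina, Brazil, Portugal. They fill the top six outright in three of the four forecasts. Sonnet is the exception, placing Germany (9.3%) ahead of Brazil (8.8%) and Portugal (6.7%). Together the six make up about two-thirds of the championship mass on every model.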
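The Mean column and the six-team share can be recomputed from the per-model numbers. The sketch below copies the table values by hand; the names `TABLE`, `SIX`, `mean_pct`, and `six_team_mass` are mine, chosen for illustration.

```python
# Champion probabilities (%) copied from the table above, in the order
# Opus 4.7, Sonnet 4.6, GPT-4o, Gemini 2.5 Pro.
MODELS = ["Opus 4.7", "Sonnet 4.6", "GPT-4o", "Gemini 2.5 Pro"]
TABLE = {
    "Spain":       [15.5, 16.0, 13.8, 15.5],
    "France":      [14.5, 14.9, 12.1, 14.5],
    "England":     [11.5, 11.9, 11.6, 13.4],
    "Argentina":   [10.5, 10.8, 11.2, 10.3],
    "Brazil":      [9.5, 8.8, 9.9, 7.2],
    "Portugal":    [7.0, 6.7, 7.3, 8.2],
    "Germany":     [6.0, 9.3, 6.0, 6.2],
    "Netherlands": [5.0, 4.6, 3.9, 5.1],
    "Belgium":     [3.0, 2.6, 3.0, 2.1],
    "Colombia":    [1.8, 1.9, 1.7, 2.6],
    "Uruguay":     [1.5, 1.5, 1.3, 1.5],
    "Croatia":     [1.5, 0.9, 1.7, 1.5],
}
SIX = ["Spain", "France", "England", "Argentina", "Brazil", "Portugal"]


def mean_pct(team):
    """Unweighted mean of the four models' champion probabilities for one team."""
    values = TABLE[team]
    return sum(values) / len(values)


def six_team_mass(model_index):
    """Total champion probability one model assigns to the six named teams."""
    return sum(TABLE[team][model_index] for team in SIX)


if __name__ == "__main__":
    for team in TABLE:
        print(f"{team:12s} mean {mean_pct(team):.1f}%")
    for i, model in enumerate(MODELS):
        print(f"{model:15s} six-team mass {six_team_mass(i):.1f}%")
```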

The models agreed even more sharply on group winners. Here is each model’s most-likely pick per group:

Group	Opus 4.7	Sonnet 4.6	GPT-4o	Gemini 2.5 Pro
A	Mexico (52%)	Mexico (52%)	Mexico (58%)	Mexico (55%)
B	Switzerland (54%)	Switzerland (56%)	Switzerland (57%)	Switzerland (58%)
C	Brazil (74%)	Brazil (78%)	Brazil (82%)	Brazil (82%)
D	USA (42%)	Türkiye (38%)	USA (46%)	Türkiye (38%)
E	Germany (62%)	Germany (68%)	Germany (67%)	Germany (70%)
F	Netherlands (52%)	Netherlands (62%)	Netherlands (60%)	Netherlands (65%)
G	Belgium (58%)	Belgium (58%)	Belgium (65%)	Belgium (72%)
H	Spain (74%)	Spain (80%)	Spain (81%)	Spain (80%)
I	France (66%)	France (78%)	France (78%)	France (78%)
J	Argentina (72%)	Argentina (78%)	Argentina (80%)	Argentina (75%)
K	Portugal (62%)	Portugal (65%)	Portugal (70%)	Portugal (68%)
L	England (70%)	England (72%)	England (74%)	England (76%)

Eleven of twelve groups have unanimous agreement on the most-likely winner. Group D is the only fork.
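The unanimity count is easy to reproduce. This is a minimal sketch with the picks typed in from the table above; `PICKS` and `unanimous_groups` are illustrative names, not part of the original pipeline.

```python
# Most-likely group winner per model, in the order
# Opus 4.7, Sonnet 4.6, GPT-4o, Gemini 2.5 Pro.
PICKS = {
    "A": ["Mexico"] * 4,
    "B": ["Switzerland"] * 4,
    "C": ["Brazil"] * 4,
    "D": ["USA", "Türkiye", "USA", "Türkiye"],
    "E": ["Germany"] * 4,
    "F": ["Netherlands"] * 4,
    "G": ["Belgium"] * 4,
    "H": ["Spain"] * 4,
    "I": ["France"] * 4,
    "J": ["Argentina"] * 4,
    "K": ["Portugal"] * 4,
    "L": ["England"] * 4,
}


def unanimous_groups(picks):
    """Groups where every model names the same most-likely winner."""
    return [group for group, teams in picks.items() if len(set(teams)) == 1]


def split_groups(picks):
    """Groups where at least two models disagree."""
    return [group for group, teams in picks.items() if len(set(teams)) > 1]


if __name__ == "__main__":
    print(len(unanimous_groups(PICKS)), "of", len(PICKS), "groups unanimous")
    print("split:", split_groups(PICKS))
```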

The same holds at match level. Each model also gave a probability for all 72 group-stage matches, and all four pick the same most-likely outcome in 69 of them.

Where they disagreed

Two real splits.

Group D: USA vs Türkiye. Opus and GPT-4o give the group to the host USA (42% and 46%). Sonnet and Gemini give it to Türkiye (both 38%). The pack has Türkiye carrying the higher squad average, with Çalhanoğlu, Güler, and Yıldız at the core, but the USA has Pochettino and the home crowd. Two models weighted squad quality higher; the other two weighted host advantage.

Brazil vs England as the third elite. Every model ranks England third, but they disagree on how far Brazil trails. GPT-4o has the narrowest gap: England 11.6%, Brazil 9.9%. Opus (11.5% vs 9.5%) and Sonnet (11.9% vs 8.8%) also keep Brazil within about three points. Gemini puts England at 13.4% and Brazil sixth at 7.2%, the largest gap in the panel. Gemini is the main driver of this split: it has the strongest pro-England lean and the strongest doubt about Brazil.

How I will score them

Two things, and they should not be confused.

The first is agreement, which I have already scored: 11 of 12 group winners are unanimous, 69 of 72 group-stage match outcomes are unanimous, and all four models put Spain on top between 13.8% and 16.0%. That tells me the models share a consensus. It does not tell me the consensus is right.

The second is accuracy, which I will score as the tournament plays out. The winner, the runner-up, the Golden Boot, and third place resolve at the final. The group winners resolve after the group stage. The full set of 72 match probabilities lets me score each model on a much larger sample than the headline picks, which is what I actually care about.
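The post does not name a scoring rule for the match probabilities. One common choice is the multiclass Brier score over the three outcomes (home win, draw, away win), where lower is better. The sketch below assumes that choice; the outcome keys and the example probabilities are made up for illustration.

```python
def brier_score(probs, outcome):
    """Multiclass Brier score for one match.

    probs:   dict mapping each outcome label to its forecast probability
    outcome: the label that actually happened
    Returns the sum of squared errors across outcomes (0 is perfect, 2 is worst).
    """
    return sum((p - (1.0 if label == outcome else 0.0)) ** 2
               for label, p in probs.items())


def mean_brier(forecasts, results):
    """Average Brier score over a list of matches."""
    if len(forecasts) != len(results):
        raise ValueError("need one result per forecast")
    return sum(brier_score(p, r) for p, r in zip(forecasts, results)) / len(results)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any model's forecast.
    forecasts = [
        {"home": 0.5, "draw": 0.3, "away": 0.2},
        {"home": 0.2, "draw": 0.3, "away": 0.5},
    ]
    results = ["home", "draw"]
    print(f"mean Brier: {mean_brier(forecasts, results):.3f}")
```

A useful baseline is the uniform forecast of one-third per outcome, which scores 2/3 on every match. A model that cannot beat that over 72 matches is adding no information.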

I would also like to track calibration: when a model says “Spain 15.5% to win”, is that number reliable? A single tournament cannot answer that fully, but scoring the 72 group-stage match probabilities against the real results gives me a reliability check per model, and that is the most useful output of the whole exercise.
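One way to run that reliability check is to bin the forecast probabilities and compare each bin's average forecast with how often the outcome occurred. This is a sketch under my own assumptions: every match contributes one (probability, happened) pair per outcome, the bins are equal-width, and `reliability_table` is a hypothetical helper name.

```python
def reliability_table(pairs, n_bins=5):
    """Bin (probability, happened) pairs and summarise each non-empty bin.

    Returns rows of (bin_low, bin_high, count, mean_forecast, observed_frequency).
    A well-calibrated model has mean_forecast close to observed_frequency in each bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, happened in pairs:
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, happened))

    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_forecast = sum(p for p, _ in items) / len(items)
        observed = sum(1 for _, happened in items if happened) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, len(items), mean_forecast, observed))
    return rows


if __name__ == "__main__":
    # Illustrative pairs only, not real forecasts or results.
    pairs = [(0.10, False), (0.15, False), (0.55, True), (0.60, False), (0.90, True)]
    for low, high, count, mean_forecast, observed in reliability_table(pairs):
        print(f"{low:.1f}-{high:.1f}  n={count}  forecast={mean_forecast:.2f}  observed={observed:.2f}")
```

With 72 matches and three outcomes each, a model yields 216 pairs, so a handful of wide bins is about as fine as the data can support.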

Live updates

The plan for each round:

Group stage: score the 72 match outcomes, score each model on the full set of match probabilities, and note where group winners surprised the consensus, including the Group D fork.
Round of 16: flag any model whose knockout bracket is already dead, and score the round-of-16 probabilities.
Quarter-finals: track which model has the best surviving picks.
Semis and final: winner and third place resolve here. Score each model’s champion forecast against the actual winner.
Post-mortem: full scorecard, what each model got right, where the consensus was most wrong, and whether the four-model panel beat the betting market.

The pre-tournament forecasts ship today. The first update, covering the group stage, lands shortly after 27 June.