
Showdown Leaderboard - LLMs


Real people. Real conversations. Real rankings.

Showdown ranks AI models based on how they perform in real-world use, not in synthetic tests or lab settings. Votes are blind, optional, and organic, so rankings reflect authentic preferences.

Methodology & Technical Report →
Prompts: real conversation prompts compared across models through pairwise votes.
Users: voters from 80+ countries and 70+ languages, spanning all backgrounds and professions.
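The linked methodology report describes the actual rating procedure. As a rough illustration of how pairwise votes can be turned into scores of this kind, the sketch below applies a simple Elo-style update; this is an assumed, generic approach, not Showdown's documented method, and the model names and votes are placeholders.

```python
from collections import defaultdict

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under a logistic (Elo-style) model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def rate_from_votes(votes, initial=1000.0, k=4.0):
    """Turn a list of (winner, loser) pairwise votes into per-model ratings.

    Generic Elo-style sketch for illustration only; Showdown's actual
    method is described in its methodology report.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        ratings[winner] += k * (1.0 - e_w)
        ratings[loser] -= k * (1.0 - e_w)
    return dict(ratings)

if __name__ == "__main__":
    # Hypothetical votes: (winner, loser) pairs from blind head-to-head comparisons.
    votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
    for model, score in sorted(rate_from_votes(votes).items(), key=lambda kv: -kv[1]):
        print(f"{model}: {score:.2f}")
```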

SEAL Leaderboard - LLMs

RANK  MODEL                                    VOTES   SCORE     CI
1     gpt-5-chat                               15216   1094.40   -3.46 / +3.23
1     claude-sonnet-4-5-20250929               9470    1091.64   -6.05 / +3.84
2     claude-opus-4-1-20250805                 16756   1082.60   -4.62 / +3.82
2     qwen3-235b-a22b-2507-v1                  6237    1082.13   -4.92 / +5.58
5     claude-sonnet-4-20250514                 20251   1069.75   -3.89 / +3.01
5     claude-sonnet-4-5-20250929 (Thinking)    9161    1065.67   -4.67 / +5
5     claude-opus-4-20250514                   16442   1064.28   -4.31 / +2.92
5     claude-haiku-4-5-20251001                5350    1060.63   -5.94 / +6.83
6     gpt-4.1-2025-04-14                       17752   1058.77   -3.63 / +3.53
6     gemini-3-pro-preview                     3307    1051.92   -7.17 / +9.35
8     claude-opus-4-1-20250805 (Thinking)      15505   1053.02   -4.5 / +3.91
8     gemini-2.5-pro                           3840    1048.19   -6.67 / +7.97
11    gemini-2.5-pro-preview-06-05             15357   1044.19   -4.6 / +4.29
12    claude-opus-4-20250514 (Thinking)        16806   1040.20   -3.68 / +3.3
13    claude-sonnet-4-20250514 (Thinking)      19782   1036.90   -3.56 / +3.34
15    claude-haiku-4-5-20251001 (Thinking)     5121    1028.89   -5.73 / +6.06
16    o3-2025-04-16-medium*                    22266   1020.59   -2.76 / +3.68
16    gemini-2.5-flash-preview-05-20           18363   1019.35   -3.47 / +3.87
16    gemini-2.5-flash                         3823    1018.48   -8.6 / +10.11
20    llama4-maverick-instruct-basic           19169   1000.00   -3.89 / +4.19
21    o4-mini-2025-04-16-medium*               21753   988.00    -3.32 / +3.39
22    deepseek-r1-0528                         6059    968.17    -4.88 / +6.48
23    gpt-5-2025-08-07-medium*                 15702   950.68    -3.73 / +3.86
* This model’s API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.
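The CI column gives the lower and upper offsets of a confidence interval around each score; the methodology report covers the exact construction. As a hedged illustration of how such asymmetric bounds can arise, the sketch below estimates an interval by bootstrap resampling of votes and recomputing Elo-style ratings on each resample. This is an assumed approach, not necessarily the one Showdown uses, and the votes are placeholders.

```python
import random
from collections import defaultdict

def elo_ratings(votes, initial=1000.0, k=4.0):
    """Elo-style ratings from (winner, loser) votes; same sketch as above."""
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        e_w = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1.0 - e_w)
        ratings[loser] -= k * (1.0 - e_w)
    return ratings

def bootstrap_interval(votes, model, n_resamples=200, level=0.95, seed=0):
    """Percentile bootstrap interval for one model's rating.

    Resamples the vote list with replacement, recomputes ratings, and
    reports offsets from the full-data rating (these can be asymmetric).
    """
    rng = random.Random(seed)
    point = elo_ratings(votes)[model]
    samples = sorted(
        elo_ratings([rng.choice(votes) for _ in votes])[model]
        for _ in range(n_resamples)
    )
    lo = samples[int((1 - level) / 2 * (n_resamples - 1))]
    hi = samples[int((1 + level) / 2 * (n_resamples - 1))]
    return point, lo - point, hi - point

if __name__ == "__main__":
    # Hypothetical votes between two placeholder models.
    votes = [("model-a", "model-b")] * 30 + [("model-b", "model-a")] * 20
    point, minus, plus = bootstrap_interval(votes, "model-a")
    print(f"model-a: {point:.2f} ({minus:+.2f} / {plus:+.2f})")
```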

Performance Comparison Across Language Models

Charts: Win Rate vs. Each Model, Battle Count vs. Each Model, Confidence Intervals, Average Win Rate, Prompt Distribution.
