Showdown Leaderboard - LLMs

Real people. Real conversations. Real rankings.

Showdown ranks AI models by how they perform in real-world use, not in synthetic tests or lab settings. Votes are blind, optional, and organic, so the rankings reflect authentic preferences.

For details, see the Methodology & Technical Report.

Prompts: real conversation prompts compared across models through pairwise votes.

Users: participants from 80+ countries, writing in 70+ languages and spanning all backgrounds and professions.
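This page does not say how pairwise votes are turned into scores. Pairwise-vote leaderboards commonly fit a Bradley-Terry model, so the sketch below assumes that approach: it estimates a strength per model from (winner, loser) pairs with the classic minorization-maximization update, then maps strengths onto an Elo-style scale with one hypothetical anchor model pinned at 1000. The function names, the 400-point scale, and the anchoring choice are illustrative assumptions, not Showdown's documented method.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, max_iters=1000, tol=1e-10):
    """Fit Bradley-Terry strengths from a list of (winner, loser) pairs.

    Minorization-maximization (MM) update for each model i:
        p_i <- W_i / sum_j n_ij / (p_i + p_j)
    where W_i is i's total wins and n_ij the number of i-vs-j battles.
    Assumption for this sketch: ties are ignored and every model has at least one win.
    """
    wins = defaultdict(float)    # total wins per model
    games = defaultdict(float)   # battles per unordered pair of models
    models = set()
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        models.update((winner, loser))
    if any(wins[m] == 0 for m in models):
        raise ValueError("every model needs at least one win in this sketch")

    strength = {m: 1.0 for m in models}
    for _ in range(max_iters):
        new = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i and frozenset((i, j)) in games
            )
            new[i] = wins[i] / denom
        # Strengths are only defined up to a constant factor, so rescale
        # them to a geometric mean of 1 after every update.
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        new = {m: v / math.exp(log_mean) for m, v in new.items()}
        converged = max(abs(new[m] - strength[m]) for m in models) < tol
        strength = new
        if converged:
            break
    return strength


def to_rating_scale(strength, anchor, anchor_score=1000.0, scale=400.0):
    """Map strengths to an Elo-style scale with `anchor` pinned at anchor_score (hypothetical convention)."""
    ref = strength[anchor]
    return {m: anchor_score + scale * math.log10(s / ref) for m, s in strength.items()}


if __name__ == "__main__":
    # Toy votes: A beats B 3-1, B beats C 3-1, A beats C 3-1.
    votes = ([("A", "B")] * 3 + [("B", "A")]
             + [("B", "C")] * 3 + [("C", "B")]
             + [("A", "C")] * 3 + [("C", "A")])
    ratings = to_rating_scale(fit_bradley_terry(votes), anchor="C")
    for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{model}: {rating:.2f}")
```

On this scale a 400-point gap means the stronger model is expected to win ten times as often; the real leaderboard may use a different scale or anchor.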

Models are listed from highest to lowest score.

Model                                   Votes   Score     Confidence interval (-/+)
gpt-5-chat                              11212   1087.57   -4.38 / +6.03
claude-sonnet-4-5-20250929              9400    1082.08   -4.57 / +5.68
qwen3-235b-a22b-2507-v1                 9100    1076.77   -3.67 / +4.13
claude-opus-4-1-20250805                12282   1072.51   -5.25 / +4.57
claude-sonnet-4-20250514                16947   1063.25   -4.2 / +3.72
claude-sonnet-4-5-20250929 (Thinking)   9101    1058.47   -3.7 / +5.23
claude-haiku-4-5-20251001               4027    1055.31   -7.14 / +6.67
gemini-3-pro-preview                    5302    1042.65   -6.55 / +6.76
claude-opus-4-1-20250805 (Thinking)     10617   1042.63   -4.79 / +5.23
gemini-2.5-pro                          7080    1036.50   -6.32 / +6.18
claude-haiku-4-5-20251001 (Thinking)    3875    1031.01   -6.9 / +6.81
claude-sonnet-4-20250514 (Thinking)     13829   1028.26   -3.25 / +3.4
o3-2025-04-16-medium*                   17747   1015.69   -3.18 / +3.79
gemini-2.5-flash                        7477    1011.01   -5.78 / +6.22
llama4-maverick-instruct-basic          12425   1000.00   -5.25 / +5.04
o4-mini-2025-04-16-medium*              17534   985.55    -3.46 / +4.31
deepseek-r1-0528                        7039    961.23    -6.32 / +7.14
gpt-5-2025-08-07-medium*                14112   945.21    -3.34 / +4.64

* Models marked with an asterisk have APIs that do not consistently return Markdown-formatted responses. Because raw outputs are used in head-to-head comparisons, this may affect their rankings.
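The table reports each score with asymmetric -x +y bounds, but the page does not state how those intervals are computed or at what confidence level. A percentile bootstrap over votes naturally yields asymmetric bounds, so the sketch below assumes that technique: it resamples the vote list with replacement, recomputes a statistic on each resample, and reports the distance from the point estimate to the lower and upper percentiles. The 95% level (`alpha=0.05`), the helper names, and the toy `win_rate_of` statistic are hypothetical; any function of the votes, such as a fitted rating, could be passed in instead.

```python
import random


def bootstrap_interval(votes, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap around statistic(votes).

    Returns (point, minus, plus) so the result can be printed in the
    leaderboard's "-x +y" style. The bounds need not be symmetric,
    because the bootstrap percentiles need not sit evenly around the point.
    """
    rng = random.Random(seed)
    point = statistic(votes)
    # Resample the votes with replacement and recompute the statistic each time.
    samples = sorted(
        statistic(rng.choices(votes, k=len(votes)))
        for _ in range(n_boot)
    )
    lo = samples[int(alpha / 2 * n_boot)]
    hi = samples[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, point - lo, hi - point


def win_rate_of(model):
    """Hypothetical statistic: share of `model`'s battles that it won."""
    def stat(vs):
        battles = [v for v in vs if model in v]
        if not battles:
            return 0.0
        return sum(1 for winner, _ in battles if winner == model) / len(battles)
    return stat


if __name__ == "__main__":
    # Toy data: model A wins 60 of 100 battles against B.
    votes = [("A", "B")] * 60 + [("B", "A")] * 40
    point, minus, plus = bootstrap_interval(votes, win_rate_of("A"))
    print(f"win rate {point:.3f}  -{minus:.3f} +{plus:.3f}")
```

A fixed seed keeps the interval reproducible between runs; raising `n_boot` makes the percentile estimates more stable at the cost of more refits.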

Performance Comparison Across Language Models (charts: Win Rate vs. Each Model; Battle Count vs. Each Model; Confidence Intervals; Average Win Rate; Prompt Distribution)
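The chart titles above name per-pair win rates, per-pair battle counts, and an average win rate, but the page does not define how the average is taken. The sketch below assumes the simplest reading: from a hypothetical list of (winner, loser) pairs it builds a battle-count table and a win-rate table, then averages each model's win rates over the opponents it actually faced, weighting every opponent equally. A vote-weighted average would be an equally plausible alternative definition.

```python
from collections import defaultdict


def pairwise_tables(votes):
    """Build battle-count and win-rate tables from (winner, loser) pairs."""
    battles = defaultdict(int)   # battles[(a, b)]: battles between a and b (stored both ways)
    wins = defaultdict(int)      # wins[(a, b)]: times a beat b
    models = set()
    for winner, loser in votes:
        wins[(winner, loser)] += 1
        battles[(winner, loser)] += 1
        battles[(loser, winner)] += 1
        models.update((winner, loser))
    # Use .get() so pairs that never met are not inserted as zero entries.
    win_rate = {
        (a, b): wins.get((a, b), 0) / battles[(a, b)]
        for a in models
        for b in models
        if a != b and battles.get((a, b), 0)
    }
    return dict(battles), win_rate


def average_win_rate(win_rate):
    """Unweighted mean of each model's win rate over the opponents it faced (assumed definition)."""
    per_model = defaultdict(list)
    for (a, _opponent), rate in win_rate.items():
        per_model[a].append(rate)
    return {m: sum(rates) / len(rates) for m, rates in per_model.items()}


if __name__ == "__main__":
    votes = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "B")]
    battles, rates = pairwise_tables(votes)
    for model, avg in sorted(average_win_rate(rates).items()):
        print(f"{model}: {avg:.3f}")
```

Weighting opponents equally keeps a model from looking stronger simply because it was paired more often with weak opponents.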


