leaderboards - a MoritzLaurer Collection

leaderboards

updated Mar 2

Running

4.95k

Arena Leaderboard

🏆

View the LMArena leaderboard in full‑screen
Running on CPU Upgrade

14k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

7.58k

MTEB Leaderboard

📊

Embedding Leaderboard
Running on CPU Upgrade

Agents

Featured

1.41k

Open ASR Leaderboard

🏆

Explore and compare speech‑recognition model benchmarks
Running

Agents

Featured

589

LLM-Perf Leaderboard

🏆

Compare LLM hardware performance and find the best model
Running

Agents

1.51k

Big Code Models Leaderboard

📈

Explore code model leaderboard and submit evaluations
Runtime error

Agents

78

Human & GPT-4 Evaluation of LLMs Leaderboard

👩

Runtime error

Agents

145

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Build error

Agents

105

Enterprise Scenarios Leaderboard

🥇

Running on CPU Upgrade

Agents

92

LLM Safety Leaderboard

🥇

Search, filter and submit LLM benchmark evaluations
Running

Featured

561

Vision Arena (Testing VLMs side-by-side)

🖼

Explore Vision Arena visual AI demo online
Running

72

CyberSecEvalTest

📈

Evaluate LLMs' cybersecurity risks and capabilities
Running

Featured

471

LLM Performance Leaderboard

🐨

View the LLM leaderboard rankings
Running on CPU Upgrade

Agents

77

AIR-Bench Leaderboard

🥇

Explore and compare QA and long doc benchmarks
Running on CPU Upgrade

Agents

1.02k

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

Agents

432

Reward Bench Leaderboard

📐

Explore and compare model scores on RewardBench benchmarks
Running

Agents

232

BigCodeBench Leaderboard

🥇

Explore code-generation model leaderboards and task details
Runtime error

Agents

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

116

MTEB Arena

⚔

Display MTEB Arena interface
Runtime error

Agents

Featured

151

Open LLM Progress Tracker

🔬

Visualize Open vs. Proprietary LLM Progress
Running

Agents

112

Judge Arena

💻

View and compare open‑source AI model rankings with ELO scores
Running

Agents

485

TTS Spaces Arena

🤗

Blind vote on HF TTS models!
Runtime error

Featured

142

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents