Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Embedding Leaderboard
Explore LLM performance across hardware configurations
Explore ASR model performance across languages and datasets
Explore and submit code model evaluations on a leaderboard
View the latest LMArena model leaderboard
Request evaluation for a new model
Display leaderboard of language models
Submit model evaluation results to leaderboard
Display a web page
Browse and compare AI model evaluations
View and submit LLM evaluations
Submit and view model evaluation results in a leaderboard format
Compare model answers to questions in French
Explore and filter LLM benchmark results
Upload video model evaluation data to update the VBench leaderboard
Launch a Streamlit web app interface
Evaluate LLMs' cybersecurity risks and capabilities
Submit and evaluate models for contextual understanding tasks
Search for model performance across languages and benchmarks
Explore and submit LLM benchmarks
VLMEvalKit Evaluation Results Collection
Explore RewardBench model rankings and scores
Attempt to jailbreak LLM safety and privacy guardrails
Filter data on contamination in datasets and models
Track, rank and evaluate open Arabic LLMs and chatbots
Explore and compare QA and long doc benchmarks
Submit and evaluate model results on MM-UPD benchmarks
Explore code-generation model leaderboards and task details
Evaluate open LLMs in the languages of LATAM and Spain
Compare financial LLMs on benchmark leaderboard