Automated leaderboard for real-time LLM performance tracking | Daily updates | Data-driven decisions

ModelRank AI 🏆

ModelRank AI is an automatically updated, open-source leaderboard for large language models, with evaluation data sourced from HuggingFace. It makes it easy to view and compare the performance of a wide range of models.

Project Features

  • 🔄 Automatic Updates: Automatically fetches the latest model evaluation data from HuggingFace daily via GitHub Actions
  • 📊 Complete Data: Provides comprehensive leaderboard data, including model names, parameter counts, and various evaluation scores
  • 📱 Responsive Design: Supports viewing leaderboard data on various devices
  • 🔍 Search and Sort: Supports searching and sorting by different metrics on the complete leaderboard page
  • 📥 Data Download: Provides data in JSON and CSV formats for download
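As a minimal sketch of working with the CSV export, the snippet below parses a small inline sample and finds the top-ranked model. The column names (`rank`, `model`, `average_score`, `parameters_b`) are assumptions for illustration; check the actual export's header row, as the real files may differ.

```python
import csv
import io

# Hypothetical snippet of the CSV export; column names are assumed,
# values are taken from the leaderboard table in this README.
csv_text = """rank,model,average_score,parameters_b
1,MaziyarPanahi/calme-3.2-instruct-78b,52.08,78.0
2,MaziyarPanahi/calme-3.1-instruct-78b,51.29,78.0
"""

# Parse each row into a dict keyed by the header columns.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Find the model with the highest average score.
top = max(rows, key=lambda r: float(r["average_score"]))
print(top["model"])  # MaziyarPanahi/calme-3.2-instruct-78b
```

The same approach works for the JSON export with `json.load` in place of `csv.DictReader`.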

🏆 ModelRank AI Leaderboard

Last updated: 2025-10-27 01:10:03 UTC

| Rank | Model | Average Score | Parameters (B) | IFEval | BBH | MATH | GPQA | MUSR | MMLU-PRO |
|------|-------|---------------|----------------|--------|-----|------|------|------|----------|
| 1 | MaziyarPanahi/calme-3.2-instruct-78b 📑 | 52.08 | 78.0 | 80.63 | 62.61 | 40.33 | 20.36 | 38.53 | 70.03 |
| 2 | MaziyarPanahi/calme-3.1-instruct-78b 📑 | 51.29 | 78.0 | 81.36 | 62.41 | 39.27 | 19.46 | 36.50 | 68.72 |
| 3 | dfurman/CalmeRys-78B-Orpo-v0.1 📑 | 51.23 | 78.0 | 81.63 | 61.92 | 40.63 | 20.02 | 36.37 | 66.80 |
| 4 | MaziyarPanahi/calme-2.4-rys-78b 📑 | 50.77 | 78.0 | 80.11 | 62.16 | 40.71 | 20.36 | 34.57 | 66.69 |
| 5 | huihui-ai/Qwen2.5-72B-Instruct-abliterated 📑 | 48.11 | 72.7 | 85.93 | 60.49 | 60.12 | 19.35 | 12.34 | 50.41 |
| 6 | Qwen/Qwen2.5-72B-Instruct 📑 | 47.98 | 72.7 | 86.38 | 61.87 | 59.82 | 16.67 | 11.74 | 51.40 |
| 7 | MaziyarPanahi/calme-2.1-qwen2.5-72b 📑 | 47.86 | 72.7 | 86.62 | 61.66 | 59.14 | 15.10 | 13.30 | 51.32 |
| 8 | newsbang/Homer-v1.0-Qwen2.5-72B 📑 | 47.46 | 72.7 | 76.28 | 62.27 | 49.02 | 22.15 | 17.90 | 57.17 |
| 9 | ehristoforu/qwen2.5-test-32b-it 📑 | 47.37 | 32.8 | 78.89 | 58.28 | 59.74 | 15.21 | 19.13 | 52.95 |
| 10 | Saxo/Linkbricks-Horizon-AI-Avengers-V1-32B 📑 | 47.34 | 32.8 | 79.72 | 57.63 | 60.27 | 14.99 | 18.16 | 53.25 |
| 11 | MaziyarPanahi/calme-2.2-qwen2.5-72b 📑 | 47.22 | 72.7 | 84.77 | 61.80 | 58.91 | 14.54 | 12.02 | 51.31 |
| 12 | fluently-lm/FluentlyLM-Prinum 📑 | 47.22 | 32.8 | 80.90 | 59.48 | 54.00 | 18.23 | 17.26 | 53.42 |
| 13 | JungZoona/T3Q-Qwen2.5-14B-Instruct-1M-e3 📑 | 47.09 | 0.0 | 73.24 | 65.47 | 28.63 | 22.26 | 38.69 | 54.27 |
| 14 | JungZoona/T3Q-qwen2.5-14b-v1.0-e3 📑 | 47.09 | 14.8 | 73.24 | 65.47 | 28.63 | 22.26 | 38.69 | 54.27 |
| 15 | zetasepic/Qwen2.5-32B-Instruct-abliterated-v2 📑 | 46.89 | 32.8 | 83.34 | 56.53 | 59.52 | 15.66 | 14.93 | 51.35 |
| 16 | rubenroy/Gilgamesh-72B 📑 | 46.79 | 72.7 | 84.86 | 61.84 | 43.81 | 19.24 | 17.66 | 53.36 |
| 17 | Sakalti/ultiima-72B 📑 | 46.77 | 72.7 | 71.40 | 61.10 | 53.55 | 21.92 | 18.12 | 54.51 |
| 18 | CombinHorizon/zetasepic-abliteratedV2-Qwen2.5-32B-Inst-BaseMerge-TIES 📑 | 46.76 | 32.8 | 83.28 | 56.83 | 58.53 | 15.66 | 14.22 | 52.05 |
| 19 | maldv/Awqward2.5-32B-Instruct 📑 | 46.75 | 32.8 | 82.55 | 57.21 | 62.31 | 12.08 | 13.87 | 52.48 |
| 20 | raphgg/test-2.5-72B 📑 | 46.74 | 72.7 | 84.37 | 62.15 | 41.09 | 18.57 | 20.52 | 53.74 |

Complete Data

The complete leaderboard data is available in the repository as JSON and CSV exports.

Evaluation Metrics Explanation

The leaderboard includes the following main evaluation metrics:

  • Average ⬆️: Mean of the six benchmark scores below
  • IFEval: Instruction-following capability evaluation
  • BBH: Big-Bench Hard, a suite of challenging reasoning tasks
  • MATH Lvl 5: Level-5 (hardest) problems from the MATH competition-mathematics benchmark
  • GPQA: Graduate-level, Google-proof question answering evaluation
  • MUSR: Multistep soft reasoning evaluation
  • MMLU-PRO: Professional-level extension of the Massive Multitask Language Understanding benchmark
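The Average column is the unweighted mean of the six benchmark scores. The sketch below checks this against the top-ranked row of the table above:

```python
# Benchmark scores for rank-1 model MaziyarPanahi/calme-3.2-instruct-78b,
# copied from the leaderboard table.
scores = {
    "IFEval": 80.63,
    "BBH": 62.61,
    "MATH": 40.33,
    "GPQA": 20.36,
    "MUSR": 38.53,
    "MMLU-PRO": 70.03,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 52.08, matching the Average Score column
```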

Local Development

Prerequisites

  • Python 3.10+
  • HuggingFace API token

Installation Steps

  1. Clone the repository

     ```bash
     git clone https://github.com/chenjy16/modelrank_ai.git
     cd modelrank_ai
     ```

License

This project is open-sourced under the MIT License.

Data Source

Data is sourced from HuggingFace.
