
Rank | T | Model Name | Average ⬆️ | AlGhafa | ArabicMMLU | EXAMS | MadinahQA | AraTrust | ALRAGE | ArbMMLU-HT |
---|---|---|---|---|---|---|---|---|---|---|
100 | 💬 | Isaak-Carter/Josiefied-Qwen2.5-7B-Instruct-abliterated-v2 | 75.59 | 78.21 | 75.05 | 59.22 | 75.34 | 89.83 | 77.62 | 73.87 |
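For this row, the Average column matches the unweighted mean of the seven benchmark scores. A quick check in Python (treating the average as a simple mean is an inference from this single row, not a documented formula):

```python
# Benchmark scores for the row above (rank 100), using the column names from the header.
scores = {
    "AlGhafa": 78.21,
    "ArabicMMLU": 75.05,
    "EXAMS": 59.22,
    "MadinahQA": 75.34,
    "AraTrust": 89.83,
    "ALRAGE": 77.62,
    "ArbMMLU-HT": 73.87,
}

# Unweighted mean over the seven benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 75.59, matching the Average column
```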
Submit Your Model for Evaluation 🌴
The Open Arabic LLM Leaderboard aims to help you evaluate and compare the performance of Arabic Large Language Models.
When you submit a model on this page, it is automatically evaluated on a set of Arabic-native benchmarks (find them here), plus one additional human-translated version of MMLU.
The GPU used for evaluation is operated with the support of Technology Innovation Institute (TII).
More details about the benchmarks and the evaluation process are provided in the “About” section below.
Find the first version of the leaderboard hosted as Legacy in this Space.
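Each submission becomes an entry in the evaluation queue, carrying the fields shown in the status tables below. Purely as an illustration, such an entry might look like the following; the model id is hypothetical and the exact schema of the underlying requests dataset may differ:

```python
# A hypothetical queue entry, using the column names from the evaluation-status
# tables below. The on-disk schema of the OALL requests dataset may differ;
# this is only an illustration of what gets recorded per submission.
submission = {
    "model": "my-org/my-arabic-model",  # hypothetical repo id
    "base_model": "",
    "revision": "main",
    "precision": "bfloat16",
    "weight_type": "Original",
    "status": "PENDING",
    "submitted_time": "2025-03-06T00:00:00Z",
    "model_type": "💬 : chat models (RLHF, DPO, IFT, ...)",
    "private": False,
    "chat_template": True,
}
```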
Evaluation Status
No pending evaluations.
No running evaluations.
model | base_model | revision | precision | weight_type | status | submitted_time | model_type | likes | params | license | private | job_id | job_start_time | chat_template |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01-ai/Yi-1.5-34B-32K | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟩 : continuously pretrained | 13 | 34.389 | apache-2.0 | false | 2025-01-21T07:40:09.877337 | false | ||
01-ai/Yi-1.5-9B-32K | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟩 : continuously pretrained | 7 | 8.829 | apache-2.0 | false | 2025-02-08T11:10:21.161772 | false | ||
01-ai/Yi-1.5-9B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟩 : continuously pretrained | 24 | 8.829 | apache-2.0 | false | 2025-02-08T08:29:36.805940 | false | ||
AIDC-AI/Marco-LLM-AR-V2 | main | bfloat16 | Original | FINISHED | 2025-03-04T09:59:59.907551Z | 🟩 : continuously pretrained | 1 | 7.616 | apache-2.0 | false | 2025-03-04T10:18:02.845586 | true | ||
AIDC-AI/Marco-LLM-AR-V3 | main | bfloat16 | Original | FINISHED | 2025-03-04T19:15:00.114106Z | 🟩 : continuously pretrained | 0 | 7.616 | apache-2.0 | false | 2025-03-04T19:19:53.973663 | true | ||
AIDC-AI/Marco-LLM-AR | main | bfloat16 | Original | FINISHED | 2025-02-11T06:35:54.938982Z | 🟩 : continuously pretrained | 0 | 7.616 | apache-2.0 | false | 2025-02-12T05:20:03.501821 | true | ||
ALLaM-AI/ALLaM-7B-Instruct-preview | main | bfloat16 | Original | FINISHED | 2025-02-18T18:01:56.806782Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 5 | 7 | apache-2.0 | false | 2025-02-19T09:45:10.993265 | true | ||
AXCXEPT/EZO-Qwen2.5-32B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 2 | 32.764 | apache-2.0 | false | 2025-02-08T08:30:19.518006 | true | ||
CohereForAI/aya-23-35B | main | float16 | Original | FINISHED | 2025-01-29T15:57:52.890641Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 268 | 34.981 | cc-by-nc-4.0 | false | 2025-02-08T19:27:01.105080 | true | ||
CohereForAI/aya-expanse-32b | main | float16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 202 | 32.3 | CC-BY-NC-4.0 | false | 2025-02-08T08:38:20.150902 | true | ||
Cran-May/T.E-8.1 | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 7.616 | cc-by-nc-sa-4.0 | false | 2025-02-08T10:30:21.401321 | true | ||
Daemontatox/Cogito-R1 | main | bfloat16 | Original | FINISHED | 2025-02-11T20:46:56.174519Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 32.764 | apache-2.0 | false | 2025-02-11T20:48:32.217184 | true | ||
FreedomIntelligence/AceGPT-13B-chat | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 25 | 13 | apache-2.0 | false | 2025-02-08T13:18:22.602821 | false | ||
FreedomIntelligence/AceGPT-13B | main | float16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟩 : continuously pretrained | 8 | 13 | apache-2.0 | false | 2025-02-08T10:22:21.514187 | false | ||
FreedomIntelligence/AceGPT-7B-chat | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 8 | 7 | apache-2.0 | false | 2025-02-08T13:26:22.687068 | false | ||
FreedomIntelligence/AceGPT-7B | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 🟩 : continuously pretrained | 3 | 7 | apache-2.0 | false | 2025-02-08T13:10:22.229891 | false | ||
FreedomIntelligence/AceGPT-v2-32B-Chat | main | float16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 32.513 | apache-2.0 | false | 2025-02-08T08:46:19.920445 | true | ||
FreedomIntelligence/AceGPT-v2-32B | main | float16 | Original | FINISHED | 2025-01-30T06:27:43.852438Z | 🟢 : pretrained | 1 | 32.513 | apache-2.0 | false | 2025-02-08T19:43:00.764839 | true | ||
FreedomIntelligence/AceGPT-v2-70B-Chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 70.554 | apache-2.0 | false | 2025-02-06T07:50:32.889557 | false | ||
FreedomIntelligence/AceGPT-v2-70B | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟩 : continuously pretrained | 1 | 70.554 | apache-2.0 | false | 2025-02-06T07:45:35.631469 | false | ||
FreedomIntelligence/AceGPT-v2-8B-Chat | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 8.03 | apache-2.0 | false | 2025-02-08T19:19:00.154919 | false | ||
INSAIT-Institute/BgGPT-7B-Instruct-v0.2 | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟩 : continuously pretrained | 21 | 7.291 | apache-2.0 | false | 2025-02-08T06:22:15.130408 | true | ||
Isaak-Carter/Josiefied-Qwen2.5-7B-Instruct-abliterated-v2 | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🔶 : fine-tuned on domain-specific datasets | 1 | 7.616 | apache-2.0 | false | 2025-02-08T06:32:15.877524 | true | ||
MaziyarPanahi/calme-2.1-qwen2.5-72b | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 72.7 | other | false | 2025-02-06T06:15:31.290181 | true | ||
MaziyarPanahi/calme-2.2-qwen2.5-72b | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 72.7 | other | false | 2025-01-21T07:40:39.881897 | true | ||
Navid-AI/Yehia-7B-preview | main | bfloat16 | Original | FINISHED | 2025-03-03T11:06:48.245215Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 7.001 | false | 2025-03-03T15:20:20.030168 | true | |||
Orion-zhen/Qwen2.5-7B-Instruct-Uncensored | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 7.616 | gpl-3.0 | false | 2025-02-08T08:22:16.007188 | true | ||
Qwen/Qwen1.5-1.8B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 35 | 1.837 | other | false | 2025-02-08T10:54:21.072378 | true | ||
Qwen/Qwen1.5-14B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟢 : pretrained | 32 | 14.167 | other | false | 2025-02-08T09:10:20.446684 | true | ||
Qwen/Qwen1.5-32B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟢 : pretrained | 71 | 32.512 | other | false | 2025-02-08T08:54:20.154034 | true | ||
Qwen/Qwen1.5-4B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 29 | 3.95 | other | false | 2025-02-08T07:52:15.475167 | true | ||
Qwen/Qwen1.5-7B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 35 | 7.721 | other | false | 2025-02-08T08:12:15.753515 | true | ||
Qwen/Qwen2-0.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T14:43:05Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 0.49 | other | false | 2025-02-08T06:19:28.177264 | true | ||
Qwen/Qwen2-1.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 1.54 | other | false | 2025-02-08T08:02:15.368025 | true | ||
Qwen/Qwen2-72B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 0 | 72.7 | other | false | 2025-01-21T07:41:00.318347 | true | ||
Qwen/Qwen2-7B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 7.62 | other | false | 2025-02-08T07:02:14.997658 | true | ||
Qwen/Qwen2.5-0.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T14:43:05Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 12 | 0.494 | apache-2.0 | false | 2025-02-08T06:21:47.357291 | true | ||
Qwen/Qwen2.5-0.5B | 060db6499f32faf8b98477b0a26969ef7d8b9987 | bfloat16 | Original | FINISHED | 2025-01-17T14:43:05Z | 🟢 : pretrained | 61 | 0.494 | apache-2.0 | false | 2025-02-08T06:19:48.502455 | true | ||
Qwen/Qwen2.5-1.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 16 | 1.544 | apache-2.0 | false | 2025-02-08T06:42:14.926675 | true | ||
Qwen/Qwen2.5-1.5B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 8 | 1.544 | apache-2.0 | false | 2025-02-08T07:12:15.230903 | true | ||
Qwen/Qwen2.5-14B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 13 | 14.77 | apache-2.0 | false | 2025-02-08T11:18:22.068157 | true | ||
Qwen/Qwen2.5-14B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 5 | 14.77 | apache-2.0 | false | 2025-02-08T11:42:22.847597 | true | ||
Qwen/Qwen2.5-32B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 7 | 32.764 | apache-2.0 | false | 2025-02-08T09:02:20.345973 | true | ||
Qwen/Qwen2.5-32B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 4 | 32.764 | apache-2.0 | false | 2025-01-19T07:29:37.106248 | true | ||
Qwen/Qwen2.5-3B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 18 | 3.086 | other | false | 2025-02-08T06:52:15.204714 | true | ||
Qwen/Qwen2.5-3B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 5 | 3.086 | other | false | 2025-02-08T07:22:15.401441 | true | ||
Qwen/Qwen2.5-72B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-31T12:15:49.899735Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 701 | 72.706 | other | false | 2025-02-08T19:51:01.141367 | true | ||
Qwen/Qwen2.5-72B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 7 | 72.706 | other | false | 2025-01-20T07:14:54.034141 | true | ||
Qwen/Qwen2.5-7B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 20 | 7.616 | apache-2.0 | false | 2025-02-08T07:42:15.444769 | true | ||
Qwen/Qwen2.5-7B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 6 | 7.616 | apache-2.0 | false | 2025-02-08T07:32:15.370596 | true | ||
Sakalti/Saka-1.5B | main | float16 | Original | FINISHED | 2025-02-11T09:31:26.084185Z | 🤝 : base merges and merges | 1 | 1.777 | false | 2025-02-11T10:47:27.139547 | false | |||
Sakalti/Saka-14B | main | float16 | Original | FINISHED | 2025-02-11T09:31:54.948549Z | 🤝 : base merges and merges | 5 | 14.766 | false | 2025-02-12T05:21:41.344042 | false | |||
Sakalti/SakaMoe-3x14B-Instruct | main | float16 | Original | FINISHED | 2025-02-19T10:28:44.992462Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 45.343 | apache-2.0 | false | 2025-02-19T11:00:37.975586 | false | ||
Sakalti/Ultiima-72B | main | float16 | Original | FINISHED | 2025-02-11T09:32:32.001771Z | 🤝 : base merges and merges | 1 | 72.706 | other | false | 2025-02-11T11:47:30.706241 | false | ||
SeaLLMs/SeaLLM-7B-v2.5 | main | float16 | Original | FINISHED | 2025-01-22T13:33:37Z | 🔶 : fine-tuned on domain-specific datasets | 44 | 8.538 | other | false | 2025-02-06T08:25:34.187570 | true | ||
SeaLLMs/SeaLLMs-v3-7B-Chat | main | bfloat16 | Original | FINISHED | 2025-03-05T06:06:48.377744Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 52 | 7.616 | other | false | 2025-03-05T06:20:22.364493 | true | ||
Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 3.821 | apache-2.0 | false | 2025-02-08T11:58:21.193972 | true | ||
TarjamaN/Pronoia-14b-community | main | bfloat16 | Original | FINISHED | 2025-02-10T18:01:56.806782Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 14 | apache-2.0 | false | 2025-02-11T09:44:43.352229 | false | ||
VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 13 | 12.248 | apache-2.0 | false | 2025-02-08T11:50:21.673103 | true | ||
airev-ai/emirati-14b-v2 | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 14.77 | apache-2.0 | false | 2025-02-09T07:32:28.240060 | true | ||
arcee-ai/Arcee-Spark | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 78 | 7.616 | apache-2.0 | false | 2025-02-08T12:22:22.278691 | true | ||
cognitivecomputations/Dolphin3.0-R1-Mistral-24B | main | bfloat16 | Original | FINISHED | 2025-02-11T22:06:35.591253Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 91 | 23.572 | false | 2025-02-11T22:18:43.247824 | true | |||
freewheelin/free-evo-qwen72b-v0.8-re | main | float16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 3 | 72.288 | mit | false | 2025-02-06T06:20:41.352289 | false | ||
google/gemma-2-27b-it | main | bfloat16 | Original | FINISHED | 2025-02-01T05:04:02.122717Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 509 | 27.227 | gemma | false | 2025-02-01T05:04:43.884909 | true | ||
google/gemma-2-27b | main | float16 | Original | FINISHED | 2025-01-30T09:20:03.598382Z | 🟢 : pretrained | 193 | 27.227 | gemma | false | 2025-01-30T09:23:21.646352 | false | ||
google/gemma-2-2b-it | main | bfloat16 | Original | FINISHED | 2025-02-19T14:03:23.500812Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 961 | 2.614 | gemma | false | 2025-02-19T14:30:53.668405 | true | ||
google/gemma-2-9b-it | main | bfloat16 | Original | FINISHED | 2025-01-29T15:55:59.067426Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 639 | 9.242 | gemma | false | 2025-02-09T07:33:17.513544 | true | ||
huihui-ai/Qwen2.5-32B-Instruct-abliterated | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 3 | 32.764 | apache-2.0 | false | 2025-02-08T09:26:20.139673 | true | ||
huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2 | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🔶 : fine-tuned on domain-specific datasets | 7 | 7.616 | apache-2.0 | false | 2025-02-08T08:28:59.535671 | true | ||
huihui-ai/Qwen2.5-7B-Instruct-abliterated | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 7.616 | apache-2.0 | false | 2025-02-08T08:29:14.425261 | true | ||
inceptionai/jais-adapted-13b-chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 13.344 | apache-2.0 | false | 2025-02-06T07:55:33.840204 | true | ||
inceptionai/jais-adapted-13b | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟢 : pretrained | 4 | 13.344 | apache-2.0 | false | 2025-02-06T08:20:33.375364 | false | ||
inceptionai/jais-adapted-70b-chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 6 | 69.501 | apache-2.0 | false | 2025-02-06T08:10:39.815075 | true | ||
inceptionai/jais-adapted-70b | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟢 : pretrained | 8 | 69.501 | apache-2.0 | false | 2025-02-06T08:05:33.250540 | false | ||
inceptionai/jais-adapted-7b-chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 7.001 | apache-2.0 | false | 2025-02-06T08:00:33.749859 | true | ||
inceptionai/jais-adapted-7b | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟢 : pretrained | 5 | 7.001 | apache-2.0 | false | 2025-02-06T08:15:34.315914 | false | ||
inceptionai/jais-family-13b-chat | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 13 | apache-2.0 | false | 2025-02-06T07:35:32.981629 | true | ||
inceptionai/jais-family-13b | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 🟢 : pretrained | 4 | 13 | apache-2.0 | false | 2025-02-06T07:25:32.458725 | false | ||
inceptionai/jais-family-30b-16k-chat | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 30 | apache-2.0 | false | 2025-02-06T11:53:58.086928 | true | ||
inceptionai/jais-family-30b-16k | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 🟢 : pretrained | 7 | 30 | apache-2.0 | false | 2025-02-06T11:54:24.081900 | false | ||
inceptionai/jais-family-30b-8k-chat | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 30 | apache-2.0 | false | 2025-02-06T07:20:32.329369 | true | ||
inceptionai/jais-family-30b-8k | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 🟢 : pretrained | 6 | 30 | apache-2.0 | false | 2025-02-06T12:30:16.837744 | false | ||
maldv/Awqward2.5-32B-Instruct | main | bfloat16 | Original | FINISHED | 2025-02-16T13:58:37.505999Z | 🔶 : fine-tuned on domain-specific datasets | 2 | 32.764 | apache-2.0 | false | 2025-02-16T14:57:15.527270 | true | ||
maldv/Qwentile2.5-32B-Instruct | 892662a70cc13ffb88809f5b90db080a3d81ffad | bfloat16 | Original | FINISHED | 2025-02-11T01:04:27.495862Z | 🤝 : base merges and merges | 31 | 32.764 | apache-2.0 | false | 2025-02-11T09:47:22.846202 | true | ||
meta-llama/Llama-3.1-8B-Instruct | main | bfloat16 | Original | FINISHED | 2025-03-04T20:19:21.429375Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3693 | 8.03 | llama3.1 | false | 2025-03-04T20:19:50.291565 | false | ||
meta-llama/Llama-3.1-8B | main | bfloat16 | Original | FINISHED | 2025-03-05T18:42:38.644316Z | 🟢 : pretrained | 1464 | 8.03 | llama3.1 | false | 2025-03-05T18:51:03.150383 | false | ||
meta-llama/Llama-3.3-70B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-29T16:03:20.622053Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1791 | 70.554 | llama3.3 | false | 2025-02-08T19:35:00.995484 | true | ||
mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1 | main | bfloat16 | Original | FINISHED | 2025-02-16T13:50:14.352880Z | 🔶 : fine-tuned on domain-specific datasets | 8 | 8.03 | mit | false | 2025-02-16T13:57:14.711888 | true | ||
mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1 | main | bfloat16 | Original | FINISHED | 2025-02-16T13:53:37.834456Z | 🔶 : fine-tuned on domain-specific datasets | 15 | 7.616 | mit | false | 2025-02-16T14:27:14.153900 | true | ||
recoilme/recoilme-gemma-2-9B-v0.2 | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 10.159 | cc-by-nc-4.0 | false | 2025-02-08T09:42:21.044400 | true | ||
recoilme/recoilme-gemma-2-9B-v0.4 | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 10.159 | cc-by-nc-4.0 | false | 2025-02-08T09:34:21.199935 | true | ||
rombodawg/Rombos-LLM-V2.5-Qwen-72b | main | bfloat16 | Original | FINISHED | 2025-02-12T11:34:52.981127Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 35 | 72.706 | other | false | 2025-02-14T05:41:55.873866 | false | ||
rombodawg/Rombos-LLM-V2.6-Qwen-14b | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 17 | 14.77 | apache-2.0 | false | 2025-02-08T12:30:21.991946 | true | ||
silma-ai/SILMA-9B-Instruct-v1.0 | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 9.242 | gemma | false | 2025-02-08T09:50:21.448289 | true | ||
silma-ai/SILMA-Kashif-2B-Instruct-v1.0 | main | bfloat16 | Original | FINISHED | 2025-02-12T13:30:18.234483Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 13 | 2.614 | gemma | false | 2025-02-12T13:50:19.980202 | true | ||
speakleash/Bielik-11B-v2 | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟩 : continuously pretrained | 34 | 11.169 | apache-2.0 | false | 2025-02-09T07:32:43.195383 | false | ||
tanliboy/lambda-qwen2.5-14b-dpo-test | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 14.77 | apache-2.0 | false | 2025-02-08T12:54:21.935778 | true | ||
tanliboy/lambda-qwen2.5-32b-dpo-test | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 32.764 | apache-2.0 | false | 2025-02-06T06:40:32.398695 | true | ||
tiiuae/Falcon3-10B-Base | main | bfloat16 | Original | FINISHED | 2025-02-12T11:08:16.453425Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 33 | 10.306 | other | false | 2025-02-12T17:45:11.108494 | false | ||
tiiuae/Falcon3-7B-Base | main | bfloat16 | Original | FINISHED | 2025-02-12T11:06:18.513886Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 24 | 7.456 | other | false | 2025-02-12T11:20:01.605414 | false | ||
upstage/SOLAR-10.7B-v1.0 | main | float16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 233 | 10.732 | apache-2.0 | false | 2025-02-08T13:02:21.896392 | false | ||
v000000/Qwen2.5-14B-Gutenberg-1e-Delta | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 14.77 | apache-2.0 | false | 2025-02-08T09:58:20.517925 | true | ||
v000000/Qwen2.5-Lumen-14B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 14.77 | apache-2.0 | false | 2025-02-08T10:06:20.537013 | true | ||
yellowtown/7B-v0.2 | 3b4cfeea5eaad38d37dd6016fff949fc271189ee | bfloat16 | Original | FINISHED | 2025-02-14T08:56:37.440691Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 7.616 | apache-2.0 | false | 2025-02-14T09:24:48.249354 | false | ||
yellowtown/7B-v0.2 | be5fac4dda0fbf83403ef35d5009266f88669491 | bfloat16 | Original | FINISHED | 2025-01-23T12:14:45Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 7.616 | apache-2.0 | false | 2025-02-06T08:30:33.581302 | true |
model | base_model | revision | precision | weight_type | status | submitted_time | model_type | likes | params | license | private | job_id | job_start_time | chat_template |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sakalti/ultiima-108B | main | float16 | Original | FAILED | 2025-02-19T10:29:30.198367Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 107.814 | other | false | 2025-02-19T11:30:40.535277 | false | ||
Sakalti/ultiima-125B | main | float16 | Original | FAILED | 2025-02-19T10:28:09.333042Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 125.367 | other | false | 2025-02-19T10:30:36.208911 | false | ||
microsoft/Phi-4-multimodal-instruct | main | bfloat16 | Original | FAILED | 2025-02-27T11:20:15.578684Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 242 | 5.574 | mit | false | 2025-02-27T11:40:16.785764 | true |
About
While outstanding LLMs are being released at a competitive pace, most of them are centered on English and the English-speaking cultural sphere. We operate the Open Arabic LLM Leaderboard (OALL) to evaluate models that reflect the characteristics of the Arabic language, culture, and heritage. Through this, we hope that users can conveniently use the leaderboard, participate, and contribute to the advancement of research in the Arab region 🔥.
Icons & Model types
🟢 : pretrained
🟩 : continuously pretrained
💬 : chat models (RLHF, DPO, IFT, ...)
🔶 : fine-tuned on domain-specific datasets
🤝 : base merges and moerges
Notes:
- We reserve the right to correct any incorrect tags or icons after manual verification to ensure the accuracy and reliability of the leaderboard. This helps maintain the integrity and trustworthiness of the platform.
- Some models may be flagged as “Subjects of Caution” by the community. These models might have used the evaluation set for training, attempted to manipulate rankings, or raised ethical concerns. Models deemed as such may face restricted visibility or removal from the leaderboard. Users are advised to exercise discretion when interpreting rankings.
- The leaderboard automatically hides models that were submitted, evaluated, and subsequently made private or gated post-evaluation. This platform is designed for “open” models that benefit the wider community. If you intend to restrict your model’s accessibility after using the leaderboard’s resources or exploit the platform solely for personal gains, please refrain from submitting. Violators may face bans on their usernames and/or organization IDs from future submissions.
- The leaderboard no longer accepts models in float32 precision except under special circumstances. If you are the developer of a float32 model and believe it deserves inclusion, please reach out to us.
- To ensure fair and equitable access to leaderboard resources, all usernames and organization IDs are limited to 5 submissions per week. This policy minimizes spamming, encourages thoughtful participation, and allows everyone in the community to benefit from the platform.
By adhering to these guidelines, we aim to foster a fair, collaborative, and transparent environment for evaluating and advancing open models for Arabic and Arabic-interested communities.
How it works
📈 We evaluate models using LightEval, a unified and straightforward framework from the HuggingFace Eval Team to test and assess causal language models on a large number of different evaluation tasks.
To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in a zero-shot setting (`0-shot`). This approach eliminates any potential advantage from task-specific fine-tuning and provides a clear indication of how well the models can generalize to new tasks.
Also, given the nature of the tasks, which include multiple-choice questions, the leaderboard primarily uses normalized log-likelihood accuracy (`loglikelihood_acc_norm`) for all tasks.
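For intuition, here is a minimal sketch of length-normalized log-likelihood scoring for a single multiple-choice question, written with transformers. It only illustrates the idea behind `loglikelihood_acc_norm`; the leaderboard's evaluations are run through LightEval, and the model, prompt, and character-length normalization below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM from the Hub works for this illustration.
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens, conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given its preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    token_logprobs = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the continuation tokens (tokenization boundary effects are ignored here).
    n_choice = full_ids.shape[1] - prompt_len
    return token_logprobs[0, -n_choice:].sum().item()

prompt = "سؤال: ما هي عاصمة المملكة العربية السعودية؟\nالإجابة: "
choices = ["الرياض", "جدة", "الدوحة", "بيروت"]
gold = 0

# Normalize each score by the character length of the choice, then take the argmax.
scores = [choice_logprob(prompt, c) / len(c) for c in choices]
pred = max(range(len(choices)), key=lambda i: scores[i])
print("correct" if pred == gold else "incorrect")
```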
Please consider reaching out to us through the discussions tab if you are working on benchmarks for Arabic LLMs and would like to see them on this leaderboard as well. Your benchmark might change the whole game for Arabic models!
Details and Logs
- Detailed numerical results in the `results` OALL dataset: https://huggingface.co/datasets/OALL/v2_results
- Community queries and running status in the `requests` OALL dataset: https://huggingface.co/datasets/OALL/requests_v2
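To inspect the raw files behind these two dataset repos programmatically, the standard huggingface_hub client is enough. A minimal sketch; the internal file layout of the repos is not documented here, so the JSON filtering below is an assumption:

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# List the first few files published in the detailed-results dataset repo.
for path in api.list_repo_files("OALL/v2_results", repo_type="dataset")[:10]:
    print(path)

# Download one request file (assuming the repo stores per-model JSON files)
# to inspect a submission's recorded status locally.
request_files = [
    f for f in api.list_repo_files("OALL/requests_v2", repo_type="dataset")
    if f.endswith(".json")
]
if request_files:
    local_path = hf_hub_download("OALL/requests_v2", filename=request_files[0], repo_type="dataset")
    print(local_path)
```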
More resources
For evaluations of chat models with 3C3H on generative-task benchmarks, please refer to the AraGen-Leaderboard.
If you still have questions, you can check our FAQ here!
