
Rank | T | Model Name | Average ⬆️ | AlGhafa | ArabicMMLU | EXAMS | MadinahQA | AraTrust | ALRAGE | ArbMMLU-HT |
---|---|---|---|---|---|---|---|---|---|---|
100 | 💬 | Isaak-Carter/Josiefied-Qwen2.5-7B-Instruct-abliterated-v2 | 75.59 | 78.21 | 75.05 | 59.22 | 75.34 | 89.83 | 77.62 | 73.87 |
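For this row, the Average column matches the unweighted mean of the seven benchmark scores. A quick check in Python (treating the average as a simple mean is an inference from this single row, not a documented formula):

```python
# Benchmark scores for the row above (rank 100), using the column names from the header.
scores = {
    "AlGhafa": 78.21,
    "ArabicMMLU": 75.05,
    "EXAMS": 59.22,
    "MadinahQA": 75.34,
    "AraTrust": 89.83,
    "ALRAGE": 77.62,
    "ArbMMLU-HT": 73.87,
}

# Unweighted mean over the seven benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 75.59, matching the Average column
```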
Submit Your Model for Evaluation 🌴
The Open Arabic LLM Leaderboard aims to help you evaluate and compare the performance of Arabic Large Language Models.
When you submit a model on this page, it is automatically evaluated on a set of Arabic-native benchmarks (find them here), plus one additional human-translated version of MMLU.
The GPU used for evaluation is operated with the support of Technology Innovation Institute (TII).
More details about the benchmarks and the evaluation process are provided in the “About” section below.
Find the first version of the leaderboard hosted as Legacy in this Space.
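Each submission becomes an entry in the evaluation queue, carrying the fields shown in the status tables below. Purely as an illustration, such an entry might look like the following; the model id is hypothetical and the exact schema of the underlying requests dataset may differ:

```python
# A hypothetical queue entry, using the column names from the evaluation-status
# tables below. The on-disk schema of the OALL requests dataset may differ;
# this is only an illustration of what gets recorded per submission.
submission = {
    "model": "my-org/my-arabic-model",  # hypothetical repo id
    "base_model": "",
    "revision": "main",
    "precision": "bfloat16",
    "weight_type": "Original",
    "status": "PENDING",
    "submitted_time": "2025-03-06T00:00:00Z",
    "model_type": "💬 : chat models (RLHF, DPO, IFT, ...)",
    "private": False,
    "chat_template": True,
}
```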
Evaluation Status
No pending evaluations.
No running evaluations.
model | base_model | revision | precision | weight_type | status | submitted_time | model_type | likes | params | license | private | job_id | job_start_time | chat_template |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01-ai/Yi-1.5-34B-32K | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟩 : continuously pretrained | 13 | 34.389 | apache-2.0 | false | 2025-01-21T07:40:09.877337 | false | ||
01-ai/Yi-1.5-9B-32K | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟩 : continuously pretrained | 7 | 8.829 | apache-2.0 | false | 2025-02-08T11:10:21.161772 | false | ||
01-ai/Yi-1.5-9B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟩 : continuously pretrained | 24 | 8.829 | apache-2.0 | false | 2025-02-08T08:29:36.805940 | false | ||
AIDC-AI/Marco-LLM-AR-V2 | main | bfloat16 | Original | FINISHED | 2025-03-04T09:59:59.907551Z | 🟩 : continuously pretrained | 1 | 7.616 | apache-2.0 | false | 2025-03-04T10:18:02.845586 | true | ||
AIDC-AI/Marco-LLM-AR-V3 | main | bfloat16 | Original | FINISHED | 2025-03-04T19:15:00.114106Z | 🟩 : continuously pretrained | 0 | 7.616 | apache-2.0 | false | 2025-03-04T19:19:53.973663 | true | ||
AIDC-AI/Marco-LLM-AR | main | bfloat16 | Original | FINISHED | 2025-02-11T06:35:54.938982Z | 🟩 : continuously pretrained | 0 | 7.616 | apache-2.0 | false | 2025-02-12T05:20:03.501821 | true | ||
ALLaM-AI/ALLaM-7B-Instruct-preview | main | bfloat16 | Original | FINISHED | 2025-02-18T18:01:56.806782Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 5 | 7 | apache-2.0 | false | 2025-02-19T09:45:10.993265 | true | ||
AXCXEPT/EZO-Qwen2.5-32B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 2 | 32.764 | apache-2.0 | false | 2025-02-08T08:30:19.518006 | true | ||
CohereForAI/aya-23-35B | main | float16 | Original | FINISHED | 2025-01-29T15:57:52.890641Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 268 | 34.981 | cc-by-nc-4.0 | false | 2025-02-08T19:27:01.105080 | true | ||
CohereForAI/aya-expanse-32b | main | float16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 202 | 32.3 | CC-BY-NC-4.0 | false | 2025-02-08T08:38:20.150902 | true | ||
Cran-May/T.E-8.1 | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 7.616 | cc-by-nc-sa-4.0 | false | 2025-02-08T10:30:21.401321 | true | ||
Daemontatox/Cogito-R1 | main | bfloat16 | Original | FINISHED | 2025-02-11T20:46:56.174519Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 32.764 | apache-2.0 | false | 2025-02-11T20:48:32.217184 | true | ||
FreedomIntelligence/AceGPT-13B-chat | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 25 | 13 | apache-2.0 | false | 2025-02-08T13:18:22.602821 | false | ||
FreedomIntelligence/AceGPT-13B | main | float16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟩 : continuously pretrained | 8 | 13 | apache-2.0 | false | 2025-02-08T10:22:21.514187 | false | ||
FreedomIntelligence/AceGPT-7B-chat | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 8 | 7 | apache-2.0 | false | 2025-02-08T13:26:22.687068 | false | ||
FreedomIntelligence/AceGPT-7B | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 🟩 : continuously pretrained | 3 | 7 | apache-2.0 | false | 2025-02-08T13:10:22.229891 | false | ||
FreedomIntelligence/AceGPT-v2-32B-Chat | main | float16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 32.513 | apache-2.0 | false | 2025-02-08T08:46:19.920445 | true | ||
FreedomIntelligence/AceGPT-v2-32B | main | float16 | Original | FINISHED | 2025-01-30T06:27:43.852438Z | 🟢 : pretrained | 1 | 32.513 | apache-2.0 | false | 2025-02-08T19:43:00.764839 | true | ||
FreedomIntelligence/AceGPT-v2-70B-Chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 70.554 | apache-2.0 | false | 2025-02-06T07:50:32.889557 | false | ||
FreedomIntelligence/AceGPT-v2-70B | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟩 : continuously pretrained | 1 | 70.554 | apache-2.0 | false | 2025-02-06T07:45:35.631469 | false | ||
FreedomIntelligence/AceGPT-v2-8B-Chat | main | float16 | Original | FINISHED | 2025-01-20T17:11:55Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 8.03 | apache-2.0 | false | 2025-02-08T19:19:00.154919 | false | ||
INSAIT-Institute/BgGPT-7B-Instruct-v0.2 | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟩 : continuously pretrained | 21 | 7.291 | apache-2.0 | false | 2025-02-08T06:22:15.130408 | true | ||
Isaak-Carter/Josiefied-Qwen2.5-7B-Instruct-abliterated-v2 | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🔶 : fine-tuned on domain-specific datasets | 1 | 7.616 | apache-2.0 | false | 2025-02-08T06:32:15.877524 | true | ||
MaziyarPanahi/calme-2.1-qwen2.5-72b | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 72.7 | other | false | 2025-02-06T06:15:31.290181 | true | ||
MaziyarPanahi/calme-2.2-qwen2.5-72b | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 72.7 | other | false | 2025-01-21T07:40:39.881897 | true | ||
Navid-AI/Yehia-7B-preview | main | bfloat16 | Original | FINISHED | 2025-03-03T11:06:48.245215Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 7.001 | false | 2025-03-03T15:20:20.030168 | true | |||
Orion-zhen/Qwen2.5-7B-Instruct-Uncensored | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 7.616 | gpl-3.0 | false | 2025-02-08T08:22:16.007188 | true | ||
Qwen/Qwen1.5-1.8B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 35 | 1.837 | other | false | 2025-02-08T10:54:21.072378 | true | ||
Qwen/Qwen1.5-14B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟢 : pretrained | 32 | 14.167 | other | false | 2025-02-08T09:10:20.446684 | true | ||
Qwen/Qwen1.5-32B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🟢 : pretrained | 71 | 32.512 | other | false | 2025-02-08T08:54:20.154034 | true | ||
Qwen/Qwen1.5-4B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 29 | 3.95 | other | false | 2025-02-08T07:52:15.475167 | true | ||
Qwen/Qwen1.5-7B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 35 | 7.721 | other | false | 2025-02-08T08:12:15.753515 | true | ||
Qwen/Qwen2-0.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T14:43:05Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 0.49 | other | false | 2025-02-08T06:19:28.177264 | true | ||
Qwen/Qwen2-1.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 1.54 | other | false | 2025-02-08T08:02:15.368025 | true | ||
Qwen/Qwen2-72B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 0 | 72.7 | other | false | 2025-01-21T07:41:00.318347 | true | ||
Qwen/Qwen2-7B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 7.62 | other | false | 2025-02-08T07:02:14.997658 | true | ||
Qwen/Qwen2.5-0.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T14:43:05Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 12 | 0.494 | apache-2.0 | false | 2025-02-08T06:21:47.357291 | true | ||
Qwen/Qwen2.5-0.5B | 060db6499f32faf8b98477b0a26969ef7d8b9987 | bfloat16 | Original | FINISHED | 2025-01-17T14:43:05Z | 🟢 : pretrained | 61 | 0.494 | apache-2.0 | false | 2025-02-08T06:19:48.502455 | true | ||
Qwen/Qwen2.5-1.5B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 16 | 1.544 | apache-2.0 | false | 2025-02-08T06:42:14.926675 | true | ||
Qwen/Qwen2.5-1.5B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 8 | 1.544 | apache-2.0 | false | 2025-02-08T07:12:15.230903 | true | ||
Qwen/Qwen2.5-14B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 13 | 14.77 | apache-2.0 | false | 2025-02-08T11:18:22.068157 | true | ||
Qwen/Qwen2.5-14B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 5 | 14.77 | apache-2.0 | false | 2025-02-08T11:42:22.847597 | true | ||
Qwen/Qwen2.5-32B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 7 | 32.764 | apache-2.0 | false | 2025-02-08T09:02:20.345973 | true | ||
Qwen/Qwen2.5-32B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 4 | 32.764 | apache-2.0 | false | 2025-01-19T07:29:37.106248 | true | ||
Qwen/Qwen2.5-3B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 18 | 3.086 | other | false | 2025-02-08T06:52:15.204714 | true | ||
Qwen/Qwen2.5-3B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 5 | 3.086 | other | false | 2025-02-08T07:22:15.401441 | true | ||
Qwen/Qwen2.5-72B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-31T12:15:49.899735Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 701 | 72.706 | other | false | 2025-02-08T19:51:01.141367 | true | ||
Qwen/Qwen2.5-72B | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 7 | 72.706 | other | false | 2025-01-20T07:14:54.034141 | true | ||
Qwen/Qwen2.5-7B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 20 | 7.616 | apache-2.0 | false | 2025-02-08T07:42:15.444769 | true | ||
Qwen/Qwen2.5-7B | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🟢 : pretrained | 6 | 7.616 | apache-2.0 | false | 2025-02-08T07:32:15.370596 | true | ||
Sakalti/Saka-1.5B | main | float16 | Original | FINISHED | 2025-02-11T09:31:26.084185Z | 🤝 : base merges and merges | 1 | 1.777 | false | 2025-02-11T10:47:27.139547 | false | |||
Sakalti/Saka-14B | main | float16 | Original | FINISHED | 2025-02-11T09:31:54.948549Z | 🤝 : base merges and merges | 5 | 14.766 | false | 2025-02-12T05:21:41.344042 | false | |||
Sakalti/SakaMoe-3x14B-Instruct | main | float16 | Original | FINISHED | 2025-02-19T10:28:44.992462Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 45.343 | apache-2.0 | false | 2025-02-19T11:00:37.975586 | false | ||
Sakalti/Ultiima-72B | main | float16 | Original | FINISHED | 2025-02-11T09:32:32.001771Z | 🤝 : base merges and merges | 1 | 72.706 | other | false | 2025-02-11T11:47:30.706241 | false | ||
SeaLLMs/SeaLLM-7B-v2.5 | main | float16 | Original | FINISHED | 2025-01-22T13:33:37Z | 🔶 : fine-tuned on domain-specific datasets | 44 | 8.538 | other | false | 2025-02-06T08:25:34.187570 | true | ||
SeaLLMs/SeaLLMs-v3-7B-Chat | main | bfloat16 | Original | FINISHED | 2025-03-05T06:06:48.377744Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 52 | 7.616 | other | false | 2025-03-05T06:20:22.364493 | true | ||
Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 3.821 | apache-2.0 | false | 2025-02-08T11:58:21.193972 | true | ||
TarjamaN/Pronoia-14b-community | main | bfloat16 | Original | FINISHED | 2025-02-10T18:01:56.806782Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 14 | apache-2.0 | false | 2025-02-11T09:44:43.352229 | false | ||
VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 13 | 12.248 | apache-2.0 | false | 2025-02-08T11:50:21.673103 | true | ||
airev-ai/emirati-14b-v2 | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 14.77 | apache-2.0 | false | 2025-02-09T07:32:28.240060 | true | ||
arcee-ai/Arcee-Spark | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 78 | 7.616 | apache-2.0 | false | 2025-02-08T12:22:22.278691 | true | ||
cognitivecomputations/Dolphin3.0-R1-Mistral-24B | main | bfloat16 | Original | FINISHED | 2025-02-11T22:06:35.591253Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 91 | 23.572 | false | 2025-02-11T22:18:43.247824 | true | |||
freewheelin/free-evo-qwen72b-v0.8-re | main | float16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 3 | 72.288 | mit | false | 2025-02-06T06:20:41.352289 | false | ||
google/gemma-2-27b-it | main | bfloat16 | Original | FINISHED | 2025-02-01T05:04:02.122717Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 509 | 27.227 | gemma | false | 2025-02-01T05:04:43.884909 | true | ||
google/gemma-2-27b | main | float16 | Original | FINISHED | 2025-01-30T09:20:03.598382Z | 🟢 : pretrained | 193 | 27.227 | gemma | false | 2025-01-30T09:23:21.646352 | false | ||
google/gemma-2-2b-it | main | bfloat16 | Original | FINISHED | 2025-02-19T14:03:23.500812Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 961 | 2.614 | gemma | false | 2025-02-19T14:30:53.668405 | true | ||
google/gemma-2-9b-it | main | bfloat16 | Original | FINISHED | 2025-01-29T15:55:59.067426Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 639 | 9.242 | gemma | false | 2025-02-09T07:33:17.513544 | true | ||
huihui-ai/Qwen2.5-32B-Instruct-abliterated | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 3 | 32.764 | apache-2.0 | false | 2025-02-08T09:26:20.139673 | true | ||
huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2 | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🔶 : fine-tuned on domain-specific datasets | 7 | 7.616 | apache-2.0 | false | 2025-02-08T08:28:59.535671 | true | ||
huihui-ai/Qwen2.5-7B-Instruct-abliterated | main | bfloat16 | Original | FINISHED | 2025-01-17T21:58:33Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 7.616 | apache-2.0 | false | 2025-02-08T08:29:14.425261 | true | ||
inceptionai/jais-adapted-13b-chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 13.344 | apache-2.0 | false | 2025-02-06T07:55:33.840204 | true | ||
inceptionai/jais-adapted-13b | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟢 : pretrained | 4 | 13.344 | apache-2.0 | false | 2025-02-06T08:20:33.375364 | false | ||
inceptionai/jais-adapted-70b-chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 6 | 69.501 | apache-2.0 | false | 2025-02-06T08:10:39.815075 | true | ||
inceptionai/jais-adapted-70b | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟢 : pretrained | 8 | 69.501 | apache-2.0 | false | 2025-02-06T08:05:33.250540 | false | ||
inceptionai/jais-adapted-7b-chat | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3 | 7.001 | apache-2.0 | false | 2025-02-06T08:00:33.749859 | true | ||
inceptionai/jais-adapted-7b | main | float16 | Original | FINISHED | 2025-01-22T12:22:56Z | 🟢 : pretrained | 5 | 7.001 | apache-2.0 | false | 2025-02-06T08:15:34.315914 | false | ||
inceptionai/jais-family-13b-chat | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 13 | apache-2.0 | false | 2025-02-06T07:35:32.981629 | true | ||
inceptionai/jais-family-13b | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 🟢 : pretrained | 4 | 13 | apache-2.0 | false | 2025-02-06T07:25:32.458725 | false | ||
inceptionai/jais-family-30b-16k-chat | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 30 | apache-2.0 | false | 2025-02-06T11:53:58.086928 | true | ||
inceptionai/jais-family-30b-16k | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 🟢 : pretrained | 7 | 30 | apache-2.0 | false | 2025-02-06T11:54:24.081900 | false | ||
inceptionai/jais-family-30b-8k-chat | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 4 | 30 | apache-2.0 | false | 2025-02-06T07:20:32.329369 | true | ||
inceptionai/jais-family-30b-8k | main | float16 | Original | FINISHED | 2025-01-21T05:22:20Z | 🟢 : pretrained | 6 | 30 | apache-2.0 | false | 2025-02-06T12:30:16.837744 | false | ||
maldv/Awqward2.5-32B-Instruct | main | bfloat16 | Original | FINISHED | 2025-02-16T13:58:37.505999Z | 🔶 : fine-tuned on domain-specific datasets | 2 | 32.764 | apache-2.0 | false | 2025-02-16T14:57:15.527270 | true | ||
maldv/Qwentile2.5-32B-Instruct | 892662a70cc13ffb88809f5b90db080a3d81ffad | bfloat16 | Original | FINISHED | 2025-02-11T01:04:27.495862Z | 🤝 : base merges and merges | 31 | 32.764 | apache-2.0 | false | 2025-02-11T09:47:22.846202 | true | ||
meta-llama/Llama-3.1-8B-Instruct | main | bfloat16 | Original | FINISHED | 2025-03-04T20:19:21.429375Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 3693 | 8.03 | llama3.1 | false | 2025-03-04T20:19:50.291565 | false | ||
meta-llama/Llama-3.1-8B | main | bfloat16 | Original | FINISHED | 2025-03-05T18:42:38.644316Z | 🟢 : pretrained | 1464 | 8.03 | llama3.1 | false | 2025-03-05T18:51:03.150383 | false | ||
meta-llama/Llama-3.3-70B-Instruct | main | bfloat16 | Original | FINISHED | 2025-01-29T16:03:20.622053Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1791 | 70.554 | llama3.3 | false | 2025-02-08T19:35:00.995484 | true | ||
mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1 | main | bfloat16 | Original | FINISHED | 2025-02-16T13:50:14.352880Z | 🔶 : fine-tuned on domain-specific datasets | 8 | 8.03 | mit | false | 2025-02-16T13:57:14.711888 | true | ||
mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1 | main | bfloat16 | Original | FINISHED | 2025-02-16T13:53:37.834456Z | 🔶 : fine-tuned on domain-specific datasets | 15 | 7.616 | mit | false | 2025-02-16T14:27:14.153900 | true | ||
recoilme/recoilme-gemma-2-9B-v0.2 | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 10.159 | cc-by-nc-4.0 | false | 2025-02-08T09:42:21.044400 | true | ||
recoilme/recoilme-gemma-2-9B-v0.4 | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 10.159 | cc-by-nc-4.0 | false | 2025-02-08T09:34:21.199935 | true | ||
rombodawg/Rombos-LLM-V2.5-Qwen-72b | main | bfloat16 | Original | FINISHED | 2025-02-12T11:34:52.981127Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 35 | 72.706 | other | false | 2025-02-14T05:41:55.873866 | false | ||
rombodawg/Rombos-LLM-V2.6-Qwen-14b | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🔶 : fine-tuned on domain-specific datasets | 17 | 14.77 | apache-2.0 | false | 2025-02-08T12:30:21.991946 | true | ||
silma-ai/SILMA-9B-Instruct-v1.0 | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 9.242 | gemma | false | 2025-02-08T09:50:21.448289 | true | ||
silma-ai/SILMA-Kashif-2B-Instruct-v1.0 | main | bfloat16 | Original | FINISHED | 2025-02-12T13:30:18.234483Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 13 | 2.614 | gemma | false | 2025-02-12T13:50:19.980202 | true | ||
speakleash/Bielik-11B-v2 | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟩 : continuously pretrained | 34 | 11.169 | apache-2.0 | false | 2025-02-09T07:32:43.195383 | false | ||
tanliboy/lambda-qwen2.5-14b-dpo-test | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 14.77 | apache-2.0 | false | 2025-02-08T12:54:21.935778 | true | ||
tanliboy/lambda-qwen2.5-32b-dpo-test | main | bfloat16 | Original | FINISHED | 2025-01-19T07:28:17Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 1 | 32.764 | apache-2.0 | false | 2025-02-06T06:40:32.398695 | true | ||
tiiuae/Falcon3-10B-Base | main | bfloat16 | Original | FINISHED | 2025-02-12T11:08:16.453425Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 33 | 10.306 | other | false | 2025-02-12T17:45:11.108494 | false | ||
tiiuae/Falcon3-7B-Base | main | bfloat16 | Original | FINISHED | 2025-02-12T11:06:18.513886Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 24 | 7.456 | other | false | 2025-02-12T11:20:01.605414 | false | ||
upstage/SOLAR-10.7B-v1.0 | main | float16 | Original | FINISHED | 2025-01-19T07:28:17Z | 🟢 : pretrained | 233 | 10.732 | apache-2.0 | false | 2025-02-08T13:02:21.896392 | false | ||
v000000/Qwen2.5-14B-Gutenberg-1e-Delta | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 14.77 | apache-2.0 | false | 2025-02-08T09:58:20.517925 | true | ||
v000000/Qwen2.5-Lumen-14B | main | bfloat16 | Original | FINISHED | 2025-01-18T11:14:25Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 2 | 14.77 | apache-2.0 | false | 2025-02-08T10:06:20.537013 | true | ||
yellowtown/7B-v0.2 | 3b4cfeea5eaad38d37dd6016fff949fc271189ee | bfloat16 | Original | FINISHED | 2025-02-14T08:56:37.440691Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 7.616 | apache-2.0 | false | 2025-02-14T09:24:48.249354 | false | ||
yellowtown/7B-v0.2 | be5fac4dda0fbf83403ef35d5009266f88669491 | bfloat16 | Original | FINISHED | 2025-01-23T12:14:45Z | 🔶 : fine-tuned on domain-specific datasets | 0 | 7.616 | apache-2.0 | false | 2025-02-06T08:30:33.581302 | true |
model | base_model | revision | precision | weight_type | status | submitted_time | model_type | likes | params | license | private | job_id | job_start_time | chat_template |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sakalti/ultiima-108B | main | float16 | Original | FAILED | 2025-02-19T10:29:30.198367Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 107.814 | other | false | 2025-02-19T11:30:40.535277 | false | ||
Sakalti/ultiima-125B | main | float16 | Original | FAILED | 2025-02-19T10:28:09.333042Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 0 | 125.367 | other | false | 2025-02-19T10:30:36.208911 | false | ||
microsoft/Phi-4-multimodal-instruct | main | bfloat16 | Original | FAILED | 2025-02-27T11:20:15.578684Z | 💬 : chat models (RLHF, DPO, IFT, ...) | 242 | 5.574 | mit | false | 2025-02-27T11:40:16.785764 | true |
About
While outstanding LLMs are being released at a competitive pace, most of them are centered on English and the English-speaking cultural sphere. We operate the Open Arabic LLM Leaderboard (OALL) to evaluate models that reflect the characteristics of the Arabic language, culture, and heritage. Through this, we hope that users can conveniently use the leaderboard, participate, and contribute to the advancement of research in the Arab region 🔥.
Icons & Model types
🟢 : pretrained
🟩 : continuously pretrained
💬 : chat models (RLHF, DPO, IFT, ...)
🔶 : fine-tuned on domain-specific datasets
🤝 : base merges and moerges
Notes:
- We reserve the right to correct any incorrect tags or icons after manual verification to ensure the accuracy and reliability of the leaderboard. This helps maintain the integrity and trustworthiness of the platform.
- Some models may be flagged as “Subjects of Caution” by the community. These models might have used the evaluation set for training, attempted to manipulate rankings, or raised ethical concerns. Models deemed as such may face restricted visibility or removal from the leaderboard. Users are advised to exercise discretion when interpreting rankings.
- The leaderboard automatically hides models that were submitted, evaluated, and subsequently made private or gated post-evaluation. This platform is designed for “open” models that benefit the wider community. If you intend to restrict your model’s accessibility after using the leaderboard’s resources or exploit the platform solely for personal gains, please refrain from submitting. Violators may face bans on their usernames and/or organization IDs from future submissions.
- The leaderboard no longer accepts models in float32 precision except under special circumstances. If you are the developer of a float32 model and believe it deserves inclusion, please reach out to us.
- To ensure fair and equitable access to leaderboard resources, all usernames and organization IDs are limited to 5 submissions per week. This policy minimizes spamming, encourages thoughtful participation, and allows everyone in the community to benefit from the platform.
By adhering to these guidelines, we aim to foster a fair, collaborative, and transparent environment for evaluating and advancing open models for Arabic and Arabic-interested communities.
How it works
📈 We evaluate models using LightEval, a unified and straightforward framework from the HuggingFace Eval Team to test and assess causal language models on a large number of different evaluation tasks.
To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in a zero-shot setting (`0-shot`). This approach eliminates any potential advantage from task-specific fine-tuning and provides a clear indication of how well the models can generalize to new tasks.
Also, given the nature of the tasks, which include multiple-choice questions, the leaderboard primarily uses normalized log-likelihood accuracy (`loglikelihood_acc_norm`) for all tasks.
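For intuition, here is a minimal sketch of length-normalized log-likelihood scoring for a single multiple-choice question, written with transformers. It only illustrates the idea behind `loglikelihood_acc_norm`; the leaderboard's evaluations are run through LightEval, and the model, prompt, and character-length normalization below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM from the Hub works for this illustration.
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens, conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given its preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    token_logprobs = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the continuation tokens (tokenization boundary effects are ignored here).
    n_choice = full_ids.shape[1] - prompt_len
    return token_logprobs[0, -n_choice:].sum().item()

prompt = "سؤال: ما هي عاصمة المملكة العربية السعودية؟\nالإجابة: "
choices = ["الرياض", "جدة", "الدوحة", "بيروت"]
gold = 0

# Normalize each score by the character length of the choice, then take the argmax.
scores = [choice_logprob(prompt, c) / len(c) for c in choices]
pred = max(range(len(choices)), key=lambda i: scores[i])
print("correct" if pred == gold else "incorrect")
```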
Please consider reaching out to us through the discussions tab if you are working on benchmarks for Arabic LLMs and would like to see them on this leaderboard as well. Your benchmark might change the whole game for Arabic models!
Details and Logs
- Detailed numerical results in the `results` OALL dataset: https://huggingface.co/datasets/OALL/v2_results
- Community queries and running status in the `requests` OALL dataset: https://huggingface.co/datasets/OALL/requests_v2
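To inspect the raw files behind these two dataset repos programmatically, the standard huggingface_hub client is enough. A minimal sketch; the internal file layout of the repos is not documented here, so the JSON filtering below is an assumption:

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# List the first few files published in the detailed-results dataset repo.
for path in api.list_repo_files("OALL/v2_results", repo_type="dataset")[:10]:
    print(path)

# Download one request file (assuming the repo stores per-model JSON files)
# to inspect a submission's recorded status locally.
request_files = [
    f for f in api.list_repo_files("OALL/requests_v2", repo_type="dataset")
    if f.endswith(".json")
]
if request_files:
    local_path = hf_hub_download("OALL/requests_v2", filename=request_files[0], repo_type="dataset")
    print(local_path)
```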
More resources
For evaluations of chat models with 3C3H on generative-task benchmarks, please refer to the AraGen-Leaderboard.
If you still have questions, you can check our FAQ here!
