HumaniBench Leaderboard

A human-centric evaluation framework for Large Multimodal Models (LMMs) across 7 tasks, 7 HC principles, 5 social attributes, and 11 languages — built on 32,000+ expert-verified real-world image–question pairs.

32K+ Image–Question Pairs
~1,500 Unique Images
7 Evaluation Tasks
15 LMMs Evaluated
11 Languages

HumaniBench evaluates 15 LMMs across 7 human-centric tasks using 32K+ expert-verified real-world image–question pairs spanning 5 social attributes and 11 languages.

HC Principle Scores

Aggregate accuracy (%) per Human-Centric principle across all relevant tasks. Higher is better. Each row lists rank, model, organization, parameter count, open or closed access, the seven principle scores, and the overall score.


🥇	Gemini-2.0-Flash	Google	-	Closed	61.0%	98.9%	73.5%	78.8%	62.2%	89.5%	57.2%	74.4%
🥈	GPT-4o	OpenAI	-	Closed	61.1%	99.0%	74.8%	79.2%	62.5%	90.5%	50.9%	74.0%
🥉	Phi-4	Microsoft	5.6B	Open	59.2%	98.2%	78.6%	77.4%	61.3%	79.0%	45.7%	71.3%
4	Qwen-2.5-7B	Alibaba	7B	Open	63.1%	96.5%	84.9%	67.1%	57.4%	73.8%	53.6%	70.9%
5	Gemma-3	Google	4B	Open	57.5%	94.6%	73.2%	67.8%	57.7%	79.8%	58.3%	69.8%
6	LLaVA-v1.6	LLaVA	7B	Open	59.7%	94.4%	80.3%	68.1%	55.4%	66.3%	60.6%	69.3%
7	Phi-3.5	Microsoft	4B	Open	56.0%	96.1%	72.3%	69.7%	57.3%	70.8%	50.5%	67.5%
8	Janus-Pro-7B	DeepSeek	7B	Open	50.2%	96.9%	63.3%	65.2%	57.6%	69.5%	52.8%	65.1%
9	InternVL2.5	OpenGVLab	8B	Open	50.9%	93.8%	63.8%	64.4%	51.1%	74.5%	56.4%	65.0%
10	CogVLM2-19B	THUDM	19B	Open	53.1%	96.3%	67.5%	74.4%	60.4%	68.0%	35.1%	65.0%
11	Aya-Vision-8B	Cohere	8B	Open	51.7%	94.9%	64.4%	68.1%	50.8%	77.8%	45.9%	64.8%
12	Llama-3.2-11B	Meta	11B	Open	50.2%	94.9%	58.9%	63.0%	50.7%	71.3%	56.7%	63.7%
13	Molmo-7B	Allen AI	7B	Open	52.4%	94.8%	66.2%	65.8%	55.0%	58.8%	49.7%	63.2%
14	GLM-4V-9B	THUDM	9B	Open	50.2%	94.4%	63.9%	63.0%	50.0%	67.8%	50.5%	62.8%
15	DeepSeek-VL2-Small	DeepSeek	3B	Open	48.8%	90.6%	54.8%	61.6%	49.1%	59.3%	55.7%	60.0%

Overall is the unweighted mean of the 7 principle scores. A hyphen (-) marks a value that is not available, such as the parameter count of a closed model.
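The Overall column can be checked directly from the seven principle scores in a row. The sketch below does this for two rows of the table. The names `ROWS` and `overall` are illustrative, and rounding to one decimal place is an assumption based on how the table displays its values.

```python
from statistics import mean

# Seven principle scores (%) copied from two rows of the leaderboard table.
ROWS = {
    "Gemini-2.0-Flash": [61.0, 98.9, 73.5, 78.8, 62.2, 89.5, 57.2],
    "GPT-4o": [61.1, 99.0, 74.8, 79.2, 62.5, 90.5, 50.9],
}


def overall(scores):
    """Unweighted mean of the seven principle scores.

    Rounding to one decimal is an assumption that matches the table's display.
    """
    if len(scores) != 7:
        raise ValueError("expected exactly 7 principle scores")
    return round(mean(scores), 1)


for model, scores in ROWS.items():
    print(f"{model}: {overall(scores):.1f}%")
# Gemini-2.0-Flash: 74.4%
# GPT-4o: 74.0%
```

Both results match the Overall values listed for those models. The small gap between them also shows that the ranking can turn on a single principle score.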