A human-centric evaluation framework for Large Multimodal Models (LMMs) across 7 tasks, 7 HC principles, 5 social attributes, and 11 languages — built on 32,000+ expert-verified real-world image–question pairs.

32K+
Image–Question Pairs
~1,500
Unique Images
7
Evaluation Tasks
15
LMMs Evaluated
11
Languages
HumaniBench teaser figure

HumaniBench evaluates 15 LMMs across 7 human-centric tasks using 32K+ expert-verified real-world image–question pairs spanning 5 social attributes and 11 languages.

HC Principle Scores

Aggregate accuracy (%) per Human-Centric principle across all relevant tasks. Higher is better. Click model names to visit their official pages.

↕ Click any column header to sort  ·  ≥75%   60–74%   <60%

Overall = mean of all 7 principle scores. -- indicates data not yet available.