A human-centric evaluation framework for Large Multimodal Models (LMMs) spanning 7 tasks, 7 human-centric (HC) principles, 5 social attributes, and 11 languages, built on 32,000+ expert-verified real-world image–question pairs.

32K+
Image–Question Pairs
~1,500
Unique Images
7
Evaluation Tasks
15
LMMs Evaluated
11
Languages
HumaniBench teaser figure

HC Principle Scores

Aggregate accuracy (%) per Human-Centric principle across all relevant tasks. Higher is better. Click model names to visit their official pages.

Overall = mean of all 7 principle scores; "--" indicates data not yet available.
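The Overall column described above can be sketched as a simple average over the seven principle scores, with no Overall reported when any principle's data is still missing. The function name and the scores below are hypothetical, for illustration only:

```python
def overall_score(principle_scores):
    """Return the mean of 7 principle accuracies (%), or None if any is "--" (missing)."""
    if len(principle_scores) != 7:
        raise ValueError("expected exactly 7 principle scores")
    # "--" marks a principle whose data is not yet available
    if any(s == "--" for s in principle_scores):
        return None
    return round(sum(principle_scores) / len(principle_scores), 2)

# Hypothetical per-principle accuracies for one model
scores = [81.2, 74.5, 69.8, 77.0, 83.3, 70.1, 76.4]
print(overall_score(scores))  # → 76.04
```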