A human-centric evaluation framework for Large Multimodal Models (LMMs) across 7 tasks, 7 HC principles, 5 social attributes, and 11 languages — built on 32,000+ expert-verified real-world image–question pairs.
32K+
Image–Question Pairs
~1,500
Unique Images
7
Evaluation Tasks
15
LMMs Evaluated
11
Languages
HumaniBench evaluates 15 LMMs across 7 human-centric tasks using 32K+ expert-verified real-world image–question pairs spanning 5 social attributes and 11 languages.
HC Principle Scores
Aggregate accuracy (%) per Human-Centric principle across all relevant tasks. Higher is better. Click model names to visit their official pages. 🥇 | Microsoft | 5.6B | Closed | 61.0% | 98.9% | 73.5% | 78.8% | 62.2% | 89.5% | 57.2% | 74.4% |
↕ Click any column header to sort · ■ ≥75% ■ 60–74% ■ <60%
Overall = mean of all 7 principle scores. -- indicates data not yet available.
Built with ❤️ by the Vector Institute