A human-centric evaluation framework for Large Multimodal Models (LMMs) spanning 7 tasks, 7 human-centric (HC) principles, 5 social attributes, and 11 languages, built on 32,000+ expert-verified real-world image–question pairs.
32K+ Image–Question Pairs
~1,500 Unique Images
7 Evaluation Tasks
15 LMMs Evaluated
11 Languages
HC Principle Scores
Aggregate accuracy (%) per Human-Centric principle across all relevant tasks. Higher is better.
Microsoft | 5.6B | Closed | 61.1% | 99.0% | 74.8% | 79.2% | 62.5% | 90.5% | 50.9% | 74.0% |
Overall = mean of all 7 principle scores. -- indicates data not yet available.
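As a minimal sketch of the Overall rule, the snippet below averages the seven principle scores from the row shown on this page. The function name `overall_score` and the handling of missing values are assumptions made for illustration. The page does not say how "--" entries affect the Overall column, so the sketch simply reports no overall score when any principle score is missing.

```python
from statistics import fmean
from typing import Optional, Sequence


def overall_score(principle_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the 7 principle scores, rounded to one decimal place.

    Assumption (not stated on the page): if any score is missing
    (shown as "--" and represented here as None), no overall score
    is reported and None is returned.
    """
    if len(principle_scores) != 7:
        raise ValueError("expected exactly 7 principle scores")
    if any(score is None for score in principle_scores):
        return None
    return round(fmean(principle_scores), 1)


# The seven principle scores from the Microsoft 5.6B row, in percent.
row = [61.1, 99.0, 74.8, 79.2, 62.5, 90.5, 50.9]
print(overall_score(row))  # 74.0, matching the last column of that row
```

The unweighted mean treats every principle equally, which is consistent with the 74.0% reported in the final column of the row.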
Built with ❤️ by the Vector Institute