Automatically evaluating vision-language tasks is challenging: existing metrics often fail to reflect human judgments because they cannot account for fine-grained details.
InternLM2 is a large vision-language model that supports images of any aspect ratio, from 336 pixels up to 4K HD, which facilitates its deployment in real-world settings.
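One common way to handle inputs of arbitrary aspect ratio and resolution is to tile the image into fixed-size patches for the vision encoder. The sketch below is purely illustrative and hedged: the function name, the 336-pixel tile size, and the tile budget are assumptions for demonstration, not the model's actual preprocessing pipeline.

```python
import math

def plan_tiles(width: int, height: int, base: int = 336, max_tiles: int = 144) -> tuple[int, int]:
    """Hypothetical helper: pick a (cols, rows) grid of base-sized tiles that
    covers an image of arbitrary aspect ratio, capped at max_tiles.
    Illustrative sketch only, not the model's actual preprocessing."""
    cols = max(1, math.ceil(width / base))
    rows = max(1, math.ceil(height / base))
    # Shrink the longer side first until the grid fits the tile budget.
    while cols * rows > max_tiles:
        if cols >= rows:
            cols -= 1
        else:
            rows -= 1
    return cols, rows

# Example: a 3840x2160 (4K) frame maps to a 12x7 grid of 336-px tiles.
print(plan_tiles(3840, 2160))
```

Under this scheme, a small thumbnail stays a single tile while a 4K frame expands into many, letting one encoder serve inputs of very different shapes.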