Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models
Hitomi Yanaka, Yuta Nakamura, Yuki Chida, Tomoya Kurosawa
The 5th Workshop on Clinical Natural Language Processing (ClinicalNLP)
TLDR:
Assessing the capacity of numerical understanding of vision-and-language models over images and texts is crucial for real vision-and-language applications, such as systems for automated medical image analysis.
We provide a visual reasoning dataset focusing on numerical understanding in the medical domain.
Abstract:
Assessing the capacity of numerical understanding of vision-and-language models over images and texts is crucial for real vision-and-language applications, such as systems for automated medical image analysis.
We provide a visual reasoning dataset focusing on numerical understanding in the medical domain.
Experiments on our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain.
However, data augmentation with only a small amount of our dataset improves model performance while maintaining performance in the general domain.