Improving Radiology Summarization with Radiograph and Anatomy Prompts

jinpeng hu; Zhihong Chen; Yang Liu; Xiang Wan; Tsung-Hui Chang

Improving Radiology Summarization with Radiograph and Anatomy Prompts

jinpeng hu, Zhihong Chen, Yang Liu, Xiang Wan, Tsung-Hui Chang

📝 Paper

Anthology

Underline 🪧 Poster 📺 Watch Video on Underline Add to Favorites

Findings: Summarization Findings Paper

Session 4: Summarization (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 11, Session 4 (15:00-16:30 UTC)

Keywords: multimodal summarization

TLDR: The impression is crucial for the referring physicians to grasp key information since it is concluded from the findings and reasoning of radiologists. To alleviate the workload of radiologists and reduce repetitive human labor in impression writing, many researchers have focused on automatic impress...

You can open the #paper-P2987 channel in a separate window.

Abstract: The impression is crucial for the referring physicians to grasp key information since it is concluded from the findings and reasoning of radiologists. To alleviate the workload of radiologists and reduce repetitive human labor in impression writing, many researchers have focused on automatic impression generation. However, recent works on this task mainly summarize the corresponding findings and pay less attention to the radiology images. In clinical, radiographs can provide more detailed valuable observations to enhance radiologists' impression writing, especially for complicated cases. Besides, each sentence in findings usually focuses on single anatomy, such that they only need to be matched to corresponding anatomical regions instead of the whole image, which is beneficial for textual and visual features alignment. Therefore, we propose a novel anatomy-enhanced multimodal model to promote impression generation. In detail, we first construct a set of rules to extract anatomies and put these prompts into each sentence to highlight anatomy characteristics. Then, two separate encoders are applied to extract features from the radiograph and findings. Afterward, we utilize a contrastive learning module to align these two representations at the overall level and use a co-attention to fuse them at the sentence level with the help of anatomy-enhanced sentence representation. The experimental results on two benchmark datasets confirm the effectiveness of the proposed method, which achieves state-of-the-art results.