[Industry] RadLing: Towards Efficient Radiology Report Understanding

Rikhiya Ghosh; Oladimeji Farri; Sanjeev Kumar Karn; Manuela Danu; Ramya Vunikili; Larisa Micu

Track: Industry Paper

Session 5: Industry (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 11, 16:15-17:45 (EDT) (America/Toronto)

Global Time: July 11, Session 5 (20:15-21:45 UTC)

Abstract: Most natural language tasks in the radiology domain use language models pretrained on biomedical corpora. There are few pretrained language models trained specifically for radiology, and fewer still that have been trained in a low-data setting and gone on to produce comparable results in fine-tuning tasks. We present RadLing, a continuously pretrained language model using the ELECTRA-small architecture, trained on over 500K radiology reports, that can compete with state-of-the-art results for fine-tuning tasks in the radiology domain. Our main contribution in this paper is knowledge-aware masking, a taxonomic knowledge-assisted pretraining task that dynamically masks tokens to inject knowledge during pretraining. In addition, we introduce a knowledge base-aided vocabulary extension to adapt the general tokenization vocabulary to the radiology domain.
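
The abstract does not spell out how knowledge-aware masking selects tokens, so the following is only an illustrative sketch under one assumption: tokens that appear in a taxonomy term list are masked with a higher probability than other tokens, and the mask is re-sampled on every call (one reading of "dynamically"). The function name `knowledge_aware_mask`, the parameters `p_term` and `p_other`, and the example term set are hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol; the real tokenizer's symbol may differ


def knowledge_aware_mask(tokens, taxonomy_terms, p_term=0.5, p_other=0.1, seed=None):
    """Mask taxonomy tokens more often than other tokens (illustrative only).

    Returns (masked_tokens, labels), where labels[i] holds the original
    token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        # Assumption: a simple lowercase lookup stands in for taxonomy matching.
        p = p_term if tok.lower() in taxonomy_terms else p_other
        if rng.random() < p:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical report fragment and taxonomy terms, for illustration.
report = ["No", "focal", "consolidation", "or", "pleural", "effusion"]
terms = {"consolidation", "effusion"}
masked, labels = knowledge_aware_mask(report, terms)
print(masked)
```

Calling the function without a fixed `seed` draws a fresh mask each time, so the same report can contribute different masked positions across training epochs. Setting `p_term=1.0` and `p_other=0.0` masks exactly the taxonomy terms, which is convenient for checking the behavior.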