Intermediate Domain Finetuning for Weakly Supervised Domain-adaptive Clinical NER

Shilpa Suresh, Nazgol Tavabi, Shahriar Golchin, Leah Gilreath, Rafael Garcia-Andujar, Alexander Kim, Joseph Murray, Blake Bacevich, Ata Kiapour

BioNLP and BioNLP-ST 2023 Short paper

Abstract: Accurate human-annotated data for real-world use cases can be scarce and expensive to obtain. In the clinical domain, obtaining such data is even more difficult due to privacy concerns, which not only restrict open access to quality data but also require that the annotation be done by domain experts. In this paper, we propose a novel framework, InterDAPT, that leverages Intermediate Domain Finetuning to allow language models to adapt to narrow domains with small, noisy datasets. By making use of peripherally related, unlabeled datasets, this framework circumvents domain-specific data scarcity issues. Our results show that this weakly supervised framework provides performance improvements in downstream clinical named entity recognition tasks.