Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context

Moxin Li, Wenjie Wang, Fuli Feng, Hanwang Zhang, Qifan Wang, Tat-Seng Chua

Findings: Question Answering Findings Paper

Session 7: Question Answering (Virtual Poster)
Conference Room: Pier 7&8
Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 12, Session 7 (15:00-16:30 UTC)
Keywords: generalization
Abstract: Machine Reading Comprehension (MRC) models easily learn spurious correlations from complex contexts such as tabular data. Counterfactual training, which trains on both factual data and counterfactual data obtained by augmentation, has become a promising solution. However, constructing faithful counterfactual examples is costly because it is hard to maintain the consistency and dependency of the tabular data. In this paper, we take a more efficient approach: we ask hypothetical questions such as "in which year would the net profit be larger if the revenue in 2019 were $38,298?", whose effects on the answers are equivalent to those of the expensive counterfactual tables. We propose a hypothetical training framework that uses paired examples with different hypothetical questions to supervise the direction of the model gradient towards the counterfactual answer change. Superior generalization results on tabular MRC datasets, including a newly constructed stress test and MultiHiertt, validate the effectiveness of our approach.
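To make the equivalence between a hypothetical question and a counterfactual table concrete, the following is a minimal illustrative sketch, not the paper's implementation. The table values, the helper names (`net_profit`, `apply_hypothesis`, `year_with_larger_net_profit`), and the simplification that net profit equals revenue minus cost are all assumptions made for this example; only the figure 38,298 comes from the abstract's sample question.

```python
# Toy table. All figures here are made up for illustration; only the
# hypothetical revenue of 38,298 is taken from the abstract's example question.
table = {
    2018: {"revenue": 36000, "cost": 30000},
    2019: {"revenue": 35000, "cost": 31000},
}


def net_profit(row):
    # Simplifying assumption for this sketch: net profit = revenue - cost.
    return row["revenue"] - row["cost"]


def year_with_larger_net_profit(tbl):
    # Answer "in which year is the net profit larger?" for a given table.
    return max(tbl, key=lambda year: net_profit(tbl[year]))


def apply_hypothesis(tbl, year, field, value):
    # Build the counterfactual table implied by the assumption
    # "if <field> in <year> were <value>", leaving the original untouched.
    edited = {y: dict(row) for y, row in tbl.items()}
    edited[year][field] = value
    return edited


# Factual question, answered on the original table.
factual_answer = year_with_larger_net_profit(table)

# Hypothetical question: "... if the revenue in 2019 were $38,298?"
# Its answer matches the answer on the explicitly edited (counterfactual) table.
counterfactual_table = apply_hypothesis(table, 2019, "revenue", 38298)
hypothetical_answer = year_with_larger_net_profit(counterfactual_table)

print(factual_answer, hypothetical_answer)
```

In this toy setting the assumption in the question flips the answer while the stored table stays unchanged, which is the kind of answer change that paired hypothetical questions are meant to expose without hand-building a consistent counterfactual table.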