Neural Machine Translation for Mathematical Formulae

Felix Petersen, Moritz Schubotz, Andre Greiner-Petter, Bela Gipp

Main: NLP Applications Main-poster Paper

Poster Session 5: NLP Applications (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 11, 16:15-17:45 (EDT) (America/Toronto)
Global Time: July 11, Poster Session 5 (20:15-21:45 UTC)
Keywords: mathematical nlp
Languages: mathematica, latex, semantic latex
TLDR: We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. Compared to neural machine translation on natural language, mathematical formulae have a much smaller vocabulary and much longer sequences of symbo...
Abstract: We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. Compared to neural machine translation on natural language, mathematical formulae have a much smaller vocabulary and much longer sequences of symbols, while their translation requires extreme precision to satisfy mathematical information needs. In this work, we perform the tasks of translating from LaTeX to Mathematica as well as from LaTeX to semantic LaTeX. While recurrent, recursive, and transformer networks struggle with preserving all contained information, we find that convolutional sequence-to-sequence networks achieve 95.1% and 90.7% exact matches, respectively.
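
The exact-match figures quoted in the abstract count a translation as correct only if it is identical to the reference. The following is a minimal sketch of such a metric, not the authors' evaluation code: the function name `exact_match_rate`, the whitespace normalization step, and the three example pairs are all assumptions made for illustration.

```python
def exact_match_rate(predictions, references):
    """Return the fraction of predictions identical to their reference.

    Assumption: runs of whitespace are collapsed before comparison.
    The paper's actual normalization, if any, is not stated in this listing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one example is required")

    def normalize(formula):
        # Collapse whitespace so "a  b" and "a b" compare equal.
        return " ".join(formula.split())

    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return matches / len(references)


# Hypothetical Mathematica outputs, invented for this example.
predictions = ["Sin[x]^2 + Cos[x]^2", "Integrate[x^2, x]", "Sqrt[x + 1]"]
references = ["Sin[x]^2 + Cos[x]^2", "Integrate[x^2, x]", "Sqrt[x] + 1"]

rate = exact_match_rate(predictions, references)
print(f"exact match rate: {rate:.3f}")
```

The third pair differs only in where the bracket closes, yet it counts as a full miss. This strictness is why exact match suits a task where a single misplaced symbol changes the meaning of a formula.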