[SRW] Aligning Code-Switching Metrics with Bilingual Behavior

Rebecca Pattichis, Sonya Trawick, Dora LaCasse, Rena Torres Cacoullos

Student Research Workshop (SRW) Paper

Session 7: Student Research Workshop (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 12, 11:00-12:30 (EDT) (America/Toronto)
Global Time: July 12, Session 7 (15:00-16:30 UTC)
Abstract: Models and metrics of linguistic code-switching (CS) have worked almost exclusively with word-level units. However, not every juncture between two words is an equally likely CS point in bilingual speech. In addition, other-language single-word items and alternating-language multi-word items have distinct properties. Adapting these familiar metrics to the Intonation Unit (IU), we capture a shared tendency for CS to occur across rather than within prosodic boundaries. This constraint is distorted when single- and multi-word other-language items are merged. Individual differences in language distribution and in CS rates are independent of each other, as visualized in the number and breadth of language bands in transcripts of bilingual speech. These results are important to consider in the future development of code-switched datasets for NLP tasks, as the choice of the IU as the token and the exclusion or inclusion of single-word items strongly affect the CS represented in the input text.
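To make the contrast between switching across and within prosodic boundaries concrete, the following is a minimal sketch, not the authors' metric. It assumes a transcript is already segmented into IUs and that every word carries a language tag; the names `switch_counts`, `Token`, and `IU`, the tag values, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]   # (word, language tag), e.g. ("pero", "es") -- assumed format
IU = List[Token]          # one Intonation Unit = a list of tagged words -- assumed format


def switch_counts(ius: List[IU]) -> Dict[str, int]:
    """Count language switches at IU boundaries versus inside IUs.

    A "site" is any juncture between two consecutive words. It is an
    across-boundary site when the second word opens a new IU, and a
    within-IU site otherwise.
    """
    within = across = 0
    within_sites = across_sites = 0
    prev_lang = None  # language of the previous word, if any
    for iu in ius:
        for i, (_, lang) in enumerate(iu):
            if prev_lang is not None:
                switched = int(lang != prev_lang)
                if i == 0:            # first word of an IU: boundary juncture
                    across_sites += 1
                    across += switched
                else:                 # juncture inside the current IU
                    within_sites += 1
                    within += switched
            prev_lang = lang
    return {
        "within": within,
        "within_sites": within_sites,
        "across": across,
        "across_sites": across_sites,
    }


# Toy, invented example (not data from the paper).
toy = [
    [("I", "en"), ("went", "en"), ("there", "en")],
    [("y", "es"), ("después", "es"), ("home", "en")],
    [("pero", "es"), ("no", "es")],
]
counts = switch_counts(toy)
across_rate = counts["across"] / counts["across_sites"]
within_rate = counts["within"] / counts["within_sites"]
```

Comparing `across_rate` with `within_rate` gives a rough per-transcript picture of where switches fall. Note that this sketch treats the single other-language word "home" like any other switch; the abstract indicates that merging single-word and multi-word other-language items distorts the pattern, so a more faithful version would tag and handle such items separately.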