Multilingual Conceptual Coverage in Text-to-Image Models

Michael S Saxon; William Yang Wang

Main: Language Grounding to Vision, Robotics, and Beyond Main-poster Paper

Poster Session 6: Language Grounding to Vision, Robotics, and Beyond (Poster)

Conference Room: Frontenac Ballroom and Queen's Quay

Conference Time: July 12, 09:00-10:30 (EDT) (America/Toronto)

Global Time: July 12, Poster Session 6 (13:00-14:30 UTC)

Keywords: cross-modal content generation

Languages: Spanish, German, Chinese, Japanese, Hebrew, Indonesian

Abstract: We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns. For each model, we can assess the "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language. This technique allows us to estimate how well suited a model is to a target language, as well as to identify model-specific weaknesses, spurious correlations, and biases without a priori assumptions. We demonstrate how it can be used to benchmark T2I models in terms of multilinguality, and how, despite its simplicity, it is a good proxy for impressive generalization.
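The population comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each generated image has already been reduced to a feature vector by some image encoder (not shown), and it uses mean pairwise cosine similarity as a stand-in score, since the abstract does not specify the metric. The names `cosine`, `cross_population_similarity`, and `coverage_by_concept` are hypothetical, and the vectors are toy data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_population_similarity(source: List[Vector], target: List[Vector]) -> float:
    """Mean pairwise cosine similarity between two populations of image features.

    Assumption: this stands in for whatever comparison the paper actually uses.
    """
    if not source or not target:
        raise ValueError("both populations must be non-empty")
    total = sum(cosine(s, t) for s in source for t in target)
    return total / (len(source) * len(target))


def coverage_by_concept(
    source_images: Dict[str, List[Vector]],
    target_images: Dict[str, List[Vector]],
) -> Dict[str, float]:
    """Score every concept that has images in both the source and target language."""
    return {
        concept: cross_population_similarity(source_images[concept], target_images[concept])
        for concept in source_images
        if concept in target_images
    }


# Toy feature vectors (hypothetical): images generated from source-language prompts
# versus images generated from the translated prompts in a target language.
source_images = {"dog": [[1.0, 0.0], [1.0, 0.0]], "cat": [[0.0, 1.0]]}
target_images = {"dog": [[1.0, 0.0], [0.0, 1.0]], "cat": [[0.0, 2.0]]}

scores = coverage_by_concept(source_images, target_images)
for concept, score in scores.items():
    print(f"{concept}: {score:.2f}")
```

With these toy vectors, "cat" scores 1.0 because the target-language images point in the same direction as the source-language ones, while "dog" scores 0.5 because half of the target-language images diverge. A low score for a concept would flag it as a candidate for poor coverage in the target language under this illustrative metric.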