Few-shot Joint Multimodal Aspect-Sentiment Analysis Based on Generative Multimodal Prompt

Xiaocui Yang; Shi Feng; Daling Wang; Qi Sun; Wenfang Wu; Yifei Zhang; Pengfei Hong; Soujanya Poria

Few-shot Joint Multimodal Aspect-Sentiment Analysis Based on Generative Multimodal Prompt

Xiaocui Yang, Shi Feng, Daling Wang, Qi Sun, Wenfang Wu, Yifei Zhang, Pengfei Hong, Soujanya Poria

📝 Paper

Anthology

Underline 🪧 Poster 📺 Watch Video on Underline Add to Favorites

Findings: Sentiment Analysis, Stylistic Analysis, and Argument Mining Findings Paper

Session 4: Sentiment Analysis, Stylistic Analysis, and Argument Mining (Virtual Poster)

Conference Room: Pier 7&8

Conference Time: July 11, 11:00-12:30 (EDT) (America/Toronto)

Global Time: July 11, Session 4 (15:00-16:30 UTC)

Spotlight Session: Spotlight - Metropolitan East (Spotlight)

Conference Room: Metropolitan East

Conference Time: July 10, 19:00-21:00 (EDT) (America/Toronto)

Global Time: July 10, Spotlight Session (23:00-01:00 UTC)

Keywords: applications

TLDR: We have witnessed the rapid proliferation of multimodal data on numerous social media platforms. Conventional studies typically require massive labeled data to train models for Multimodal Aspect-Based Sentiment Analysis (MABSA). However, collecting and annotating fine-grained multimodal data for MAB...

You can open the #paper-P1694 channel in a separate window.

Abstract: We have witnessed the rapid proliferation of multimodal data on numerous social media platforms. Conventional studies typically require massive labeled data to train models for Multimodal Aspect-Based Sentiment Analysis (MABSA). However, collecting and annotating fine-grained multimodal data for MABSA is tough. To alleviate the above issue, we perform three MABSA-related tasks with quite a small number of labeled multimodal samples. We first build diverse and comprehensive multimodal few-shot datasets according to the data distribution. To capture the specific prompt for each aspect term in a few-shot scenario, we propose a novel Generative Multimodal Prompt (GMP) model for MABSA, which includes the Multimodal Encoder module and the N-Stream Decoders module. We further introduce a subtask to predict the number of aspect terms in each instance to construct the multimodal prompt. Extensive experiments on two datasets demonstrate that our approach outperforms strong baselines on two MABSA-related tasks in the few-shot setting.