MOSAIC

Research Vision

Conversation often relies on non-verbal cues: interlocutors use visual information such as physical expressions, body gestures, and the surrounding environment to shape and understand meaning. CHAMPAGNE is a generative model of conversations, trained on large-scale web videos, that can account for visual context.

Paper

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

Seungju Han, Jack Hessel, Nouha Dziri, Yejin Choi, and Youngjae Yu • arXiv • 2023

Resources

Code
Dataset and Model weights

CHAMPAGNE

Learning real-world conversation from web videos.
