Research Vision

Conversation often relies on non-verbal cues: interlocutors use visual information, such as physical expressions, body gestures, and the surrounding environment, to shape and understand meaning. CHAMPAGNE is a generative model of conversation that can account for this visual context; it is trained on large-scale web videos.

Paper

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

Seungju Han, Jack Hessel, Nouha Dziri, Yejin Choi, and Youngjae Yu. arXiv, 2023.