Progress on true open-domain social dialogue agents has been hindered by the lack of diversity, scale, and quality of training corpora. SODA is the first million-scale high-quality dialogue dataset covering a wide range of social interactions.