We introduce WinoGrande, a new collection of Winograd Schema Challenge (WSC) problems that are adversarially constructed to be robust against spurious statistical biases. While the original WSC dataset provided only 273 instances, WinoGrande includes 43,985 instances, half of which are identified as adversarial. Key to our approach is a new adversarial filtering algorithm, AfLite, for systematic bias reduction, combined with a careful crowdsourcing design. Despite the significant increase in training data, the performance of existing state-of-the-art methods remains modest (61.6%), in contrast with high human performance (90.8%) on the binary questions. In addition, WinoGrande allows us to use transfer learning to achieve new state-of-the-art results on the original WSC and related datasets. Finally, we discuss how biases lead to overestimating the true capabilities of machine commonsense.
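The abstract names AfLite but does not describe its steps, so the following is only a generic sketch of the adversarial filtering idea, not the authors' algorithm. It assumes each instance already has a feature vector and a binary label, uses a deliberately simple nearest-centroid classifier as a stand-in for whatever model is actually used, and drops instances that such a classifier predicts correctly too often when they are held out. All function names, the 0.75 threshold, and the two-fold splitting scheme are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def fit_centroids(features, labels):
    """Return the mean feature vector for each label (stand-in classifier)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def stratified_halves(labels, rng):
    """Split indices into two folds, alternating within each label.

    Assumes every label has at least two instances, so that both
    folds contain every label.
    """
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    folds = ([], [])
    for idx in by_label.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % 2].append(i)
    return folds


def predictability_scores(features, labels, n_rounds=20, seed=0):
    """Fraction of rounds in which each held-out instance is predicted correctly."""
    rng = random.Random(seed)
    correct = [0] * len(labels)
    for _ in range(n_rounds):
        fold_a, fold_b = stratified_halves(labels, rng)
        # Train on one fold and score the other, then swap roles.
        for train, held_out in ((fold_a, fold_b), (fold_b, fold_a)):
            model = fit_centroids([features[i] for i in train],
                                  [labels[i] for i in train])
            for i in held_out:
                correct[i] += predict(model, features[i]) == labels[i]
    return [c / n_rounds for c in correct]


def filter_predictable(features, labels, threshold=0.75, n_rounds=20, seed=0):
    """Keep only the indices whose predictability score is below the threshold."""
    scores = predictability_scores(features, labels, n_rounds, seed)
    keep = [i for i, s in enumerate(scores) if s < threshold]
    return keep, scores
```

Calling filter_predictable(features, labels) returns the indices to keep together with the per-instance scores. An instance whose features leak its label scores close to 1.0 and is dropped, which is the intuition behind removing spurious statistical biases from a dataset.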


WinoGrande: An Adversarial Winograd Schema Challenge at Scale

Keisuke Sakaguchi, Ronan LeBras, Chandra Bhagavatula, and Yejin Choi. Preprint, 2019.
