Unlikelihood-training and back-training for robust natural language understanding

Siva Reddy / McGill

Talk: , -

Meet the Speaker in Gather.Town:
Chats I
, -
Abstract: Language models are known to be good at both generalization and memorization. These abilities mean that a language model can be used directly as a knowledge base: for example, a model can easily fill in the blank in “The capital of Canada is BLANK” with Ottawa, even if that exact construction was never seen during training, a task that requires both generalization and memorization. But we also observe that language models commonly ignore complex phenomena such as negation, e.g., the same model would still predict Ottawa as the answer to “The capital of Canada is not BLANK”. I will introduce a new training procedure and objective called “unlikelihood training with reference” that builds language models which understand negation without explicitly training on factual knowledge. In the second part of the talk, I will show that the pretrain-and-fine-tune paradigm breaks down in out-of-distribution settings. For example, question answering and question generation models trained on Natural Questions do not generalize to other domains such as education or biomedicine. I will introduce a new technique called back-training that exploits unsupervised data in the target domain much more efficiently than self-training does.
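As background for the first part of the talk, the generic unlikelihood idea (Welleck et al., 2020) can be sketched in a few lines: alongside the usual maximum-likelihood term that pushes probability toward correct tokens, an unlikelihood term penalizes probability assigned to negative candidates. This is a minimal illustrative sketch only, not the “unlikelihood training with reference” procedure presented in the talk; the function names and toy probabilities are made up for illustration.

```python
import math

def likelihood_loss(p):
    # Standard MLE term: penalize low probability on a correct token.
    return -math.log(p)

def unlikelihood_loss(p):
    # Unlikelihood term: penalize high probability on a negative token
    # via -log(1 - p), which grows as p approaches 1.
    return -math.log(1.0 - p)

# Toy illustration: after "The capital of Canada is not BLANK",
# "Ottawa" is treated as a negative candidate. A model that still
# assigns it high probability incurs a large unlikelihood penalty.
print(round(unlikelihood_loss(0.9), 3))  # → 2.303 (model still says Ottawa)
print(round(unlikelihood_loss(0.1), 3))  # → 0.105 (model has learned negation)
```

Minimizing the combined objective thus drives the model away from completions that a negated context rules out, without requiring any new factual supervision.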

Bio: Siva Reddy is an Assistant Professor in the School of Computer Science and the Department of Linguistics at McGill University. He is a Facebook CIFAR AI Chair and a core faculty member of Mila, the Quebec AI Institute. Before McGill, he was a postdoctoral researcher at Stanford University. He received his PhD from the University of Edinburgh in 2017, where he was a Google PhD Fellow. His research focuses on representation learning for language that facilitates systematic generalization and conversational modeling. He received the 2020 VentureBeat AI Innovation Award in NLP.