PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
PAWS-X: A cross-lingual adversarial dataset for paraphrase identification.
Factorising Meaning and Form for Intent-Preserving Paraphrasing
Proposes a method for generating paraphrases of English questions that retain the original intent but use a different surface form.
Penn Treebank and Wikipedia-90M
The Penn Treebank dataset is used for sentence-level language modeling, and a 90-million-word subset of Wikipedia is used for paraphrasing.