The dataset used in this paper comprises parallel sentence pairs from 96 native languages, with at least 10,000 sentence pairs per language.
The Cambridge First Certificate in English (FCE) dataset is used as the source of ESL data. The corpus is a subset of the Cambridge Learner Corpus (CLC) and contains English...