-
Hindi-English Code-Switched Sentences
The dataset used in the paper is a collection of Hindi-English code-switched sentences. -
CoNLL-2009
The CoNLL-2009 dataset is used for the semantic role labeling (SRL) task. It contains 10,177 sentences in English and 10,177 sentences in Chinese. -
ArzEnSEG corpus
The ArzEnSEG corpus is a morphologically annotated dataset for code-switched Egyptian Arabic-English. -
ArzEn parallel corpus
The ArzEn parallel corpus consists of speech transcriptions gathered through informal interviews with bilingual Egyptian Arabic-English speakers, as well as their English... -
English-to-Chinese Controlled Machine Translation
The dataset for English-to-Chinese controlled machine translation. -
English Controlled Machine Translation
The dataset for English controlled machine translation. -
English Controlled Paraphrase Generation
The dataset for English controlled paraphrase generation. -
LDC2015E86
LDC2015E86 is a dataset of abstract meaning representation (AMR) annotations for English. -
SemEval07 corpus
The SemEval07 corpus is a dataset for semantic frame parsing in English. -
GigaSpeech
GigaSpeech is an evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio. -
PADT, EWT, GSD, HDT, and SynTagRus
PADT, EWT, GSD, HDT, and SynTagRus are Universal Dependencies (UD) treebanks. -
ASRU 2019 Mandarin-English code-switching speech recognition challenge
The ASRU 2019 Mandarin-English code-switching speech recognition challenge dataset. -
WMT 2014 English-German task
The dataset used for the Second Workshop on Neural Machine Translation and Generation.