-
SexHateLex
The SexHateLex lexicon is a large collection of sexist and abusive terms in Chinese. -
SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection
The SWSR dataset consists of two files: SexWeibo.csv and SexComment.csv, containing weibos (posts) and comments (replies) respectively. -
Chinese–Japanese Unsupervised Neural Machine Translation Using Sub-character ...
Chinese–Japanese Unsupervised Neural Machine Translation Using Sub-character Level Information -
Chinese Corpus
The dataset is used to analyze corpora in a completely language-independent, unsupervised way, without any prior linguistic knowledge. -
CoNLL-2009
The CoNLL-2009 dataset is used for the semantic role labeling (SRL) task. It contains 10,177 sentences in English and 10,177 sentences in Chinese. -
Chinese Prosody Prediction Dataset
The dataset is used in the paper on automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. -
Chinese-to-English Controlled Machine Translation
The dataset for Chinese-to-English controlled machine translation. -
Chinese Controlled Paraphrase Generation
The dataset for Chinese controlled paraphrase generation. -
Chinese Medical Short Sentence (CMSS) corpus
The Chinese Medical Short Sentence (CMSS) corpus contains 17,787 sentences classified into three symptom severity ratings: slightly, moderately, and heavily. -
FewCLUE dataset
The FewCLUE dataset is a Chinese few-shot learning evaluation benchmark. -
Chinese Spell Check
The proposed approach achieves SOTA error correction results on two spell check datasets. -
NIST 2003 (MT03), NIST 2004 (MT04), NIST 2005 (MT05), NIST 2006 (MT06) datasets
Chinese↔English translation tasks, KFTT English↔Japanese translation datasets. -
China Workshop on Machine Translation in 2017
The dataset used in the paper is the news data from the China Workshop on Machine Translation in 2017.