SIGHAN Datasets

doi:10.57702/yoqrix0i

The SIGHAN datasets are used for the Chinese Spelling Check (CSC) task. They cover a limited number of Chinese characters and their corresponding errors.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Wangxuan Institute of Computer Technology, Peking University, Center for Data Science, Peking University, The MOE Key Laboratory of Computational Linguistics, Peking University (2024). Dataset: SIGHAN Datasets. https://doi.org/10.57702/yoqrix0i

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2307.13655
Author	Wangxuan Institute of Computer Technology, Peking University
More Authors	Center for Data Science, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University