SIGHAN Datasets

The SIGHAN datasets are used for the Chinese Spelling Check (CSC) task. They cover a limited number of Chinese characters and their corresponding errors.

Cite this as

Wangxuan Institute of Computer Technology, Peking University, Center for Data Science, Peking University, The MOE Key Laboratory of Computational Linguistics, Peking University (2024). Dataset: SIGHAN Datasets. https://doi.org/10.57702/yoqrix0i

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Defined In     https://doi.org/10.48550/arXiv.2307.13655
Author         Wangxuan Institute of Computer Technology, Peking University
More Authors   Center for Data Science, Peking University
               The MOE Key Laboratory of Computational Linguistics, Peking University