4 datasets found

Tags: natural language processing

  • Russian Noun Dataset

    The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.
  • Spanish Noun Dataset

    The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus.
  • English Noun Dataset

    The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus.
  • CSL

    The CSL dataset is a large-scale Chinese scientific literature dataset obtained from the "Qianyan" open-source NLP platform. It consists of 396,209 Chinese core journal papers'...
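The Russian, Spanish and English noun datasets are each described as the 2000 most frequent nouns in a corpus. The listing does not say how those nouns were extracted. The sketch below is a minimal illustration of a top-N frequency selection, not the datasets' actual pipeline. It makes several assumptions: the corpus is already tokenized, lemmatized and POS-tagged; nouns carry the Universal Dependencies tag `NOUN`; and case is folded. The function name `top_nouns` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_nouns(
    tagged_tokens: Iterable[tuple[str, str]],
    n: int = 2000,
    noun_tags: frozenset[str] = frozenset({"NOUN"}),  # assumption: UD-style tags
) -> list[str]:
    """Return the n most frequent noun lemmas (hypothetical helper).

    Assumes each token is a (lemma, pos_tag) pair produced by some
    upstream tagger; lowercasing is an illustrative choice, not a
    documented property of the datasets.
    """
    counts = Counter(
        lemma.lower() for lemma, tag in tagged_tokens if tag in noun_tags
    )
    # most_common(n) returns (lemma, count) pairs in descending frequency
    return [lemma for lemma, _ in counts.most_common(n)]


if __name__ == "__main__":
    tokens = [
        ("house", "NOUN"), ("run", "VERB"), ("house", "NOUN"),
        ("dog", "NOUN"), ("Dog", "NOUN"), ("dog", "NOUN"),
    ]
    print(top_nouns(tokens))  # ['dog', 'house']
```

Here "dog" (3 occurrences after case folding) ranks above "house" (2), and the verb "run" is excluded. On a real corpus the quality of the result depends mostly on the tagger and lemmatizer used upstream.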