- Russian Noun Dataset
  The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.
- Spanish Noun Dataset
  The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus.
- English Noun Dataset
  The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus.
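
The descriptions above do not say how the noun lists were extracted, so the following is only a minimal sketch of one plausible way to select the N most frequent nouns from a corpus. It assumes the corpus is already available as (lemma, part-of-speech tag) pairs and that nouns carry the tag "NOUN"; the function name `top_frequent_nouns`, the tag set, and the toy corpus are all hypothetical and are not taken from the datasets themselves.

```python
from collections import Counter


def top_frequent_nouns(tagged_tokens, n=2000, noun_tags=("NOUN",)):
    """Return the n most frequent noun lemmas, most frequent first.

    tagged_tokens: iterable of (lemma, pos_tag) pairs (assumed input format).
    noun_tags: tags treated as nouns (assumption; depends on the tagger used).
    """
    counts = Counter(
        lemma.lower()                 # fold case so "Cat" and "cat" count together
        for lemma, tag in tagged_tokens
        if tag in noun_tags           # keep nouns only
    )
    return [lemma for lemma, _ in counts.most_common(n)]


# Toy corpus for illustration only; a real run would stream a tagged corpus.
corpus = [
    ("cat", "NOUN"), ("sit", "VERB"), ("cat", "NOUN"), ("mat", "NOUN"),
    ("Cat", "NOUN"), ("dog", "NOUN"), ("dog", "NOUN"),
]
print(top_frequent_nouns(corpus, n=2))
```

With `n=2000` and a tagged version of one of the corpora named above, the same function would yield a list of the size described for each dataset. Whether the original lists were built from lemmas or surface forms is not stated, so the lowercasing and lemma-based counting here are design choices of this sketch.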