Reuters RCV1-v2

The Reuters RCV1-v2 dataset contains 804,414 newswire articles labeled with 103 topics that form a tree hierarchy. Because a document assigned to a topic also falls under that topic's ancestors in the tree, documents typically have multiple labels. The data was randomly split into 794,414 training and 10,000 test cases. The available data had already been preprocessed by removing common stopwords and stemming. We use a vocabulary of the 10,000 most frequent words in the training set.
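
As an illustration of the split and vocabulary selection described above, here is a minimal Python sketch on a toy corpus. It is not the authors' code: the function names `random_split` and `build_vocabulary`, the seed, and the toy documents are hypothetical, and documents are assumed to be lists of already stemmed, stopword-free tokens. For the real dataset the corresponding arguments would be 804,414 documents, 10,000 test cases, and a vocabulary size of 10,000.

```python
import random
from collections import Counter


def random_split(n_docs, n_test, seed=0):
    """Randomly partition document indices into (train_ids, test_ids)."""
    ids = list(range(n_docs))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    return ids[n_test:], ids[:n_test]


def build_vocabulary(train_docs, size):
    """Return the `size` most frequent tokens, counted on training data only."""
    counts = Counter(tok for doc in train_docs for tok in doc)
    return [tok for tok, _ in counts.most_common(size)]


# Toy corpus standing in for the preprocessed articles (hypothetical data).
docs = [
    ["oil", "price", "rise"],
    ["oil", "market", "fall"],
    ["bank", "rate", "rise"],
    ["oil", "bank", "market"],
]

train_ids, test_ids = random_split(len(docs), n_test=1, seed=0)
train_docs = [docs[i] for i in train_ids]

# The vocabulary is built from the training split only, so test documents
# do not influence which words are kept.
vocab = build_vocabulary(train_docs, size=3)
```

Counting frequencies on the training split alone mirrors the description's "most frequent words in the training set"; tokens outside the vocabulary would simply be dropped when encoding a document.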

Cite this as

Nitish Srivastava, Geoffrey Hinton, Ruslan Salakhutdinov (2024). Dataset: Reuters RCV1-v2. https://doi.org/10.57702/e2lqztoi

DOI retrieved: December 2, 2024

Additional Info

Field          Value
Created        December 2, 2024
Last update    December 2, 2024
Author         Nitish Srivastava
More Authors   Geoffrey Hinton, Ruslan Salakhutdinov
Homepage       https://www.cs.toronto.edu/~rsalakhu/RCV1-v2.html