-
Wikipedia dataset
The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries... -
Reuters21578
The problem of similarity search is to find the items in a large collection that are most similar to a query item of interest. Fast similarity search is at the core of many information... -
Reuters-21578
Text classification has long been an active research field; its aim is to develop algorithms that determine the categories of given documents. -
20NewsGroups
The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails.