-
arXMLiv 2018
The arXMLiv 2018 dataset is an HTML collection of the arXiv.org preprint archive, used as a training corpus for word embedding techniques. -
COVID-19 Vaccination Search Insights
The COVID-19 Vaccination Search Insights dataset is a collection of anonymized search queries and their corresponding labels, which indicate whether the query is related to COVID-19... -
TREC Deep Learning 2021 Collection
The TREC Deep Learning 2021 collection is a test collection for information retrieval evaluation, adopting a shallow pooling approach. -
TREC-8 Ad Hoc Collection
The TREC-8 ad hoc collection is a test collection for information retrieval evaluation, known for its high-quality pool. -
Concept Embedding for Information Retrieval
Conceptual indexing is the process of annotating raw text with concepts from a particular knowledge source. It is used to represent the content of documents and queries by... -
COVID-19 Information Retrieval and Extraction
The dataset used for COVID-19 information retrieval and extraction. -
TREC 2019 and TREC 2020 Deep Learning Track datasets
TREC 2019 and TREC 2020 Deep Learning Track datasets -
Wikipedia dataset
The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries... -
Baidu Search Dataset
The Baidu search dataset is a large-scale search dataset for unbiased learning to rank. -
ULTRE-2 Task
The ULTRE-2 task encourages participants to explore ULTR approaches to alleviate various types of biases in real user clicks during training, and achieve better ranking... -
Reuters21578
Similarity search is the problem of finding the items in a large collection that are most similar to a query item of interest. Fast similarity search is at the core of many information... -
Tetun Test Collection
The Tetun test collection is a document-level audited dataset for relevance judgments. -
Labadain-30k+
The Labadain-30k+ dataset is a monolingual Tetun document-level audited dataset. -
Reuters-21578
Text classification has long been an active research field. Its aim is to develop algorithms that assign categories to given documents.