Information Retrieval - Groups

CORD-19

The CORD-19 dataset contains academic journal articles relating to a variety of coronaviruses and related viral infections, not only COVID-19, sourced from PubMed Central (PMC),...
- Dataset
- JSON
COVID-19 Information Retrieval and Extraction

A dataset for COVID-19 information retrieval and extraction.
- Dataset
- JSON
BEIR

The BEIR dataset is a large-scale zero-shot evaluation dataset for information retrieval models, consisting of 13,000 documents and 1,000 questions.
- Dataset
- JSON
TREC 2019 and TREC 2020 Deep Learning Track datasets

TREC 2019 and TREC 2020 Deep Learning Track datasets
- Dataset
- JSON
MS MARCO and DL-Typo

Two datasets used in the paper: MS MARCO and DL-Typo.
- Dataset
- JSON
SERP dataset

The dataset used in the paper is a collection of search engine result pages (SERPs) with their corresponding relevance scores.
- Dataset
- JSON
Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles belonging to one of the following categories: People, Cities,...
- Dataset
- JSON
Wikipedia dataset

The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...
- Dataset
- JSON
Baidu Search Dataset

The Baidu search dataset is a large-scale search dataset for unbiased learning to rank.
- Dataset
- JSON
ULTRE-2 Task

The ULTRE-2 task encourages participants to explore unbiased learning to rank (ULTR) approaches that alleviate various types of bias in real user clicks during training and achieve better ranking...
- Dataset
- JSON
TMC

The TMC dataset is a collection of air traffic reports.
- Dataset
- JSON
Reuters21578

The problem of similarity search is to find the most similar items in a large collection to a query item of interest. Fast similarity search is at the core of many information...
- Dataset
- JSON
MSMARCO

A dataset for training and evaluating IR systems, containing a large collection of documents and queries.
- Dataset
- JSON
TripClick

The TripClick dataset is a large-scale benchmark for information retrieval.
- Dataset
- JSON
WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features

Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings improves search...
- Dataset
- JSON
CLEF 2003

The dataset used for the experiments in the paper.
- Dataset
- JSON
Tetun Test Collection

The Tetun test collection is a document-level audited dataset for relevance judgments.
- Dataset
- JSON
Labadain-30k+

The Labadain-30k+ dataset is a monolingual Tetun document-level audited dataset.
- Dataset
- JSON
Reuters-21578

Text classification has long been an interesting research field; its aim is to develop algorithms that find the categories of given documents.
- Dataset
- JSON
TREC-COVID

The TREC-COVID dataset is a collection of journal articles related to COVID-19 and other coronaviruses, with human annotators providing relevancy judgments at the end of each...
- Dataset
- JSON

77 datasets found