Information Retrieval - Groups

COVID-19 Information Retrieval and Extraction

A dataset for COVID-19 information retrieval and extraction.

Dataset
JSON

BEIR

BEIR is a heterogeneous benchmark for zero-shot evaluation of information retrieval models. It bundles 18 datasets spanning nine retrieval tasks, so corpus and query counts vary by dataset.

Dataset
JSON

TREC 2019 and TREC 2020 Deep Learning Track datasets

Dataset
JSON

MS MARCO and DL-Typo

Two datasets used in the paper: MS MARCO and DL-Typo.

Dataset
JSON

SERP dataset

The dataset used in the paper is a collection of search engine result pages (SERPs) with their corresponding relevance scores.

Dataset
JSON

Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,...

Dataset
JSON

Wikipedia dataset

The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...

Dataset
JSON

Baidu Search Dataset

The Baidu search dataset is a large-scale search dataset for unbiased learning to rank.

Dataset
JSON

ULTRE-2 Task

The ULTRE-2 task encourages participants to explore ULTR approaches to alleviate various types of biases in real user clicks during training, and achieve better ranking...

Dataset
JSON

TMC

The TMC dataset is a collection of air traffic reports.

Dataset
JSON

Reuters21578

The problem of similarity search is to find the most similar items in a large collection to a query item of interest. Fast similarity search is at the core of many information...

Dataset
JSON

MSMARCO

A dataset used for training and evaluating IR systems; it contains a large collection of documents and queries.

Dataset
JSON

TripClick

The TripClick dataset is a large-scale benchmark for information retrieval.

Dataset
JSON

WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features

Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings improves search...

Dataset
JSON

CLEF 2003

The dataset used for the experiments in the paper.

Dataset
JSON

Tetun Test Collection

The Tetun test collection is a document-level audited dataset for relevance judgments.

Dataset
JSON

Labadain-30k+

The Labadain-30k+ dataset is a monolingual Tetun document-level audited dataset.

Dataset
JSON

Reuters-21578

Text classification has long been an interesting research field; its aim is to develop algorithms that find the categories of given documents.

Dataset
JSON

TREC-COVID

The TREC-COVID dataset is a collection of journal articles related to COVID-19 and other coronaviruses, with human annotators providing relevancy judgments at the end of each...

Dataset
JSON

MS MARCO, NQ, TREC DL, TREC-COVID

Four datasets are used to evaluate the retrieval effectiveness of different dimension reduction models, including MS MARCO (Passage Ranking), NQ, TREC DL, and TREC-COVID.

Dataset
JSON

56 datasets found