Information Retrieval - Groups

Doc2Token

This dataset is used in the paper for novel token prediction in e-commerce search.

WikipassageQA, InsuranceQA v2, and MS-MARCO

This entry comprises three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO.

Deeper text understanding for IR with contextual neural language modeling

This paper proposes a method for learning-to-rank with contextual neural language modeling.

Learning to rank: from pairwise approach to listwise approach

This paper proposes a method for learning to rank, which is a key task in information retrieval.

TREC Dynamic Domain 2015 ad-hoc retrieval task

The dataset used in the paper is the TREC Dynamic Domain 2015 ad-hoc retrieval task, which includes search result diversification. The dataset consists of 23 official runs and...

TREC Web Track 2014 ad-hoc retrieval task

The dataset used in the paper is the TREC Web Track 2014 ad-hoc retrieval task, which includes search result diversification. The dataset consists of 50 test topics and 10,000...

Web2Text: Deep Structured Boilerplate Removal

Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...

Robust04

The dataset used in the paper is the Robust04 dataset, a news corpus containing 0.5M documents and 249 queries.

BEIR

The BEIR dataset is a large-scale zero-shot evaluation dataset for information retrieval models, consisting of 13,000 documents and 1,000 questions.

SERP dataset

The dataset used in the paper is a collection of search engine result pages (SERPs) with their corresponding relevance scores.

Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles belonging to one of the following categories: People, Cities,...

MSMARCO

This dataset is used for training and evaluating IR systems and contains a large collection of documents and queries.

TREC-COVID

The TREC-COVID dataset is a collection of journal articles related to COVID-19 and other coronaviruses, with human annotators providing relevancy judgments at the end of each...

MS MARCO, NQ, TREC DL, TREC-COVID

Four datasets are used to evaluate the retrieval effectiveness of different dimension reduction models, including MS MARCO (Passage Ranking), NQ, TREC DL, and TREC-COVID.

ClueWeb09 dataset

The ClueWeb09 dataset is a large-scale dataset for web search and information retrieval.

Krapivin

This dataset is used in the paper for keyphrase generation with correlation constraints.

NUS

This dataset is used in the paper for keyphrase generation with correlation constraints.

Inspec

A keyphrase generation dataset for scientific articles.

KP20k

This dataset is used in the paper for keyphrase generation with correlation constraints.

19 datasets found