-
WikipassageQA, InsuranceQA v2, and MS-MARCO
This collection comprises three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO. -
Deeper text understanding for IR with contextual neural language modeling
This paper applies a contextual neural language model, BERT, to ranking documents in ad-hoc retrieval. -
Learning to rank: from pairwise approach to listwise approach
This paper proposes a listwise approach to learning to rank, a key task in information retrieval, in which lists of objects rather than object pairs serve as training instances. -
TREC Dynamic Domain 2015 ad-hoc retrieval task
The dataset used in the paper is the TREC Dynamic Domain 2015 ad-hoc retrieval task, which includes search result diversification. The dataset consists of 23 official runs and... -
TREC Web Track 2014 ad-hoc retrieval task
The dataset used in the paper is the TREC Web Track 2014 ad-hoc retrieval task, which includes search result diversification. The dataset consists of 50 test topics and 10,000... -
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is... -
SERP dataset
The dataset used in the paper is a collection of search engine result pages (SERPs) with their corresponding relevance scores. -
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,... -
TREC-COVID
The TREC-COVID dataset is a collection of journal articles related to COVID-19 and other coronaviruses, with human annotators providing relevancy judgments at the end of each... -
MS MARCO, NQ, TREC DL, TREC-COVID
Four datasets are used to evaluate the retrieval effectiveness of different dimension reduction models, including MS MARCO (Passage Ranking), NQ, TREC DL, and TREC-COVID. -
ClueWeb09 dataset
The ClueWeb09 dataset is a large-scale dataset for web search and information retrieval.