TREC-CAR Benchmark Y1
The dataset used for the Retrieve-Cluster-Summarize system, consisting of 117 article-level queries and 126 test queries.
WINGNUS: Keyphrase extraction utilizing document logical structure
The dataset used in this paper for keyphrase extraction utilizing document logical structure.
SemEval-2010 Task 5 dataset
The dataset used in this paper for keyphrase extraction from academic articles.
BBC News dataset
The BBC News dataset was used for sentiment analysis of news articles.
NIPS full paper dataset
The NIPS full paper dataset is a collection of full-text papers from the NIPS conference, commonly used for topic modeling and text analysis.
ClueWeb09B
The ClueWeb09B collection is a large-scale web search dataset consisting of approximately 50 million English web pages (Category B of the ClueWeb09 crawl).
WikipassageQA, InsuranceQA v2, and MS-MARCO
A collection of three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO.
Deeper text understanding for IR with contextual neural language modeling
This paper applies contextual neural language models such as BERT to document ranking in ad-hoc information retrieval.
Learning to rank: from pairwise approach to listwise approach
This paper proposes a listwise approach to learning to rank, a key task in information retrieval, in contrast to earlier pairwise approaches.
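A minimal sketch (not taken from the paper) of the pairwise-versus-listwise contrast: a pairwise loss sums a logistic term over pairs of documents with different relevance, while a listwise (ListNet-style) loss compares the whole ranked list at once via cross-entropy between top-one probability distributions. Names and data below are illustrative.

```python
import numpy as np

def pairwise_loss(scores, labels):
    """Pairwise logistic loss: one term per document pair
    where one document is more relevant than the other."""
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += np.log1p(np.exp(-(scores[i] - scores[j])))
                pairs += 1
    return loss / max(pairs, 1)

def listwise_loss(scores, labels):
    """Listwise (ListNet-style) loss: cross-entropy between the top-one
    probability distributions induced by the labels and by the scores."""
    p_true = np.exp(labels) / np.exp(labels).sum()
    p_pred = np.exp(scores) / np.exp(scores).sum()
    return -(p_true * np.log(p_pred)).sum()

# Three documents for one query: model scores and graded relevance labels.
scores = np.array([2.0, 1.0, 0.5])
labels = np.array([2.0, 0.0, 1.0])
print(pairwise_loss(scores, labels))
print(listwise_loss(scores, labels))
```

The listwise loss treats the entire result list for a query as one training instance, which is the shift the paper's title refers to.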
TREC Dynamic Domain 2015 ad-hoc retrieval task
The dataset used in the paper is the TREC Dynamic Domain 2015 ad-hoc retrieval task, which includes search result diversification. The dataset consists of 23 official runs and...
TREC Web Track 2014 ad-hoc retrieval task
The dataset used in the paper is the TREC Web Track 2014 ad-hoc retrieval task, which includes search result diversification. The dataset consists of 50 test topics and 10,000...
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...
SERP dataset
The dataset used in the paper is a collection of search engine result pages (SERPs) with their corresponding relevance scores.
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,...