Information Retrieval - Groups

LETOR

The dataset used in the paper Modeling Diverse Relevance Patterns in Ad-hoc Retrieval
- Dataset
- JSON
SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles

SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles
- Dataset
- JSON
MQ2009

The MQ2009 query set is a large-scale query dataset, containing 200,000 queries.
- Dataset
- JSON
Wikimarks

The Wikimarks dataset consists of 30 million deduplicated paragraphs from all Wikipedia articles.
- Dataset
- JSON
TREC-CAR Benchmark Y1

The dataset used for the Retrieve-Cluster-Summarize system, consisting of 117 article-level queries and 126 test queries.
- Dataset
- JSON
LETOR 4.0

The LETOR 4.0 dataset is a collection of information retrieval tasks.
- Dataset
- JSON
IRGAN

IRGAN is an information retrieval (IR) modeling approach that uses a theoretical minimax game between a generative and a discriminative model to iteratively optimize both of...
- Dataset
- JSON
YouTube Clickbait Detection Dataset

The dataset is a collection of online videos from YouTube, with comments and metadata. It is used to evaluate the performance of the Online Video Clickbait Protector (OVCP) scheme.
- Dataset
- JSON
PMING Distance

PMING Distance is a measure of proximity, which conveys information on relationships between two terms, e.g. words or expressions, carrying semantic meaning, used on various...
- Dataset
- JSON
A citation-based method for automatic indexing of Chinese academic literatures

The dataset used in this paper for a citation-based method for automatic indexing of Chinese academic literature.
- Dataset
- JSON
WINGNUS: Keyphrase extraction utilizing document logical structure

The dataset used in this paper for keyphrase extraction utilizing document logical structure.
- Dataset
- JSON
SemEval-2010 Task 5 dataset

The dataset used in this paper for keyphrase extraction from academic articles.
- Dataset
- JSON
NevIR

Negation in Neural Information Retrieval
- Dataset
- JSON
Leveraging Passage Embeddings for Efficient Listwise Reranking

Passage ranking aims to rank each passage in a large corpus according to its relevance to the user's information need, expressed as a short query.
- Dataset
- JSON
BBC News dataset

The BBC News dataset was used for sentiment analysis of news articles.
- Dataset
- JSON
NIPS full paper dataset

The NIPS full paper dataset is a collection of text documents.
- Dataset
- JSON
C-MinHash

The dataset used in this paper is a binary dataset, where each data vector is a binary vector of length D.
- Dataset
- JSON
ClueWeb09B

The ClueWeb09B collection is a large-scale web search dataset, containing 31 million web pages, 31 million queries, and 1.5 billion documents.
- Dataset
- JSON
AOL Dataset

The AOL dataset contains a collection of queries and documents for search engine evaluation.
- Dataset
- JSON
TREC 2004 Robust Retrieval Track

The TREC 2004 Robust Retrieval Track dataset contains a collection of documents and queries for robust retrieval tasks.
- Dataset
- JSON

77 datasets found