Information Retrieval - Groups

WikiPassageQA, InsuranceQA v2, and MS-MARCO

This collection contains three passage-ranking datasets: WikiPassageQA, InsuranceQA v2, and MS-MARCO.

Dataset

Passage Ranking with Weak Supervision

In this paper, we propose a weak supervision framework for neural ranking tasks based on the data programming paradigm (Ratner et al., 2016), which enables us to leverage...

Dataset

Web2Text: Deep Structured Boilerplate Removal

Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...

Dataset

Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus consisting of 7,500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,...

Dataset

Wikipedia dataset

The paper uses the Wikipedia dataset, which contains over six million English Wikipedia articles, each with a full-text field, and is associated with 50 training queries...

Dataset

GLOW: Global Weighted Self-Attention Network for Web Search

GLOW is a novel Global Weighted Self-Attention Network for web document search. It incorporates global corpus statistics into the deep matching model.

Dataset

Krapivin

Dataset used in the paper on keyphrase generation with correlation constraints.

Dataset

NUS

Dataset used in the paper on keyphrase generation with correlation constraints.

Dataset

Inspec

Keyphrase generation dataset for scientific articles.

Dataset

KP20k

Dataset used in the paper on keyphrase generation with correlation constraints.

Dataset

KPEVAL: Towards Fine-Grained Semantic-Based Keyphrase Evaluation

A comprehensive evaluation framework for keyphrase systems, including reference agreement, faithfulness, diversity, and utility.

Dataset

11 datasets found