-
WikipassageQA, InsuranceQA v2, and MS-MARCO
This collection comprises three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO. -
PASSAGE RANKING WITH WEAK SUPERVISION
In this paper, we propose a weak supervision framework for neural ranking tasks based on the data programming paradigm (Ratner et al., 2016), which enables us to leverage... -
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is... -
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,... -
Wikipedia dataset
The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries... -
GLOW : Global Weighted Self-Attention Network for Web Search
GLOW is a Global Weighted Self-Attention Network for web document search. It incorporates global corpus statistics into the deep matching model. -
KPEVAL: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
A comprehensive evaluation framework for keyphrase systems, including reference agreement, faithfulness, diversity, and utility.