WikiPassageQA, InsuranceQA v2, and MS-MARCO
This collection comprises three passage-ranking datasets: WikiPassageQA, InsuranceQA v2, and MS-MARCO.
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...
Wikipedia dataset
The paper uses the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field, associated with 50 training queries...