Information Retrieval - Groups

WikipassageQA, InsuranceQA v2, and MS-MARCO

This collection comprises three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO.

Web2Text: Deep Structured Boilerplate Removal

Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...

BioASQ

The BioASQ dataset contains questions and answers from various sources, including Wikipedia and biomedical literature.

BEIR

BEIR is a large-scale benchmark for zero-shot evaluation of information retrieval models, consisting of 13,000 documents and 1,000 questions.

Wikipedia dataset

The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...

5 datasets found
