5 datasets found

Tags: natural language processing

  • WikipassageQA, InsuranceQA v2, and MS-MARCO

    This collection comprises three passage-ranking datasets: WikipassageQA, InsuranceQA v2, and MS-MARCO.
  • Web2Text: Deep Structured Boilerplate Removal

    Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...
  • BioASQ

    The BioASQ dataset contains questions and answers from various sources, including Wikipedia and biomedical literature.
  • BEIR

    The BEIR dataset is a large-scale zero-shot evaluation dataset for information retrieval models, consisting of 13,000 documents and 1,000 questions.
  • Wikipedia dataset

    The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...