28 datasets found

Tags: Information Retrieval

  • MathMLBen

    The MathMLBen dataset is used to evaluate the performance of formula embedding techniques for mathematical information retrieval.
  • arXMLiv 2018

    The arXMLiv 2018 dataset is a collection of arXiv.org preprints converted to HTML, used as a training corpus for word embedding techniques.
  • COVID-19 Vaccination Search Insights

    The COVID-19 Vaccination Search Insights dataset is a collection of anonymized search queries and their corresponding labels, which indicate whether the query is related to COVID-19...
  • TREC Deep Learning 2021 Collection

    The TREC Deep Learning 2021 collection is a test collection for information retrieval evaluation whose relevance judgments were built with a shallow pooling approach.
  • TREC-8 Ad Hoc Collection

    The TREC-8 ad hoc collection is a test collection for information retrieval evaluation, known for its high-quality pool.
  • Concept Embedding for Information Retrieval

    Conceptual indexing is the process of annotating raw text with concepts from a particular knowledge source. It is used to represent the content of documents and queries by...
  • CORD-19

    The CORD-19 dataset contains academic journal articles relating to a variety of coronaviruses and related viral infections, not only COVID-19, sourced from PubMed Central (PMC),...
  • COVID-19 Information Retrieval and Extraction

    The dataset used for COVID-19 information retrieval and extraction.
  • BEIR

    BEIR is a large-scale, heterogeneous benchmark for zero-shot evaluation of information retrieval models, combining 18 publicly available datasets that span nine retrieval tasks.
  • TREC 2019 and TREC 2020 Deep Learning Track datasets

    The datasets released for the TREC 2019 and TREC 2020 Deep Learning Tracks.
  • Wikipedia dataset

    This Wikipedia dataset contains over six million English Wikipedia articles, each with a full-text field, and is associated with 50 training queries...
  • Baidu Search Dataset

    The Baidu search dataset is a large-scale search dataset for unbiased learning to rank.
  • ULTRE-2 Task

    The ULTRE-2 task encourages participants to explore unbiased learning to rank (ULTR) approaches that mitigate various types of bias in real user clicks during training and achieve better ranking...
  • TMC

    The TMC dataset is a collection of air traffic reports.
  • Reuters21578

    The problem of similarity search is to find the most similar items in a large collection to a query item of interest. Fast similarity search is at the core of many information...
  • TripClick

    The TripClick dataset is a large-scale benchmark for information retrieval.
  • CLEF 2003

    The dataset used for the experiments in the paper.
  • Tetun Test Collection

    The Tetun test collection is a document-level audited dataset for relevance judgments.
  • Labadain-30k+

    The Labadain-30k+ dataset is a monolingual Tetun document-level audited dataset.
  • Reuters-21578

    Text classification has long been an active research field; its aim is to develop algorithms that assign categories to given documents.