36 datasets found

Tags: Information Retrieval

  • LETOR 4.0

    LETOR 4.0 is a benchmark collection for learning-to-rank research in information retrieval.
  • IRGAN

    IRGAN is an information retrieval (IR) modeling approach that uses a theoretical minimax game between a generative and a discriminative model to iteratively optimize both of...
  • YouTube Clickbait Detection Dataset

    The dataset is a collection of online videos from YouTube, with comments and metadata. It is used to evaluate the performance of the Online Video Clickbait Protector (OVCP) scheme.
  • NevIR

    NevIR is a benchmark for evaluating how neural information retrieval models handle negation.
  • ClueWeb09B

    The ClueWeb09B collection (Category B of ClueWeb09) is a large-scale web search corpus of roughly 50 million English web pages.
  • AOL Dataset

    The AOL dataset contains a collection of queries and documents for search engine evaluation.
  • TREC 2004 Robust Retrieval Track

    The TREC 2004 Robust Retrieval Track dataset contains a collection of documents and queries for robust retrieval tasks.
  • MathMLBen

    The MathMLBen dataset is used to evaluate the performance of formula embedding techniques for mathematical information retrieval.
  • arXMLiv 2018

    The arXMLiv 2018 dataset is an HTML collection of the arXiv.org preprint archive, used as a training corpus for word embedding techniques.
  • COVID-19 Vaccination Search Insights

    The COVID-19 Vaccination Search Insights dataset is a collection of anonymized search queries and their corresponding labels, which indicate whether the query is related to COVID-19...
  • TREC Deep Learning 2021 Collection

    The TREC Deep Learning 2021 collection is a test collection for information retrieval evaluation, adopting a shallow pooling approach.
  • TREC-8 Ad Hoc Collection

    The TREC-8 ad hoc collection is a test collection for information retrieval evaluation, known for its high-quality pool.
  • Concept Embedding for Information Retrieval

    Conceptual indexing includes the process of annotating raw text by concepts of a particular knowledge source. It is used to represent the content of documents and queries by...
  • CORD-19

    The CORD-19 dataset contains academic journal articles relating to a variety of coronaviruses and related viral infections, not only COVID-19, sourced from PubMed Central (PMC),...
  • COVID-19 Information Retrieval and Extraction

    The dataset is used for COVID-19 information retrieval and extraction.
  • BEIR

    BEIR is a heterogeneous benchmark for zero-shot evaluation of information retrieval models, spanning 18 datasets across diverse retrieval tasks and domains.
  • TREC 2019 and TREC 2020 Deep Learning Track datasets

    Test collections from the TREC 2019 and TREC 2020 Deep Learning Tracks.
  • Wikipedia dataset

    The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...
  • Baidu Search Dataset

    The Baidu search dataset is a large-scale search dataset for unbiased learning to rank.
  • ULTRE-2 Task

    The ULTRE-2 task encourages participants to explore ULTR approaches to alleviate various types of biases in real user clicks during training, and achieve better ranking...
You can also access this registry using the API (see API Docs).