77 datasets found

Groups: Information Retrieval

  • CORD-19

    The CORD-19 dataset contains academic journal articles relating to a variety of coronaviruses and related viral infections, not only COVID-19, sourced from PubMed Central (PMC),...
  • COVID-19 Information Retrieval and Extraction

    A dataset used for COVID-19 information retrieval and extraction.
  • BEIR

    The BEIR dataset is a large-scale zero-shot evaluation dataset for information retrieval models, consisting of 13,000 documents and 1,000 questions.
  • TREC 2019 and TREC 2020 Deep Learning Track datasets

    The datasets from the TREC 2019 and TREC 2020 Deep Learning Tracks.
  • MS MARCO and DL-Typo

    Two datasets used in the paper: MS MARCO and DL-Typo.
  • SERP dataset

    The dataset used in the paper is a collection of search engine result pages (SERPs) with their corresponding relevance scores.
  • Wikipedia Corpus

    The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,...
  • Wikipedia dataset

    The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...
  • Baidu Search Dataset

    The Baidu search dataset is a large-scale search dataset for unbiased learning to rank.
  • ULTRE-2 Task

    The ULTRE-2 task encourages participants to explore ULTR approaches to alleviate various types of biases in real user clicks during training, and achieve better ranking...
  • TMC

    The TMC dataset is a collection of air traffic reports.
  • Reuters21578

    Similarity search is the problem of finding the items in a large collection that are most similar to a query item of interest. Fast similarity search is at the core of many information...
  • MSMARCO

    A dataset for training and evaluating IR systems, containing a large collection of documents and queries.
  • TripClick

    The TripClick dataset is a large-scale benchmark for information retrieval.
  • WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features

    Text search based on lexical matching of keywords is not satisfactory due to polysemous and synonymous words. Semantic search that exploits word meanings improves search...
  • CLEF 2003

    The dataset used for the experiments in the paper.
  • Tetun Test Collection

    The Tetun test collection is a document-level audited dataset for relevance judgments.
  • Labadain-30k+

    The Labadain-30k+ dataset is a monolingual Tetun document-level audited dataset.
  • Reuters-21578

    Text classification has long been an active research field; its aim is to develop algorithms that find the categories of given documents.
  • TREC-COVID

    The TREC-COVID dataset is a collection of journal articles related to COVID-19 and other coronaviruses, with human annotators providing relevancy judgments at the end of each...