31 datasets found

Tags: Wikipedia

  • Wikipedia dataset

    The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...
  • DocRED

    DocRED is a large-scale human-annotated dataset for document-level relation extraction (RE), constructed from Wikipedia and Wikidata.
  • CMUDoG

    CMUDoG is a knowledge-grounded conversation dataset in which two speakers converse based on movie Wikipedia articles.
  • Natural Questions

    The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
  • ORES

    The ORES dataset is a machine learning-based web service for Wikimedia projects such as Wikipedia. It provides a model for detecting damaging edits.
  • Wizard of Wikipedia

    Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...
  • Text8

    Text8 is a word-embedding benchmark corpus consisting of the first 100 million characters of a cleaned English Wikipedia dump; it is widely used to train word-representation models such as Word2Vec.
  • Validation Dataset

    The Validation Dataset is used for validation; it contains 1,428 images from nine distinct rooms.
  • Wikipedia Comparable Corpora

    A multilingual dataset for topic modeling, built from aligned Wikipedia articles extracted from the Wikipedia Comparable Corpora.
  • Wiki

    A bipartite interaction graph that contains the edits on Wikipedia pages over a month.
  • fr-wiki

    The fr-wiki dataset is a Wikipedia dataset for French, containing 0.5GT.
You can also access this registry using the API (see API Docs).
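A programmatic query against the registry might look like the sketch below. The base URL, query parameters, and JSON response shape are illustrative assumptions, not the registry's actual API; consult the API Docs for the real endpoint names.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL -- an assumption for illustration only; the real
# endpoint is documented in the registry's API Docs.
BASE_URL = "https://example.org/api/v1/datasets"

def build_query_url(tag: str, page: int = 1) -> str:
    """Build a search URL for datasets carrying a given tag."""
    return f"{BASE_URL}?{urlencode({'tag': tag, 'page': page})}"

def parse_results(payload: str) -> list[tuple[str, str]]:
    """Extract (name, description) pairs from a JSON response body,
    assuming a top-level 'results' array of dataset objects."""
    data = json.loads(payload)
    return [(d["name"], d["description"]) for d in data["results"]]

url = build_query_url("Wikipedia")

# A mocked response in the assumed shape, standing in for a live request:
sample = json.dumps({"results": [
    {"name": "DocRED", "description": "Document-level RE dataset."},
]})

print(url)
print(parse_results(sample))
```

The request itself is mocked so the sketch runs offline; in practice you would fetch `url` (e.g. with `urllib.request.urlopen`) and pass the response body to `parse_results`.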