32 datasets found

Tags: Text Analysis

  • Enron Email Corpus

    The dataset is used to discover hierarchical relationships from unstructured observations, specifically in the setting of discovering pairwise hierarchical relations between...
  • NAVER Open Podium and NAVER Encyclopedia

    A large dataset of Korean text.
  • CiteULike

    CiteULike is a user-article dataset, where each article has a 300-dimension tf-idf vector. XING is a user-view-job dataset where each job is described by a 2738-dimension...
  • YELP

    The YELP dataset is used for language modeling.
  • News Articles Dataset

    The dataset used in this paper is a collection of news articles from an international news website, covering a time span from September 2012 to April 2014.
  • Jester

    The Jester dataset consists of continuous joke ratings from -10 to 10 and includes the jokes’ texts.
  • Yahoo and Yelp corpora

    The Yahoo and Yelp corpora contain 100k sentences with greater average length.
  • 20NewsGroups

    The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails.
  • Penn Treebank

    The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
  • SNLI

    The dataset used in the paper is the Stanford Natural Language Inference (SNLI) dataset, which consists of 549,367 premise-hypothesis pairs for train/dev/test sets and target...
  • DailyDialog

    The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.
  • Customer Service Calls Dataset

    A dataset consisting of ten years of customer service calls to a fleet truck company.

You can also access this registry using the API (see API Docs).