- Enron Email Corpus
  The dataset is used to discover hierarchical relationships from unstructured observations, specifically in the setting of discovering pairwise hierarchical relations between...
- NAVER Open Podium and NAVER Encyclopedia
  A large dataset of Korean text.
- News Articles Dataset
  The dataset used in this paper is a collection of news articles from an international news website, covering a time span from September 2012 to April 2014.
- Yahoo and Yelp corpora
  The Yahoo and Yelp corpora contain 100k sentences with greater average length.
- 20NewsGroups
  The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails.
- Penn Treebank
  The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
- DailyDialog
  The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.
- Customer Service Calls Dataset
  A dataset consisting of ten years of customer service calls to a fleet truck company.