PubMed abstracts and PubMed Central (PMC) full-text articles dataset
The PubMed abstracts and PubMed Central (PMC) full-text articles dataset is used for pretraining the UBERT variants.
BIOPAK FLASHER: EPIDEMIC DISEASE MONITORING AND DETECTION IN PAKISTAN USING T...
The dataset used in the paper is a collection of Urdu news articles related to epidemic diseases in Pakistan. The dataset is used to train a text mining model to extract...
DBLP papers
The dataset used in this paper is a collection of papers from conferences indexed in DBLP between 2004 and 2014.
NIPS papers
The dataset used in this paper is a collection of papers from the NIPS conferences between 1987 and 1999.
Big data and big values: When companies need to rethink themselves
The dataset contains more than 94,000 tweets related to the core values of the firms listed in Fortune’s ranking of the World’s Most Admired Companies (2013-2017).
Mining and summarizing customer reviews
Mining and summarizing customer reviews
The Online Pivot: Lessons Learned from Teaching a Text and Data Mining Course...
A text and data mining course on Natural Language Processing, adapted for online teaching during the COVID-19 pandemic.
Russian Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.
Spanish Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus.
English Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus.