- PubMed abstracts and PubMed Central (PMC) full-text articles dataset
The PubMed abstracts and PubMed Central (PMC) full-text articles dataset is used for pretraining the UBERT variants.
- BIOPAK FLASHER: EPIDEMIC DISEASE MONITORING AND DETECTION IN PAKISTAN USING T...
The dataset used in the paper is a collection of Urdu news articles related to epidemic diseases in Pakistan. The dataset is used to train a text mining model to extract...
- DBLP papers
The dataset used in this paper is a collection of papers from the DBLP conferences between 2004 and 2014.
- NIPS papers
The dataset used in this paper is a collection of papers from the NIPS conferences between 1987 and 1999.
- Mining and summarizing customer reviews
Mining and summarizing customer reviews
- Russian Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.
- Spanish Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus.
- English Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus.