-
BioConceptVec Evaluation Datasets
The dataset contains over 25 million instances from nine independent datasets used for intrinsic and extrinsic evaluations. -
PubMed abstracts and PubMed Central (PMC) full-text articles dataset
The PubMed abstracts and PubMed Central (PMC) full-text articles dataset is used for pretraining the UBERT variants. -
BIOPAK FLASHER: EPIDEMIC DISEASE MONITORING AND DETECTION IN PAKISTAN USING T...
The dataset used in the paper is a collection of Urdu news articles related to epidemic diseases in Pakistan. The dataset is used to train a text mining model to extract... -
DBLP papers
The dataset used in this paper is a collection of papers from the DBLP conferences between 2004 and 2014. -
NIPS papers
The dataset used in this paper is a collection of papers from the NIPS conferences between 1987 and 1999. -
Stack Overflow Performance Discussions
The dataset used for the study contains 2,304 posts related to the performance of software components. -
MSR Mining Challenge 2015
The dataset used for the MSR Mining Challenge 2015 contains 43,336,603 posts. -
Mining and summarizing customer reviews
-
Music Corpus
The dataset is used for term clustering to build a modular ontology, based on a core ontology, from domain-specific text. -
Russian Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus. -
Spanish Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus. -
English Noun Dataset
The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus. -
Turkish Tweets Dataset
A collection of Turkish tweets about three different Turkish telecommunication brands gathered over one month.