Text Mining - Groups

PubMed abstracts and PubMed Central (PMC) full-text articles dataset

The PubMed abstracts and PubMed Central (PMC) full-text articles dataset is used for pretraining the UBERT variants.

BIOPAK FLASHER: EPIDEMIC DISEASE MONITORING AND DETECTION IN PAKISTAN USING T...

The dataset used in the paper is a collection of Urdu news articles related to epidemic diseases in Pakistan. The dataset is used to train a text mining model to extract...

DBLP papers

The dataset used in this paper is a collection of papers from conferences indexed by DBLP between 2004 and 2014.

NIPS papers

The dataset used in this paper is a collection of papers from the NIPS conferences between 1987 and 1999.

iLCM

The iLCM project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a “Software as a Service” architecture...

Mining and summarizing customer reviews

Russian Noun Dataset

The dataset used for clustering contains the 2000 most frequent nouns in the Russian Web corpus.

Spanish Noun Dataset

The dataset used for clustering contains the 2000 most frequent nouns in the Spanish Gigaword corpus.

English Noun Dataset

The dataset used for clustering contains the 2000 most frequent nouns in the British National Corpus (BNC) and the English Gigaword corpus.

CSL

The CSL dataset is a large-scale Chinese scientific literature dataset obtained from the "Qianyan" open-source NLP platform. It consists of 396,209 Chinese core journal papers'...

10 datasets found