Dataset - LDM

CMID, KUAKE-QIC, Intent-Merged

Biomedical intent detection and named entity recognition datasets
- Dataset
- JSON

JNLPBA, DDI, BC5CDR, NCBI-Disease, AnatEM

Biomedical intent detection and named entity recognition datasets
- Dataset
- JSON

BC5CDR

The BC5CDR dataset consists of 1,500 PubMed articles, split into a training set (500), a development set (500), and a test set (500). The dataset contains 15,935...
- Dataset
- JSON

ProtST

The ProtST dataset is a collection of protein sequences and their corresponding biomedical text descriptions.
- Dataset
- JSON

OSCAR

The OSCAR corpus is a multilingual web corpus used for pre-training large generative language models. It is a document-oriented corpus that is comparable in size and...
- Dataset
- JSON
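The BC5CDR entry describes a 500/500/500 split over 1,500 PubMed articles, and every entry here is tagged JSON. The registry does not document a record schema on this page. The sketch below therefore assumes a JSON array of records with hypothetical `split` ("train", "dev", "test") and `pmid` fields, and checks that a downloaded copy matches the stated split sizes.

```python
import json

# ASSUMPTION: hypothetical record layout, not the registry's documented schema.
# Each record is a JSON object with a "split" field ("train", "dev", "test")
# and a "pmid" field identifying the source PubMed article.
EXPECTED_SPLITS = {"train": 500, "dev": 500, "test": 500}


def count_articles_per_split(records):
    """Count distinct PubMed articles (by the hypothetical "pmid" field) per split."""
    pmids_by_split = {}
    for rec in records:
        pmids_by_split.setdefault(rec["split"], set()).add(rec["pmid"])
    return {split: len(pmids) for split, pmids in pmids_by_split.items()}


def check_bc5cdr_splits(json_text):
    """Return True if the per-split article counts match 500/500/500."""
    records = json.loads(json_text)
    return count_articles_per_split(records) == EXPECTED_SPLITS


if __name__ == "__main__":
    # Synthetic records standing in for a real export (illustration only).
    demo = [{"split": s, "pmid": f"{s}-{i}"}
            for s in ("train", "dev", "test") for i in range(500)]
    print(check_bc5cdr_splits(json.dumps(demo)))
```

The sketch counts distinct `pmid` values rather than raw records, so the check still holds if one article contributes several records (for example, one per annotation). That behavior is itself an assumption about the layout.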

You can also access this registry using the API (see API Docs).

5 datasets found