Dataset - LDM

MSR

The MSR dataset is a widely used vulnerability detection dataset, consisting of 10,900 vulnerable examples and 177,736 non-vulnerable examples.
- Dataset
- JSON
Vakyansh

The Vakyansh dataset is used for training and testing punctuation restoration and inverse text normalization models.
- Dataset
- JSON
Samanantar dataset

The Samanantar dataset contains 49.6 million sentence pairs between English and 11 Indian languages.
- Dataset
- JSON

3 datasets found