Dataset - LDM

Squared Curvature Regularization

This dataset was used in the paper to test the proposed squared curvature regularization approach.
- Dataset
- JSON

Enwik8

The Enwik8 dataset is a character-level language modeling benchmark built from the first 100 million bytes of an English Wikipedia XML dump.
- Dataset
- JSON

Penn Treebank Character

The Penn Treebank Character dataset is a character-level language modeling dataset.
- Dataset
- JSON

Improved Language Modeling by Decoding the Past

Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token...
- Dataset
- JSON

Reg-mixup: Mixup as a regularizer can surprisingly improve accuracy and out d...

Mixup as a regularizer can surprisingly improve accuracy and out-of-distribution robustness.
- Dataset
- JSON

Penn Treebank

The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

6 datasets found