Arabic Language - Groups

ARAGPT2

ARAGPT2 is a stacked transformer-decoder model trained using the causal language modeling objective. The model is trained on 77GB of Arabic text.

Dataset
JSON

Validation Dataset

The validation dataset contains 1428 images from nine distinct rooms.

Dataset
JSON

Training Dataset

The training dataset is a collection of the publicly available Arabic corpora listed below: The unshuffled OSCAR corpus (Ortiz Suárez et al., 2020). The Arabic Wikipedia dump...

Dataset
JSON

3 datasets found
