Training Dataset

doi:10.57702/lv05pgzd

The training dataset is a collection of the following publicly available Arabic corpora: the unshuffled OSCAR corpus (Ortiz Suárez et al., 2020), the Arabic Wikipedia dump from September 2020, and the 1.5B words Arabic Corpus (El-Khair, 2016).

BibTeX:

@dataset{Juan_C_Montoya_and_Yinsheng_Li_and_Charles_Strother_and_Guang-Hong_Chen_2024,
    abstract = {The training dataset is a collection of the following publicly available Arabic corpora: the unshuffled OSCAR corpus (Ortiz Suárez et al., 2020), the Arabic Wikipedia dump from September 2020, and the 1.5B words Arabic Corpus (El-Khair, 2016).},
    author = {Juan C. Montoya and Yinsheng Li and Charles Strother and Guang-Hong Chen},
    doi = {10.57702/lv05pgzd},
    institution = {No Organization},
    keyword = {'imitation learning', 'Arabic Language', 'Corpus', 'Deep learning', 'INFT', 'Markov Decision Processes', 'Medical imaging', 'NFT', 'Natural Language Processing', 'Optical Pulses', 'SPOT-5 satellite', 'Training Dataset', 'Training dataset', 'bug report', 'dataset', 'deep learning', 'frequency control', 'machine learning', 'neural networks', 'ship detection', 'state-action pairs', 'training dataset'},
    month = {dec},
    publisher = {TIB},
    title = {Training Dataset},
    url = {https://service.tib.eu/ldmservice/dataset/training-dataset},
    year = {2024}
}