OSCAR corpus

The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.

Cite this as

Cagri Toraman, Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik (2024). Dataset: OSCAR corpus. https://doi.org/10.57702/o8r3jeh0

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Author         Cagri Toraman
More Authors   Eyup Halit Yilmaz; Furkan Şahinuç; Oguzan Ozcelik
Homepage       https://huggingface.co/datasets/oscar