OSCAR corpus

The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.

Data and Resources

This dataset has no data

Cite this as

Cagri Toraman, Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik (2024). Dataset: OSCAR corpus. https://doi.org/10.57702/o8r3jeh0

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts, and will be published when the Dataset is made public.

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Author         Cagri Toraman
More Authors   Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik
Homepage       https://huggingface.co/datasets/oscar