
OSCAR corpus

The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.

Data and Resources

This dataset has no data

Cagri Toraman, Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik (2024). Dataset: OSCAR corpus. https://doi.org/10.57702/o8r3jeh0

Private DOI: This DOI is not yet resolvable.
It is available for use in manuscripts and will be published when the dataset is made public.

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Author	Cagri Toraman
More Authors	Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik
Homepage	https://huggingface.co/datasets/oscar