OSCAR corpus

The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cagri Toraman, Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik (2024). Dataset: OSCAR corpus. https://doi.org/10.57702/o8r3jeh0

DOI retrieved: December 16, 2024

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Author	Cagri Toraman
More Authors	Eyup Halit Yilmaz, Furkan Şahinuç, Oguzan Ozcelik
Homepage	https://huggingface.co/datasets/oscar