2 datasets found
Tags: OSCAR
OSCAR corpus
The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.
Dataset
JSON
OSCAR
The OSCAR corpus is a multilingual web corpus that is used for pre-training large generative language models. It is a document-oriented corpus that is comparable in size and...
Dataset
JSON
You can also access this registry using the API (see API Docs).