Datasets Activity Stream About Order by Relevance Name Ascending Name Descending Last Modified Go 1 dataset found Groups: Multilingual Corpora Filter Results OSCAR The OSCAR corpus is a multilingual web corpus that is used for pre-training large generative language models. It is a document-oriented corpus that is comparable in size and... Dataset JSON