Multilingual Corpus - Groups

Reuters Corpus Volume 2

A multilingual corpus with a collection of 487,000 news stories.

OSCAR corpus

The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.

Parallel Meaning Bank

A semantically annotated parallel corpus for English, German, Italian, and Dutch where sentences are aligned with scoped meaning representations in order to capture the...

3 datasets found
