Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
MediSys dataset
The dataset was created by collecting metadata from the MediSys system and constructing parallel corpora from COVID-19 news.
OPUS EMEA Corpus
The dataset was created by collecting an updated version of the European Medicines Agency (EMEA) corpus and applying new methods for text extraction from PDF files, sentence...
EMEA corpus
The dataset was created by assembling an initial collection of parallel corpora in the health and medicine domains from well-known web sources and enriching them with identified...