Umsuka English-isiZulu Parallel Corpus

The Umsuka English-isiZulu Parallel Corpus is a novel, high-quality parallel dataset for machine translation. It contains English sentences sampled from the News Crawl datasets and translated into isiZulu, and isiZulu sentences drawn from the NCHLT monolingual corpus and the UKZN isiZulu National monolingual corpus and translated into English.

Cite this as

Muhammad Umair Nasir, Innocent Amos Mchechesi (2024). Dataset: Umsuka English-isiZulu Parallel Corpus. https://doi.org/10.57702/7nbjbh6c

DOI retrieved: December 17, 2024

Additional Info

Field         Value
Created       December 17, 2024
Last update   December 17, 2024
Defined In    https://doi.org/10.48550/arXiv.2205.08621
Author        Muhammad Umair Nasir
More Authors  Innocent Amos Mchechesi
Homepage      https://zenodo.org/record/5035171#