- Arabic Names Transliterated in Hebrew
The dataset used to train the Arabic names transliteration model; it contains 2,000 Arabic names transliterated into Hebrew.
- Arabic Names Transliterated in English
The dataset used to train the Arabic names transliteration model; it contains 3,600 Arabic names transliterated into English.
- Phoneme Confusion Dataset
A dataset of phoneme confusions mined from transliteration models, used to train a generative model for phonetic misspellings.
- NEWS 2010 English-Hindi test set
The NEWS 2010 English-Hindi test set is used for transliteration equivalence evaluation.
- NEWS 2009 English-Hindi training set
The NEWS 2009 English-Hindi training set is used for transliteration equivalence learning.