Machine Translation - Groups

XTOWER

A multilingual LLM for explaining and correcting translation errors

Vietnamese Diacritic Restoration Dataset

A dataset for the Vietnamese diacritic restoration problem, consisting of 180,000 sentence pairs.

Machine Translation and Automated Analysis of the Sumerian Language Dataset

The Machine Translation and Automated Analysis of the Sumerian Language dataset contains Sumerian texts in cuneiform script.

Covid-19 MLIA @ Eval initiative

The Covid-19 MLIA @ Eval initiative consists of three Natural Language Processing tasks: information extraction, multilingual semantic search and machine translation. The goal...

Penn Treebank

The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.

5 datasets found
