- Sumerian Cuneiform Dataset
A dataset for the study of Sumerian cuneiform, including part-of-speech tagging, named entity recognition, and machine translation.
- UK-PODS-ALIGN
This work showcases a cost-effective method for generating training data for speech processing tasks. The UK-PODS-ALIGN dataset features modern conversational...
- Ligurian Monolingual Corpus
The first open-source monolingual corpus for Ligurian.
- Normalized Ligurian Corpus
A dataset of 4,394 Ligurian sentences in different spelling systems, each paired with a normalized version.