- Tetun Test Collection
The Tetun test collection is a document-level audited dataset for relevance judgments.
- Labadain-30k+
The Labadain-30k+ dataset is a monolingual Tetun document-level audited dataset.
- Synthetic Dataset
The dataset used in this work is a custom synthetic dataset generated with the liquid-dsp library. It contains 600,000 examples per class, 13.8 million examples in total, with SNRs...