- Instruct-PTBR
The dataset used for training the TeenyTinyLlama pair consists of a concatenation of open-source Brazilian Portuguese datasets, including Wikipedia, CulturaX, OSCAR, Common...
- Pt-Corpus-Instruct
The dataset used for training the TeenyTinyLlama pair consists of a concatenation of open-source Brazilian Portuguese datasets, including Wikipedia, CulturaX, OSCAR, Common...
- PropBank.Br
PropBank.Br is a corpus of Brazilian Portuguese texts annotated with semantic roles.
- COCO dataset (Brazilian Portuguese)
The dataset used for training the Brazilian Portuguese version of the GRIT model, a translation of the COCO dataset.