Common Voice Spoken Sentence Similarity

The Common Voice Spoken Sentence Similarity dataset was built from the test set of the English subset of Common Voice. The similarity of every pair of sentences in the original test set was computed with two pretrained SimCSE models: sup-simcse-roberta-large and sup-simcse-bert-large-uncased.
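The pairwise scoring step described above can be illustrated with a minimal sketch. It assumes that similarity is measured as cosine similarity between sentence embeddings, which is the usual choice for SimCSE but is not stated in this record. The toy vectors stand in for real model outputs, and the function names `cosine` and `pairwise_similarity` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity(embeddings):
    """Return {(i, j): similarity} for every unordered pair with i < j."""
    return {
        (i, j): cosine(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(embeddings)), 2)
    }


# Toy 2-d vectors standing in for sentence embeddings (assumption:
# real embeddings would come from one of the SimCSE models).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = pairwise_similarity(toy_embeddings)
```

With two models, the same function would be run once per model's embeddings. How the two sets of scores are combined is not specified in this record.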

Cite this as

Jian Zhu, Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo (2024). Dataset: Common Voice Spoken Sentence Similarity. https://doi.org/10.57702/jjqj61w1

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2210.12857
Author Jian Zhu
More Authors Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo
Homepage https://github.com/google/