Common Voice Spoken Sentence Similarity

doi:10.57702/jjqj61w1


The Common Voice Spoken Sentence Similarity dataset was built from the test set of the English subset of Common Voice. To compute a similarity score for every pair of sentences in the original test set, the creators used two pretrained supervised SimCSE models: sup-simcse-roberta-large and sup-simcse-bert-large-uncased.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Jian Zhu, Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo (2024). Dataset: Common Voice Spoken Sentence Similarity. https://doi.org/10.57702/jjqj61w1

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2210.12857
Author	Jian Zhu
More Authors	Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo
Homepage	https://github.com/google/