- RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging
The proposed RefXVC system uses multiple reference sources to capture the tonal variations in a speaker's speech more accurately.
- Y-Vector: Multiscale Waveform Encoder for Speaker Embedding
The proposed Y-vector system encodes the raw waveform at multiple scales to produce speaker embeddings, which are used for speaker verification.
- VoxCeleb dataset
The VoxCeleb dataset is a large-scale speaker identification dataset, used to evaluate the performance of speaker recognition systems.