3 datasets found

Tags: multi-speaker

Filter Results
  • LRS3

    The LRS3 dataset is a large-scale dataset for visual speech recognition. It consists of thousands of spoken sentences from TED videos.
  • CSTR VCTK Corpus

    The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
  • LibriTTS

    A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgram (PPG) as content representation.
You can also access this registry using the API (see API Docs).