Generative Pre-Training for Speech

doi:10.57702/74wogvts

Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling from data distributions to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoders are good examples where generative models have shined. While generative models have been applied to various speech applications, no general-purpose generative model exists that models speech directly. In this work, we take a step in this direction by showing that a single pre-trained generative model can be adapted to different downstream tasks with strong performance.

BibTeX:

@dataset{Alexander_H_Liu_and_Matt_Le_and_Apoorv_Vyas_and_Bowen_Shi_and_Andros_Tjandra_and_Wei-Ning_Hsu_2024,
    abstract = {Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling from data distributions to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoders are good examples where generative models have shined. While generative models have been applied to various speech applications, no general-purpose generative model exists that models speech directly. In this work, we take a step in this direction by showing that a single pre-trained generative model can be adapted to different downstream tasks with strong performance.},
    author = {Alexander H. Liu and Matt Le and Apoorv Vyas and Bowen Shi and Andros Tjandra and Wei-Ning Hsu},
    doi = {10.57702/74wogvts},
    institution = {No Organization},
    keyword = {Generative Models, Speech Enhancement, Speech Separation, Speech Synthesis},
    month = {dec},
    publisher = {TIB},
    title = {Generative Pre-Training for Speech},
    url = {https://service.tib.eu/ldmservice/dataset/generative-pre-training-for-speech},
    year = {2024}
}