Generative Pre-Training for Speech

Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling a data distribution to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoders are prominent examples where generative models have shined. While generative models have been applied to a range of speech applications, there exists no general-purpose generative model that models speech directly. In this work, we take a step in this direction by showing that a single pre-trained generative model can be adapted to different downstream tasks with strong performance.

Data and Resources

Cite this as

Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu (2024). Dataset: Generative Pre-Training for Speech. https://doi.org/10.57702/74wogvts

DOI retrieved: December 3, 2024

Additional Info

Field         Value
Created       December 3, 2024
Last update   December 3, 2024
Defined In    https://doi.org/10.48550/arXiv.2310.16338
Author        Alexander H. Liu
More Authors  Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
Homepage      https://voicebox.metademolab.com/speechflow.html