TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

Cite this as

Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous (2024). Dataset: TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS. https://doi.org/10.57702/sgk81aae

DOI retrieved: December 3, 2024

Additional Info

Field Value
Created December 3, 2024
Last update December 3, 2024
Author Yuxuan Wang
More Authors RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
Homepage https://google.github.io/tacotron