WSJ

The WSJ (Wall Street Journal) corpus is a large-vocabulary continuous speech recognition dataset. It contains 36,416 sequences, totaling around 80 hours of speech.

Cite this as

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura (2024). Dataset: WSJ. https://doi.org/10.57702/5n00l3tl

DOI retrieved: December 2, 2024

Additional Info

Field Value
Created: December 2, 2024
Last update: December 2, 2024
Defined In: https://doi.org/10.48550/arXiv.1906.10876
Citation:
  • https://doi.org/10.48550/arXiv.1412.7110
  • https://doi.org/10.48550/arXiv.1807.08280
  • https://doi.org/10.48550/arXiv.2107.04289
  • https://doi.org/10.48550/arXiv.1904.05862
Author: Andros Tjandra
More Authors: Sakriani Sakti, Satoshi Nakamura
Homepage: https://ttssample2018v1.netlify.com