Matching Latent Encoding for Audio-Text based Keyword Spotting

The proposed end-to-end model architecture for flexible keyword spotting consists of encoder, projector, and audio-text aligner modules.

Cite this as

Kumari Nishu, Minsik Cho, Devang Naik (2024). Dataset: Matching Latent Encoding for Audio-Text based Keyword Spotting. https://doi.org/10.57702/46jr1588

DOI retrieved: December 2, 2024

Additional Info

Created: December 2, 2024
Last update: December 2, 2024
Defined In: https://doi.org/10.48550/arXiv.2306.05245
Author: Kumari Nishu
More Authors: Minsik Cho, Devang Naik