Tweet2Vec: Character-Based Distributed Representations for Social Media

Text from social media poses challenges that can cause traditional NLP approaches to fail. Informal language, spelling errors, abbreviations, and special characters are commonplace in these posts, which leads to a prohibitively large vocabulary for word-level approaches.
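The vocabulary argument can be illustrated with a toy comparison. This is a minimal sketch, not the Tweet2Vec implementation: the posts below are hypothetical examples (not drawn from the dataset), and the word-level tokenizer is assumed to be simple lowercased whitespace splitting. Each new spelling variant adds a word type, while the character vocabulary stays bounded by the character set.

```python
# Hypothetical toy posts (not from the Tweet2Vec data) with spelling variants,
# abbreviations, and special characters typical of social media text.
posts = [
    "sooo excited for the game tonite!!! #gameday",
    "soooo excited 4 the game 2nite #GameDay",
    "so excited for the game tonight :) #gameday",
    "omg sooooo excited!! game tonight #gamedayyy",
]


def vocab_sizes(texts):
    """Return (word_vocab_size, char_vocab_size) for a list of posts.

    Assumption: word-level tokens come from lowercased whitespace splitting,
    so every spelling variant ("sooo", "soooo", ...) counts as a distinct type.
    """
    words = {tok for t in texts for tok in t.lower().split()}
    chars = {ch for t in texts for ch in t.lower()}
    return len(words), len(chars)


before = vocab_sizes(posts)
# One more spelling variant: a new word type, but no new characters.
after = vocab_sizes(posts + ["soooooo excited 4 the game"])

print("word vocab:", before[0], "->", after[0])
print("char vocab:", before[1], "->", after[1])
```

On real social media streams this effect compounds, since variants, typos, and hashtags keep appearing, whereas a character-level model only needs embeddings for a fixed, small alphabet.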

Cite this as

Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William W. Cohen (2025). Dataset: Tweet2Vec: Character-Based Distributed Representations for Social Media. https://doi.org/10.57702/dawrt8a5

DOI retrieved: January 3, 2025

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined In: https://doi.org/10.48550/arXiv.1605.03481
Author: Bhuwan Dhingra
More Authors: Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William W. Cohen
Homepage: https://github.com/bdhingra/tweet2vec