Offensive Language Identification Dataset (OLID)

The Offensive Language Identification Dataset (OLID) is a large collection of English tweets annotated for offensive language under a three-level hierarchical schema: whether a message is offensive, what type of offense it contains, and who the offense targets.
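
The three-level schema can be pictured as a small nested record in which each level only applies when the level above it allows. The sketch below is illustrative only: the class name, field names, and helper function are hypothetical, and the label strings (OFF/NOT, TIN/UNT, IND/GRP/OTH) follow the abbreviations commonly associated with the OLID paper rather than anything stated on this page, so they should be checked against the dataset files before use.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label sets (not stated on this page; verify against the dataset files).
LEVEL_A = {"OFF", "NOT"}         # level 1: offensive or not offensive
LEVEL_B = {"TIN", "UNT"}         # level 2: targeted insult/threat or untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # level 3: individual, group, or other target


@dataclass(frozen=True)
class OlidAnnotation:
    """Hypothetical container for one tweet's hierarchical labels."""
    offensive: str                      # level 1 label
    offense_type: Optional[str] = None  # level 2 label, only for offensive tweets
    target: Optional[str] = None        # level 3 label, only for targeted offenses


def is_consistent(a: OlidAnnotation) -> bool:
    """Check that lower levels are filled in only when the hierarchy permits."""
    if a.offensive not in LEVEL_A:
        return False
    if a.offensive == "NOT":
        # Non-offensive tweets carry no type or target label.
        return a.offense_type is None and a.target is None
    if a.offense_type not in LEVEL_B:
        return False
    if a.offense_type == "UNT":
        # Untargeted offenses carry no target label.
        return a.target is None
    return a.target in LEVEL_C
```

Under these assumptions, `OlidAnnotation("OFF", "TIN", "GRP")` would describe an offensive tweet that insults or threatens a group, while `OlidAnnotation("NOT")` would describe a tweet with no lower-level labels at all.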

Cite this as

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar (2024). Dataset: Offensive Language Identification Dataset (OLID). https://doi.org/10.57702/89etg4lf

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Defined In     https://doi.org/10.48550/arXiv.1903.08983
Author         Marcos Zampieri
More Authors   Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
Homepage       https://scholar.harvard.edu/malmasi/olid