Offensive Language Identification Dataset (OLID)

doi:10.57702/89etg4lf


The Offensive Language Identification Dataset (OLID) is a large collection of English tweets annotated for offensive language use. The annotation follows a three-level hierarchical schema: whether a message is offensive or not, what type of offense it contains (targeted or untargeted), and, for targeted offenses, who the target is (an individual, a group, or other).
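Because each level only applies when the level above it allows, the schema can be modeled as a small validated record. The sketch below is illustrative and is not code distributed with the dataset: the class name `OlidLabel` and its `path` method are hypothetical. The label codes (NOT/OFF for level A, TIN/UNT for level B, IND/GRP/OTH for level C) follow the schema described in the paper that defines OLID (arXiv:1903.08983).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Label codes from the OLID annotation schema (hypothetical constant names).
LEVEL_A = {"NOT", "OFF"}         # A: not offensive / offensive
LEVEL_B = {"TIN", "UNT"}         # B: targeted insult or threat / untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # C: individual / group / other target


@dataclass(frozen=True)
class OlidLabel:
    """Hypothetical container for one tweet's hierarchical annotation."""
    a: str
    b: Optional[str] = None
    c: Optional[str] = None

    def __post_init__(self) -> None:
        # Level A is always present.
        if self.a not in LEVEL_A:
            raise ValueError(f"invalid level-A label: {self.a!r}")
        # Level B is defined only for offensive tweets.
        if self.a == "OFF":
            if self.b not in LEVEL_B:
                raise ValueError("offensive tweets need a level-B label (TIN or UNT)")
        elif self.b is not None:
            raise ValueError("non-offensive tweets have no level-B label")
        # Level C is defined only for targeted offenses.
        if self.b == "TIN":
            if self.c not in LEVEL_C:
                raise ValueError("targeted offenses need a level-C label (IND, GRP or OTH)")
        elif self.c is not None:
            raise ValueError("a level-C label is only defined for targeted offenses")

    def path(self) -> Tuple[str, ...]:
        """Labels from the top of the hierarchy down, omitting levels that do not apply."""
        return tuple(x for x in (self.a, self.b, self.c) if x is not None)


print(OlidLabel("OFF", "TIN", "GRP").path())  # ('OFF', 'TIN', 'GRP')
print(OlidLabel("NOT").path())                # ('NOT',)
```

Validating in `__post_init__` rejects impossible combinations, such as a target label on an untargeted or non-offensive tweet, at construction time rather than during later analysis.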

Data and Resources

Original Metadata JSON
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar (2024). Dataset: Offensive Language Identification Dataset (OLID). https://doi.org/10.57702/89etg4lf

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.1903.08983
Author	Marcos Zampieri
More Authors	Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
Homepage	https://scholar.harvard.edu/malmasi/olid