Movie Triples Corpus (MTC)

The Movie Triples Corpus (MTC) was derived from the Movie-DiC dataset of Banchs (2012). Although it spans a wide range of topics and contains few spelling mistakes, its small size, only about 240,000 dialogue triples, makes it difficult to train a dialogue model on it, as pointed out by Serban et al. (2016).

Cite this as

Oluwatobi Olabiyi, Alan Salimov, Anish Khazane, Erik T. Mueller (2025). Dataset: Movie Triples Corpus (MTC). https://doi.org/10.57702/yv676lps

DOI retrieved: January 2, 2025

Additional Info

Field         Value
Created       January 2, 2025
Last update   January 2, 2025
Defined In    https://doi.org/10.48550/arXiv.1805.11752
Author        Oluwatobi Olabiyi
More Authors  Alan Salimov, Anish Khazane, Erik T. Mueller
Homepage      https://github.com/julianser/hed-dlg-truncated