Penn Treebank

doi:10.57702/u5gg4t6i

The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
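Treebank II annotation is usually distributed as bracketed, Lisp-style parse trees (for example the `.mrg` files in the LDC release; check the layout of your own copy). The sketch below shows how such trees can be read and how sentence and word counts like the ones above could be tallied. It is a minimal illustration that assumes well-formed, balanced brackets. `parse_trees`, `tagged_words` and `SAMPLE` are hypothetical names introduced here, and the two sample sentences are hand-written in the corpus style, not quoted from the corpus. Empty elements tagged `-NONE-` (traces and null subjects) are skipped by default because they are not words of the original text.

```python
import re
from typing import List, Tuple, Union

# A node is (label, children); a leaf is just the word string.
Tree = Union[str, Tuple[str, list]]

# Tokens are "(", ")", or any run of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_trees(text: str) -> List[Tree]:
    """Parse every bracketed tree in `text` (assumes balanced parentheses)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        if tok == ")":
            raise ValueError(f"unexpected ')' at token {pos}")
        if tok != "(":
            pos += 1  # a bare word is a leaf
            return tok
        pos += 1  # consume "("
        label = ""  # the outer wrapper "( (S ...) )" has no label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse_node())
        pos += 1  # consume ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        trees.append(parse_node())
    return trees


def tagged_words(tree: Tree, skip_empty: bool = True) -> List[Tuple[str, str]]:
    """Return (word, POS tag) pairs; -NONE- empty elements are skipped by default."""
    if isinstance(tree, str):
        return []  # bare words are collected through their preterminal parent
    label, children = tree
    if skip_empty and label == "-NONE-":
        return []
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]  # preterminal node: (TAG word)
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child, skip_empty))
    return pairs


# Hand-written illustration in the bracketed style, not a line from the corpus.
SAMPLE = """
( (S (NP-SBJ (NNP Pierre) (NNP Vinken))
     (VP (MD will)
         (VP (VB join) (NP (DT the) (NN board)) (NP-TMP (NNP Nov.) (CD 29))))
     (. .)) )
( (S (NP-SBJ (-NONE- *)) (VP (VB Stop)) (. !)) )
"""

if __name__ == "__main__":
    trees = parse_trees(SAMPLE)
    for tree in trees:
        print(tagged_words(tree))
    total = sum(len(tagged_words(t)) for t in trees)
    print("sentences:", len(trees), "words:", total)
```

Run over every file of a local copy, the number of parsed trees and the total length of `tagged_words` give sentence and token counts; the exact totals depend on which sections are included and whether punctuation and empty elements are counted, so they need not match the round figures above. For real work, NLTK's `BracketParseCorpusReader` is an existing, more robust reader for this format.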

BibTeX:

@dataset{Mitchell_P_Marcus_and_Mary_Ann_Marcinkiewicz_and_Beatrice_Santorini_2024,
    abstract = {The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.},
    author = {Mitchell P Marcus and Mary Ann Marcinkiewicz and Beatrice Santorini},
    doi = {10.57702/u5gg4t6i},
    institution = {No Organization},
    keywords = {Corpus, Corpus Linguistics, Dependency Parsing, LSTM, Language Modeling, Linguistics, Machine Translation, Natural Language, Natural Language Processing, Penn Treebank, Regularization, Semantics, Sentiment Analysis, Stanford Dependencies, Syntax, Text, Text Analysis, Text Classification, Treebank, Treebank II},
    month = {nov},
    publisher = {TIB},
    title = {Penn Treebank},
    url = {https://service.tib.eu/ldmservice/dataset/penn-treebank},
    year = {2024}
}