Training Dataset
URL: https://github.com/ncg-task/training-data
Training Data for the NLPContributionGraph Shared Task 11 at SemEval-2021
The repository is organized as follows:
README.md
[task-name-folder]/ # natural_language_inference, paraphrase_generation, question_answering, relation_extraction, topic_models
├── [article-counter-folder]/ # ranges from 0 to 100, since we annotated a varying number of articles per task
│ ├── [articlename].pdf # scholarly article pdf
│ ├── [articlename]-Grobid-out.txt # plaintext output from the Grobid parser
│ ├── [articlename]-Stanza-out.txt # plaintext preprocessed output from Stanza
│ ├── sentences.txt # annotated Contribution sentences in the file
│ ├── entities.txt # annotated entities in the Contribution sentences
│ └── info-units/ # the folder containing information units in JSON format
│ │ └── research-problem.json # research problem: mandatory information unit in JSON format
│ │ └── model.json # model information unit in JSON format; in some articles it is called approach
│ │ └── ... # there are 12 information units in all and each article may be annotated by 3 or 6
│ └── triples/ # the folder containing information unit triples one per line
│ │ └── research-problem.txt # research problem triples (one research problem statement per line)
│ │ └── model.txt # model triples (one statement per line)
│ │ └── ... # there are 12 information units in all and each article may be annotated by 3 or 6
│ └── ... # there are between 1 and 100 articles annotated for each task, so this repeats for the remaining annotated articles
└── ... # there are 24 tasks selected overall, so this repeats 23 more times
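The layout above can be traversed with a few lines of Python. The following is a minimal sketch, not part of the repository: it assumes a local clone of the repository, and the helper names `iter_articles` and `load_article` are hypothetical. It relies only on what the tree states, namely that `info-units/` holds JSON files and `triples/` holds text files with one statement per line; it makes no assumption about the internal format of `sentences.txt` or `entities.txt`.

```python
import json
from pathlib import Path


def iter_articles(root):
    """Yield (task_name, article_dir) for each article folder under root.

    Assumes root is a local clone laid out as
    [task-name-folder]/[article-counter-folder]/.
    """
    for task_dir in sorted(Path(root).iterdir()):
        # Skip README.md and hidden folders such as .git
        if not task_dir.is_dir() or task_dir.name.startswith("."):
            continue
        for article_dir in sorted(task_dir.iterdir()):
            if article_dir.is_dir():
                yield task_dir.name, article_dir


def load_article(article_dir):
    """Load the information units (JSON) and triples (one per line) of one article."""
    article_dir = Path(article_dir)

    info_units = {}
    units_dir = article_dir / "info-units"
    if units_dir.is_dir():
        for path in sorted(units_dir.glob("*.json")):
            # Key by file stem, e.g. "research-problem" or "model"
            info_units[path.stem] = json.loads(path.read_text(encoding="utf-8"))

    triples = {}
    triples_dir = article_dir / "triples"
    if triples_dir.is_dir():
        for path in sorted(triples_dir.glob("*.txt")):
            lines = path.read_text(encoding="utf-8").splitlines()
            # Keep non-empty lines only; each line is one statement
            triples[path.stem] = [line for line in lines if line.strip()]

    return {"info_units": info_units, "triples": triples}
```

For example, `for task, article in iter_articles("training-data"): data = load_article(article)` would visit every annotated article, where `"training-data"` stands for wherever the clone lives locally.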
Additional Information
Field | Value |
---|---|
Created | July 23, 2021 |
Last updated | July 23, 2021 |
Format | json, pdf, txt |