TruthfulQA

The TruthfulQA dataset contains 817 questions designed to evaluate language models' tendency to mimic human falsehoods.

Dataset
JSON

WIMCOR: A Large Harvested Corpus of Location Metonymy

WIMCOR is a large dataset of location metonymy harvested from Wikipedia. It is suitable for metonymy detection and entity linking tasks.

Dataset
JSON

Extracting Blockchain Concepts from Text

The dataset supports extracting information from blockchain whitepapers and academic articles, with the goal of organizing that information and helping users navigate the field.

Dataset
JSON

ACL Anthology Dataset

The ACL Anthology dataset contains 21,212 papers, 17,792 authors, 342 venues, and 110,975 citations.

Dataset
JSON

ACL Anthology

The ACL Anthology dataset contains papers on natural language processing, including citation patterns, authorship, and language use over time.

Dataset
JSON

Collecting and Characterizing Natural Language Utterances for Specifying Data...

A dataset of natural language utterances for specifying data visualizations.

Dataset
JSON

Semantic Profiling of Natural Language Utterances for Data Visualization Gene...

A dataset of 500 natural language utterances for data visualization generation, including utterances with uncertainties and missing data references.

Dataset
JSON

NL4Opt Generation Dataset

The NL4Opt Generation Dataset consists of 1,101 examples, split into train, dev, and test sets of 713, 99, and 289 examples, respectively. Each example consists...

Dataset
JSON

Zh-En Multi-Domain Dataset

The Zh-En multi-domain dataset consists of four balanced domains: news, patent, subtitles, and COVID-19.

Dataset
JSON

XSUM Dataset

The XSUM dataset comprises 226,711 British Broadcasting Corporation (BBC) articles paired with their single-sentence summaries.

Dataset
JSON

Detecting Hallucinated Content in Conditional Neural Sequence Generation

Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinate additional content not supported by the...