-
What did you mention? A large scale mention detection benchmark for spoken an...
A large-scale mention detection benchmark for spoken and written text. -
FEVER: A Large-Scale Dataset for Fact Extraction and Verification
The FEVER dataset consists of 185,455 annotated claims, together with 5,416,537 Wikipedia documents containing roughly 25 million sentences as potential evidence. -
TREC Deep Learning track
The TREC Deep Learning track dataset is a collection of question answering datasets, which are used for passage retrieval and ranking. -
SHACL Satisfiability and Containment
The Shapes Constraint Language (SHACL) is a recent W3C recommendation language for validating RDF data. This paper provides a translation of SHACL into a new first-order... -
Good Judgment Open
The Good Judgment Open (GJO) dataset contains 1,770 datapoints (698 'forecasts' and 1,072 'comments') posted by 242 anonymised users with a range of expertise. -
QQP Dataset
The QQP dataset contains more than 400k question pairs. -
Self-Recognition in Language Models
A self-recognition test for language models using model-generated security questions. -
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autono...
A driving scenario QA task and a dataset of 160k QA pairs derived from 10k driving scenarios, paired with high quality control commands collected with RL agent and question... -
Vicuna dataset
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced... -
DOLLY dataset
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced... -
NYT and WebNLG
NYT and WebNLG are widely used datasets for relational triple extraction. -
Repair Program
The repair program Π(D, IC) for a database instance D without nulls has the following rules: Program facts: P(ā) for each atom P(ā) ∈ D. For a constraint of the form... -
Repair Programs for Consistent Query Answering
Repair programs for consistent query answering have been well studied in the literature. They specify the database repairs as their stable models. On their basis, and using... -
AutoCast++: Enhancing World Event Prediction with Zero-Shot Ranking-Based Con...
The AutoCast++ dataset is a benchmark for event forecasting using news articles.