Text Simplification - Groups

INFOLOSSQA

INFOLOSSQA is a dataset for characterizing and recovering simplification-induced information loss in the form of question-and-answer (QA) pairs.

Dataset
JSON

WebSplit

The WebSplit dataset is a benchmark for the Split and Rephrase task; it consists of RDF semantic tuples.

Dataset
JSON

Split and Rephrase Benchmark (SRB) and WikiSplit

The Split and Rephrase Benchmark (SRB) and WikiSplit are the datasets used in the paper to evaluate the readability of sentence splitting.

Dataset
JSON

Text Simplification Datasets: Exploration

Text Simplification datasets have limitations and need to be improved to build more robust models.

Dataset
JSON

A New Aligned Simple German Corpus

A new sentence-aligned monolingual corpus for Simple German – German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment...

Dataset
JSON

WikiSplit

A large dataset of naturally occurring sentence rewrites from Wikipedia edit history, providing sixty times more distinct split examples and a ninety times larger vocabulary...

Dataset
JSON

Newsela

The dataset is used to evaluate the proposed discourse-aware text simplification approach.

Dataset
JSON

WikiLarge

The dataset is used to evaluate the proposed discourse-aware text simplification approach.

Dataset
JSON

Wiki-Auto

The Wiki-Auto dataset is a text simplification dataset.

Dataset
JSON

9 datasets found