English Gigaword Corpus

An English monolingual corpus used to create synthetic training data for models through back-translation.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Meng Sun, Bojian Jiang, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang (2024). Dataset: English Gigaword Corpus. https://doi.org/10.57702/fn3hvzev

DOI retrieved: November 25, 2024

Field	Value
Created	November 25, 2024
Last update	November 25, 2024
Defined In	https://doi.org/10.18653/v1/W19-5341
Author	Meng Sun
More Authors	Bojian Jiang, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang