GIGA-CM dataset

GIGA-CM is a large-scale dataset comprising millions of documents, created to facilitate the pre-training of hierarchical document encoding models for summarization tasks.

BibTeX: