-
BookSum dataset
The dataset contains documents from the literature domain. -
Camera-Captured Characters and Words images (C3Wi) Dataset
A novel dataset of camera-captured character and word images. -
Automatic Ground Truth Generation of Camera-captured Document Images
A novel, generic method for automatic ground truth generation of camera-captured document images. -
UIT-MLReceipts: A Multilingual Benchmark for Detecting and Recognizing Key In...
A multilingual benchmark for detecting and recognizing key information in receipts. -
ICDAR 2019 Competition on Scanned Receipt OCR and Information Extraction
A dataset for scanned receipt OCR and information extraction, focusing on key information detection and OCR tasks. -
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset
A comprehensive dataset for post-OCR parsing and receipt understanding, specifically designed to enhance OCR and information extraction from receipts in multilingual contexts... -
Kleister NDA and Kleister Charity
A multi-page document dataset used for watermark text spotting. -
Historical-WI
The dataset is used for writer identification and retrieval. It contains 3600 document images written by 720 different writers. -
CLAMM16 and CLAMM17
The dataset is used for writer identification and script type classification.