Optical Character Recognition - Groups

OCR4MT

OCR4MT is a benchmark for OCR systems on low-resource languages and scripts.
Character Recognition Dataset

A dataset of 700 characters in different font faces, used for testing the proposed character recognition method.
ICDAR 2019 Competition on Scanned Receipt OCR and Information Extraction

A dataset for scanned receipt OCR and information extraction, focusing on key information detection and OCR tasks.
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset

A comprehensive dataset for post-OCR parsing and receipt understanding, specifically designed to enhance OCR and information extraction from receipts in multilingual contexts...
ICDAR-2015 Robust Reading Competition

The ICDAR-2015 Robust Reading Competition dataset contains images with text in various fonts, sizes, and orientations.
ICDAR 2013

ICDAR 2013 consists of 229 training images and 233 testing images and, like ICDAR 2015, provides "Strong", "Weak", and "Generic" lexicons for the text spotting task....
SPAN: a Simple Predict & Align Network for Handwritten Paragraph Recognition

The proposed model performs OCR at paragraph level, without any prior segmentation stage.
ICDAR 2013 Robust Reading Competition

The ICDAR 2013 robust reading competition dataset.
Responsa Project dataset

The Responsa Project dataset consists of more than 3M annotated letters from the Responsa Project.
OCR dataset

The OCR dataset contains 52,152 samples of handwritten digits, each an 8x16 binary image.
Bengali word segmentation

Bengali handwritten word segmentation dataset
Bengali OCR

Bengali handwritten character recognition dataset
BanglaWriting

Bengali handwritten word images dataset
EMNIST

Binary images are simple — two possible pixel-valued signals with a single channel. The simplicity of binary images has a significant advantage compared to colored and gray...
ICDAR dataset

The ICDAR dataset contains handwritten digits.
Realistic Expiry Date Dataset

The dataset consists of 3,000 samples of realistic expiry dates covering the years 2019 to 2027, used for testing the model.
Synthetic Expiry Date Dataset

The dataset consists of 60,000 samples of unrealistic expiry dates paired with the corresponding filled-in expiry dates, providing additional samples for training the model.

17 datasets found