Dataset - LDM

Synthetic dataset for text recognition

The dataset used for training the text recognition model, containing over 600,000 images with text.
- Dataset
- JSON
Text Detection on Technical Drawings

The dataset used for training a text recognition approach on technical drawings.
- Dataset
- JSON
MSRA-TD500

The MSRA-TD500 dataset is a benchmark for scene text detection, containing 300 training images and 200 test images with multilingual (Chinese and English), arbitrarily oriented, long text lines.
- Dataset
- JSON
Verisimilar Image Synthesis for Detection and Recognition of Texts

The scene text image synthesis technique proposed in the paper starts from two types of input, “Background Images” and “Source Texts”, as illustrated in columns 1 and 2 of Fig. 1 in the paper.
- Dataset
- JSON
K-Watermark

A benchmark for watermark text spotting from documents and an end-to-end solution for detecting watermark text patterns and recognizing the depicted text.
- Dataset
- JSON
MARIO-LAION

The MARIO-LAION dataset is a subset of the LAION-400M dataset, containing 9,194,613 high-quality text images with corresponding captions.
- Dataset
- JSON
ICDAR-2015 Robust Reading Competition

The ICDAR-2015 Robust Reading Competition dataset contains images with text in various fonts, sizes, and orientations.
- Dataset
- JSON
ICDAR-2017 Robust Reading Competition

The ICDAR-2017 Robust Reading Competition dataset contains images with text in various fonts, sizes, and orientations.
- Dataset
- JSON
SVT

SVT (Street View Text) is a challenging dataset collected by Wang et al. from Google Street View.
- Dataset
- JSON
COCO-Text

The COCO-Text dataset annotates text in natural images from MS COCO, with over 173,000 text annotations in over 63,000 images.
- Dataset
- JSON
ICDAR2015

The ICDAR2015 dataset consists of 1,670 images (17,548 annotated text regions) captured with Google Glass.
- Dataset
- JSON
ICDAR2013

The ICDAR2013 dataset comes from the ICDAR 2013 Robust Reading Competition.
- Dataset
- JSON
Total-Text

Total-Text is a dataset for word-level detection of arbitrary-shaped English text, containing 1,255 training images and 300 test images.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

13 datasets found