Dataset of Urduud1k from Natural Scenes

  • U. ZAKI
  • D. N. HAKRO
  • M. MEMON
  • F. H. KHOSO
  • M. A. ZAKI
  • G. NABI
Keywords: Urdu in Natural Scene; Dataset of Urdu Text; Urdu OCR; Detection and Recognition of Urdu


In recent years, text analysis in natural scenes has attracted growing research attention. Datasets play a significant role in assessing the performance of text recognition algorithms. A dataset of natural scene text images in six different languages was recently released at the International Conference on Document Analysis and Recognition (ICDAR). That dataset covers multiple languages but does not include Urdu, and there is no standard database of Urdu text in natural scene images. The main aim of this research is therefore to build a database of Urdu text in natural scenes. The dataset is large and varied: the images were captured with 10 different cameras, at different resolutions, angles, and distances, and under different lighting conditions. The dataset comprises Urdu words, ligatures, and characters in natural scenes. It contains 16k word images and 32k ligature and character images. It also contains 1k scene images, including signboards, store names, banners, and so on. In addition, the Urdu dataset is compared with existing datasets, including ICDAR 2003, ARASTI, and Chars74K. Because the dataset contains many natural scene images, it can be used to identify Urdu text in natural environments.