Our Goals

Our Motivation

The dataset grew out of the realization that Arabic text recognition, especially in ancient manuscripts and calligraphy paintings, is an overlooked field.
Despite the rich history of the Arabic language and its many artistic styles, few works have applied Machine Learning to detecting the styles of Arabic text. Moreover, no previous work has tackled the task of text recognition, also known as Optical Character Recognition (OCR), using Machine Learning.
Another issue we noticed is that most of the datasets we found, even those that deal only with style detection, are unpublished and inaccessible to the public.

From those realizations, we decided to publish the first dataset that meets the following requirements:
  • Images labeled with both the text they contain and their style
  • Diverse styles (Naskh, Diwani, Thuluth, Muhaquaq, Kufi...)
  • Diverse historic contexts (Modern-day, Abbasid, Ottoman, Persian)
That is how our dataset, HICMA, came into existence. Its name originates from the Arabic word حكمة, which translates to "wisdom" in English, a nod to the rich cultural heritage of the manuscripts and paintings we are sharing.
Our goal is to empower researchers to venture into this exciting new field and to accelerate the development of more powerful, accurate, and inclusive Machine Learning algorithms.