Al-Khayat, Muna (2014) Learning-Based Arabic Word Spotting Using a Hierarchical Classifier. PhD thesis, Concordia University.
Text (application/pdf), 1MB: Al-Khayat_PhD_S2014.pdf - Accepted Version
Abstract
The effective retrieval of information from scanned and written documents is becoming essential as the volume of digitized documents grows, so developing efficient means of analyzing and recognizing these documents is of significant interest. One such method is word spotting, which has recently become an active research area. Word spotting systems have been implemented for Latin-based and Chinese languages, but few have been implemented for Arabic handwriting. Because Arabic writing is cursive by nature and unconstrained, with no clear white space between words, processing Arabic handwritten documents is a more challenging problem.
In this thesis, the design and implementation of a learning-based Arabic handwritten word spotting system is presented. The system incorporates text line extraction, handwritten word recognition, partial segmentation of words, word spotting, and finally validation of the spotted words.
The Arabic text line is more unconstrained than that of other scripts, mainly because it also includes small connected components, such as dots and diacritics, that are usually located between lines. Thus, a robust method for extracting text lines that takes into consideration the challenges of Arabic handwriting is proposed. The method is evaluated on two databases of Arabic handwritten documents, and the results are compared with those of two other text line extraction methods. The results show that the proposed method is effective and compares favorably with the other methods.
Word spotting is an automatic process for searching for words within a document. Applying this process to handwritten Arabic documents is challenging due to the absence of a clear space between handwritten words. To address this problem, an effective learning-based method for Arabic handwritten word spotting is proposed in this thesis. In this method, sub-words, or pieces of Arabic words, form the basic components of the search process, and a hierarchical classifier is implemented to integrate statistical language models with the segmentation of an Arabic text line into sub-words.
The holistic and analytical paradigms (for word recognition and spotting) are studied, and verification models based on combining these two paradigms have been proposed and implemented to refine the outcomes of the analytical classifier that spots words.
Finally, a series of experiments has been conducted to evaluate the effectiveness of the proposed systems, and the results obtained are promising.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Al-Khayat, Muna |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Computer Science |
Date: | 2014 |
Thesis Supervisor(s): | Suen, Ching Y. and Lam, Louisa |
ID Code: | 978057 |
Deposited By: | MUNA KHAYYAT |
Deposited On: | 16 Jun 2014 13:13 |
Last Modified: | 18 Jan 2018 17:45 |