Model-based identification of Oriental documents

Yacoub Said, Rita A (1999) Model-based identification of Oriental documents. Masters thesis, Concordia University.

Text (application/pdf): MQ43669.pdf (3MB)

Abstract

Computers capable of identifying the languages printed in documents can support many applications, including document classification for character recognition, translation, and language understanding. Language identification is normally done manually. However, the high volume and variety of languages encountered make manual identification impractical, and an automatic approach to language identification becomes necessary. Language identification is therefore a key step in the automatic processing of document images. This thesis is concerned with the model-based classification of Oriental documents into Chinese, Japanese, and Korean. A model-based approach locates, in an image, an object of which the computer has a model. In this work, the objects to be located are some of the most frequently appearing characters in each of the three Oriental languages, and the images to be searched are the Oriental documents fed to the system. A major part of the work is locating instances of the character models in an Oriental document, which is done using the Hausdorff distance, a similarity measure defined between two sets of points. One point set represents a model of the Oriental character being sought, and the other represents each character in the document image to be identified. Since Oriental documents are complex in structure, a portion of the text is extracted from the input document for further processing.
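
For readers unfamiliar with the Hausdorff distance mentioned in the abstract, the following is a minimal sketch of the standard definition applied to two small point sets. It is not taken from the thesis: the function names, the sample coordinates, and the labels "zh" and "ko" are hypothetical, and the thesis may use a different variant of the measure or a different matching procedure.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_match(models, candidate):
    """Return (label, distance) for the model closest to `candidate`.

    `models` maps a hypothetical label to the point set of a character model.
    """
    scored = ((label, hausdorff(points, candidate)) for label, points in models.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical point sets standing in for a character model and a
# character segmented from a document image.
model = [(0, 0), (1, 0), (0, 1)]
candidate = [(0, 0), (1, 0), (0, 2)]

print(hausdorff(model, candidate))  # 1.0
print(best_match({"zh": model, "ko": [(5, 5)]}, candidate))  # ('zh', 1.0)
```

A small distance means every point of one set lies close to some point of the other, which is why the measure can serve as a similarity score between a character model and a character image. This brute-force version compares every pair of points and is meant only to illustrate the definition.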

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Yacoub Said, Rita A
Pagination: xiv, 110 leaves : ill. ; 29 cm.
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science and Software Engineering
Date: 1999
Thesis Supervisor(s): Suen, Ching Y
Identification Number: P 98 Y33 1999
ID Code: 877
Deposited By: Concordia University Library
Deposited On: 27 Aug 2009 17:15
Last Modified: 13 Jul 2020 19:47
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
