Reliable and Error Tolerant Handwritten Numeral Classifier by Rejecting Undesirable Samples From the Training Set

Yu, David (2012) Reliable and Error Tolerant Handwritten Numeral Classifier by Rejecting Undesirable Samples From the Training Set. Masters thesis, Concordia University.

Text (application/pdf)
Yu_MASc_S2013.pdf - Accepted Version
15MB

Abstract

Unconstrained handwritten numeral recognition has many applications, including reading handwritten bank checks, extracting numbers from tax forms, and sorting letter mail by ZIP code. However, these automated systems still make mistakes, and correcting them can be expensive. Most research in this field focuses on improving recognition accuracy, whereas reliability has been neglected. Our goal is to build a system that is 100% reliable (no errors) while maintaining a high recognition rate.
A very common strategy for achieving better reliability is to implement a rejection mechanism that processes the predicted results from a classification system. This thesis proposes a novel rejection approach that increases the reliability and accuracy of current classification systems.
This thesis compares the effect on recognition and reliability rates of applying a training-set rejection system, a post-testing rejection system, and the combination of both to a classifier. A two-stage rejection system is proposed to improve the reliability of a classifier. The first stage purifies the training set by removing undesirable samples. The second stage removes results that do not correlate strongly with their respective class. Our experiments study the effect on both recognition rate and substitution rate of employing a structural feature and a statistical feature. The study is conducted on the popular MNIST database for easier comparison with other methods. We use a support vector machine based classifier, as it has achieved some of the best recognition results to date.
Lastly, a category system is created to identify the types of samples removed from the training set. The samples are categorized into six major groups: good, very slanted, thick stroke, poor, unrecognizable, and confusing pairs. The first three categories are desirable samples, and the last three are undesirable samples from the training set. The performance of the classifier shows improvement of up to 0.03% when samples belonging to one or more undesirable groups are removed.
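The two-stage rejection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the confidence scorer, the thresholds, and the rate formulas (reliability taken as correct results divided by accepted results, a definition commonly used in handwriting recognition) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a training sample is (feature vector, digit label);
# a test result is (true label, predicted label, confidence score).
Sample = Tuple[Sequence[float], int]
Result = Tuple[int, int, float]


def purify_training_set(
    samples: List[Sample],
    confidence: Callable[[Sequence[float], int], float],
    threshold: float,
) -> List[Sample]:
    """Stage 1 (assumed form): keep only training samples whose
    confidence for their own label is at or above the threshold."""
    return [s for s in samples if confidence(s[0], s[1]) >= threshold]


def evaluate_with_rejection(results: List[Result], threshold: float) -> Dict[str, float]:
    """Stage 2 (assumed form): reject predictions whose confidence is
    below the threshold, then compute the rates over all test samples."""
    if not results:
        raise ValueError("results must not be empty")
    correct = wrong = rejected = 0
    for true_label, predicted, score in results:
        if score < threshold:
            rejected += 1          # withheld, so it cannot become an error
        elif predicted == true_label:
            correct += 1
        else:
            wrong += 1             # accepted but wrong: a substitution
    total = len(results)
    accepted = correct + wrong
    return {
        "recognition": correct / total,
        "substitution": wrong / total,
        "rejection": rejected / total,
        # Assumed definition: share of accepted results that are correct.
        "reliability": correct / accepted if accepted else 1.0,
    }
```

Raising the threshold in `evaluate_with_rejection` trades recognition rate for reliability: more results are rejected, so fewer substitutions are accepted. In practice the confidence score would come from the classifier itself, for example an SVM decision value.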

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Yu, David
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Software Engineering
Date: 6 December 2012
Thesis Supervisor(s): Suen, Ching Yee
ID Code: 975024
Deposited By: DAVID YU
Deposited On: 19 Jun 2013 16:48
Last Modified: 18 Jan 2018 17:39
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
