Login | Register

Automatic semantic annotation of Web documents


Automatic semantic annotation of Web documents

Vujicic, Milos (2009) Automatic semantic annotation of Web documents. Masters thesis, Concordia University.

[thumbnail of MR67266.pdf]
Text (application/pdf)
MR67266.pdf - Accepted Version


Ontologies are the most important construct of the Semantic Web. From the first attempt of using simplified RDF syntax to the advanced features of the OWL languages, ontologies have arisen as the most viable technology offering solutions to integrate various Web resources into a more intelligent Web. The work presented in this thesis is a contribution to the new generation of the Web, which should be readable and interpreted not only by humans but also by machines, such as software agents. In order to allow ontologies to achieve their role of "animating" the traditional Web into this next generation Web, it is essential to find an efficient way to map all existent Web resources onto their corresponding ontology classes. In this thesis, we propose an approach for automatic semantic annotation of Web documents which is an effective way to make the Semantic Web a reality. Such an integrated Web would greatly improve the accuracy of search engines, bring a new generation of intelligent Web services, push the limits of multi-agent technologies and improve many other areas of human activity that we cannot even imagine today. Considering the size and the speed of the growing Web, it is clear that this task cannot be achieved manually. Semi-automatic and automatic annotations of Web documents using statistical text classification methods seem to be the most promising solution. This work is focused on an approach based on Naive Bayes text classification adapted to some characteristics that are particular to Web documents. A complete software solution is developed to allow testing feasibility of such an approach. Furthermore, different variations of the text classification algorithms are tested and analysed in order to identify the most optimal approach to semantically annotate Web documents. Notably, the usage of Web documents hierarchy is explored as an option to improve the accuracy of semi-automatic and automatic annotations of Web documents. The results of each tested method are presented and commented. Finally, some aspects that could possibly be improved or approached in a different way are identified for future work.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:Thesis (Masters)
Authors:Vujicic, Milos
Pagination:ix, 82 leaves : ill. ; 29 cm.
Institution:Concordia University
Degree Name:M.A. Sc
Program:Institute for Information Systems Engineering
Thesis Supervisor(s):Bouguila, Nizar and Bentahar, Jamal
Identification Number:LE 3 C66Q35M 2009 V84
ID Code:976622
Deposited By: Concordia University Library
Deposited On:22 Jan 2013 16:29
Last Modified:13 Jul 2020 20:10
Related URLs:
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Research related to the current document (at the CORE website)
- Research related to the current document (at the CORE website)
Back to top Back to top