CNDROBOT : a robot for the CINDI digital library system


Zhou, Cong (2005) CNDROBOT : a robot for the CINDI digital library system. Masters thesis, Concordia University.

Text (application/pdf)
MR14343.pdf - Accepted Version


Web robots, or crawlers, are an essential component of all search engines. Major search engines such as Google and AltaVista use their own robots (GoogleBot and Mercator) to crawl and index billions of Web pages over the Internet. Web robots are also increasingly adopted by digital libraries to collect data and on-line documents. The crawling process requires massive amounts of hardware and network resources as well as time. However, when only information about a predefined topic set is desired, a traditional crawling strategy becomes inefficient and is not cost-effective. This thesis presents issues in developing a focused crawler, CNDROBOT, which explores only well-selected domain sites and collects potentially on-topic documents for the CINDI digital library. The research covered the study of various search engines, types of Web robots, and crawling strategies. It primarily involved the design and implementation of CNDROBOT as well as the integration of the Document Filtering Subsystem. Finally, a Web application for CNDROBOT was developed, and the various components and functions of the system were tested extensively. This thesis demonstrates that CNDROBOT is capable of effectively and efficiently discovering large amounts of desired documents and supplying them to the CINDI digital library.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Zhou, Cong
Pagination: viii, 127 leaves : ill. ; 29 cm.
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science and Software Engineering
Thesis Supervisor(s): Desai, Bipin
ID Code: 8750
Deposited By: Concordia University Library
Deposited On: 18 Aug 2011 18:34
Last Modified: 18 Jan 2018 17:34
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
