CNDROBOT : a robot for the CINDI digital library system


Zhou, Cong (2005) CNDROBOT : a robot for the CINDI digital library system. Masters thesis, Concordia University.

MR14343.pdf - Accepted Version


Web robots, or crawlers, are an essential component of all search engines. Major search engines such as Google and AltaVista use their own robots (GoogleBot and Mercator, respectively) to crawl and index billions of Web pages on the Internet. Web robots are also increasingly adopted by digital libraries to collect data and on-line documents. The crawling process requires massive amounts of hardware and network resources, as well as time. However, when only information about a predefined set of topics is desired, a traditional crawling strategy becomes inefficient and is not cost-effective. This thesis presents the issues involved in developing a focused crawler, CNDROBOT, which explores only well-selected domain sites and collects potentially on-topic documents for the CINDI digital library. The research included a study of various search engines, types of Web robots, and crawling strategies. It primarily involved the design and implementation of CNDROBOT, as well as the integration of the Document Filtering Subsystem. Finally, a Web application for CNDROBOT was developed, and the various components and functions of the system were tested extensively. The thesis demonstrates that CNDROBOT is capable of effectively and efficiently discovering large amounts of desired documents and supplying them to the CINDI digital library.
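The focused-crawling idea summarized in the abstract (visit only selected domain sites, keep only documents that look on-topic) can be illustrated with a minimal sketch. This is not CNDROBOT's actual design, which this record does not describe. The in-memory `PAGES` graph, the `ALLOWED_DOMAINS` and `TOPIC_TERMS` sets, and the keyword-overlap test are all hypothetical stand-ins chosen so that the example runs without network access.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and honour robots.txt.
PAGES = {
    "http://cs.example.edu/": (
        "computer science department",
        ["http://cs.example.edu/theses", "http://shop.example.com/"],
    ),
    "http://cs.example.edu/theses": (
        "thesis on database systems and algorithms",
        ["http://cs.example.edu/sports"],
    ),
    "http://cs.example.edu/sports": ("campus hockey schedule", []),
    "http://shop.example.com/": ("database books for sale", []),
}

# Assumed configuration: the "well-selected domain sites" and the topic set.
ALLOWED_DOMAINS = {"cs.example.edu"}
TOPIC_TERMS = {"database", "algorithms", "thesis"}


def is_on_topic(text, min_hits=2):
    """Crude relevance test: count distinct topic terms present in the text."""
    words = set(text.lower().split())
    return len(words & TOPIC_TERMS) >= min_hits


def focused_crawl(seeds, fetch=PAGES.get):
    """Breadth-first crawl that follows links only into allowed domains
    and returns the URLs of pages judged to be on-topic."""
    frontier = deque(seeds)
    seen = set(seeds)
    collected = []
    while frontier:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:  # unknown or unreachable URL
            continue
        text, links = page
        if is_on_topic(text):
            collected.append(url)
        for link in links:
            # Domain restriction: never leave the selected sites.
            if urlparse(link).hostname in ALLOWED_DOMAINS and link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected


print(focused_crawl(["http://cs.example.edu/"]))
```

In this toy graph the crawler never visits `shop.example.com`, even though that page mentions a topic term, because the domain filter is applied before a link enters the frontier. That ordering is where a focused crawler saves bandwidth compared with crawling everything and filtering afterwards.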

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Zhou, Cong
Pagination: viii, 127 leaves : ill. ; 29 cm.
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science and Software Engineering
Thesis Supervisor(s): Desai, Bipin
Identification Number: LE 3 C66C67M 2005 Z47
ID Code: 8750
Deposited By: Concordia University Library
Deposited On: 18 Aug 2011 18:34
Last Modified: 13 Jul 2020 20:05
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
