Breadcrumb

 
 

CNDROBOT : a robot for the CINDI digital library system

Title:

CNDROBOT : a robot for the CINDI digital library system

Zhou, Cong (2005) CNDROBOT : a robot for the CINDI digital library system. Masters thesis, Concordia University.

[img]
Preview
PDF - Accepted Version
6Mb

Abstract

Web robots or crawlers are an essential component of all search engines. Major search engines such as Google and AltaVista use their own robots (GoogleBot and Mercator) to crawl and index billions of Web pages over the Internet. Web robots are also increasingly adopted by digital libraries to collect data and on-line documents. The crawling process requires massive amounts of hardware and network resources as well as time. However, when only information about a predefined topic set is desired, the use of traditional crawling strategy becomes inefficient and cost ineffective. This thesis presents issues in developing a focused crawler - CNDROBOT, which only explores well-selected domain sites and collects potential on-topic documents for the CINDI digital library. The research was concerned with the studies on various search engines, types of Web robots, and crawling strategies. The research primarily involved the design and implementation of the CNDROBOT as well as the integration of the Document Filtering Subsystem. Finally, a Web application for the CNDROT was developed and an extensive test was conducted for various components and functions of this system. This thesis demonstrates that the CNDROBOT is capable of effectively and efficiently discovering large amounts of desired documents and supplying them for the CINDI digital library

Divisions:Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:Thesis (Masters)
Authors:Zhou, Cong
Pagination:viii, 127 leaves : ill. ; 29 cm.
Institution:Concordia University
Degree Name:M. Comp. Sc.
Program:Computer Science and Software Engineering
Date:2005
Thesis Supervisor(s):Desai, Bipin
ID Code:8750
Deposited By:Concordia University Libraries
Deposited On:18 Aug 2011 14:34
Last Modified:18 Aug 2011 14:34
Related URLs:
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Document Downloads

More statistics for this item...

Concordia University - Footer