
A gleaning subsystem for CINDI



Zhang, Tong (2004) A gleaning subsystem for CINDI. Masters thesis, Concordia University.

PDF - Accepted Version


Internet search engines typically use Internet crawlers, or robots, to construct and maintain a searchable index of resources on the Web. Topic-specific robots, which gather information in particular domains by means of information filtering technology, are expected to become more common in the next generation of search systems. The CINDI Robot System is one such application, targeting the academic domain. This research is concerned with a structure-based gleaning subsystem for CINDI. The system treats theses, technical reports, academic papers, and FAQs as resources, while e-mails, letters, resumes, graphics, and discussion groups are treated as chaff. It makes its decisions based on a weight that is carefully assigned to each document by matching its structure against predefined Document Type Definitions (DTDs). The DTDs, which capture the typical structure of each target document type, are built from a set of predefined profiles. The system also includes a conversion subsystem, running in a Windows environment, that unifies document formats for CINDI. (Abstract shortened by UMI.)
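The weight-based resource/chaff decision described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it is a simplified stand-in that scores a flat set of structural element names against weighted profiles instead of validating documents against real DTDs. The profile contents, element names, weights, the `THRESHOLD` value, and the helpers `structure_weight` and `glean` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical profiles: for each resource type, structural elements a
# DTD-like template might expect, each with an illustrative weight.
PROFILES = {
    "thesis": {"title": 2.0, "abstract": 2.0, "acknowledgements": 1.0,
               "table_of_contents": 1.5, "chapter": 1.5, "bibliography": 2.0},
    "technical_report": {"title": 2.0, "abstract": 2.0, "report_number": 2.0,
                         "section": 1.0, "references": 2.0},
    "academic_paper": {"title": 2.0, "authors": 1.5, "abstract": 2.0,
                       "section": 1.0, "references": 2.0},
    "faq": {"title": 1.0, "question": 3.0, "answer": 3.0},
}

THRESHOLD = 0.6  # hypothetical cut-off between "resource" and "chaff"


@dataclass
class Decision:
    label: str     # best-matching resource type, or "chaff"
    weight: float  # matched fraction of that profile's total weight


def structure_weight(elements: set[str], profile: dict[str, float]) -> float:
    """Return the fraction of a profile's total weight found in the document."""
    total = sum(profile.values())
    matched = sum(w for name, w in profile.items() if name in elements)
    return matched / total if total else 0.0


def glean(elements: set[str]) -> Decision:
    """Label a document by its best-matching profile, or as chaff."""
    best_type, best_weight = max(
        ((kind, structure_weight(elements, p)) for kind, p in PROFILES.items()),
        key=lambda pair: pair[1],
    )
    if best_weight >= THRESHOLD:
        return Decision(best_type, best_weight)
    return Decision("chaff", best_weight)
```

For example, `glean({"title", "authors", "abstract", "section", "references"})` matches every element of the hypothetical `academic_paper` profile and is kept as a resource, while an e-mail-like structure such as `{"greeting", "signature", "title"}` falls below the threshold for every profile and is labelled chaff. Scoring by matched fraction of weight, rather than by raw counts, keeps profiles of different sizes comparable.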

Divisions:Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:Thesis (Masters)
Authors:Zhang, Tong
Pagination:ix, 109 leaves : ill. ; 29 cm.
Institution:Concordia University
Degree Name:M. Comp. Sc.
Program:Computer Science and Software Engineering
Thesis Supervisor(s):Desai, Bipin C
ID Code:8181
Deposited By: Concordia University Libraries
Deposited On:18 Aug 2011 18:17
Last Modified:18 Aug 2011 19:43
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.



