A gleaning subsystem for CINDI

Zhang, Tong (2004) A gleaning subsystem for CINDI. Masters thesis, Concordia University.

Abstract

Internet search engines typically use Internet crawlers, or robots, to construct and maintain a searchable index of resources on the Web. Topic-specific robots are expected to become popular in the next generation. They gather information on the Internet in specific domains by means of information filtering technology. The CINDI Robot System is one such application in the academic domain. This research is concerned with a structure-based gleaning subsystem for CINDI. The system separates theses, technical reports, academic papers, and FAQs as resources, while e-mails, letters, resumes, graphics, and discussion groups are treated as chaff. The system makes its decisions based on a weight assigned to each resource by matching its structure against predefined Document Type Definitions (DTDs). The DTDs, which describe the typical structure of each document type, are built from predefined profiles. The system also features a conversion subsystem in the Windows environment to unify document formats for CINDI. (Abstract shortened by UMI.)
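
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it substitutes plain keyword cues for DTD matching, and the profile contents, the cue weights, the `THRESHOLD` value, and the function names `structure_weight` and `glean` are all assumptions made for illustration.

```python
# Hypothetical profiles: each maps a structural cue to an illustrative weight.
# None of these cues or numbers come from the thesis.
PROFILES = {
    "thesis": {
        "abstract": 2.0,
        "acknowledgements": 1.0,
        "table of contents": 1.0,
        "chapter 1": 2.0,
        "references": 2.0,
    },
    "paper": {
        "abstract": 2.0,
        "introduction": 1.5,
        "conclusion": 1.5,
        "references": 2.0,
    },
    "faq": {
        "frequently asked questions": 3.0,
        "q:": 1.5,
        "a:": 1.5,
    },
}

# Assumed cut-off: a document scoring below this for every profile is chaff.
THRESHOLD = 0.6


def structure_weight(text, profile):
    """Return the fraction of the profile's total weight found in the text."""
    lowered = text.lower()
    matched = sum(weight for cue, weight in profile.items() if cue in lowered)
    return matched / sum(profile.values())


def glean(text):
    """Return (label, weight): the best-matching document type, or 'chaff'."""
    scores = {name: structure_weight(text, p) for name, p in PROFILES.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= THRESHOLD:
        return best, scores[best]
    return "chaff", scores[best]
```

In this sketch a document containing an abstract, introduction, conclusion, and references is labelled `paper`, while a short e-mail matches no cues and falls through to `chaff`. A structure-aware version would presumably validate the parsed document against a DTD rather than search for substrings.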

Divisions: Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Zhang, Tong
Pagination: ix, 109 leaves : ill. ; 29 cm.
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science and Software Engineering
Date: 2004
Thesis Supervisor(s): Desai, Bipin C
ID Code: 8181
Deposited By: Concordia University Libraries
Deposited On: 18 Aug 2011 14:17
Last Modified: 18 Aug 2011 15:43
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
