Zhang, Tong (2004) A gleaning subsystem for CINDI. Masters thesis, Concordia University.
Text (application/pdf): MQ94759.pdf (8MB), Accepted Version
Abstract
Internet search engines typically use Internet crawlers, or robots, to construct and maintain a searchable index of resources on the Web. Topic-specific robots will become popular in the next generation. They gather information on the Internet in specific domains by means of information filtering technology. The CINDI Robot System is such an application in the academic domain. This research is concerned with a structure-based gleaning subsystem for CINDI. The system treats theses, technical reports, academic papers, and FAQs as resources, while e-mails, letters, resumes, graphics, and discussion groups are considered chaff. The system makes its decisions based on a weight, which is assigned to each resource by matching its structure against predefined Document Type Definitions (DTDs). The DTDs describing the typical structure of each document type are built from predefined profiles. The system also features a conversion subsystem in the Windows environment to unify document formats for CINDI. (Abstract shortened by UMI.)
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Zhang, Tong |
Pagination: | ix, 109 leaves : ill. ; 29 cm. |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science and Software Engineering |
Date: | 2004 |
Thesis Supervisor(s): | Desai, Bipin C |
Identification Number: | TK 5105.884 Z438 2004 |
ID Code: | 8181 |
Deposited By: | Concordia University Library |
Deposited On: | 18 Aug 2011 18:17 |
Last Modified: | 13 Jul 2020 20:03 |