Li, Hudong (2003) An inverted index generator for CINDI. [Graduate Projects (Non-thesis)] (Unpublished)
Abstract
Human-maintained search engines are expensive, slow to update, and cannot cover all web pages. Automated search engines that rely on keyword matching usually return too many low-quality results, while most users look only at the first few tens of results. Search engine development has taken place largely at companies that publish few technical details, so developing a search engine is a challenging task. The use of hypertextual information can help improve search quality. This report addresses the question of how to build an inverted index for a search system that can use the additional information present in hypertext to produce better search results. The report is part of the work on the Concordia INdexing and DIscovery (CINDI) Digital Library System. In it, we summarize the research work carried out, discuss some implementation issues for the project, and present data structures that can be used to index web pages. The design was driven by the desire for a reasonably compact data structure and the ability to fetch a record in a few disk seeks during a search. The project has been implemented in C++ on the Linux platform.
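The abstract does not spell out the report's index layout. As a rough illustration of what a "reasonably compact" inverted index can look like, the C++ sketch below stores each term's posting list as gaps between sorted document IDs, encoded as variable-byte integers. This is a generic, well-known technique and not the data structure from the report; the names `InvertedIndex`, `encodeVarint`, `decodeVarints` and `postingBytes` are hypothetical, and the index is kept in memory rather than on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not the report's design: a gap + variable-byte
// encoded inverted index held in memory.

// Append an unsigned integer using variable-byte encoding:
// 7 data bits per byte, high bit set on every byte except the last.
void encodeVarint(std::uint32_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode every varint in a buffer produced by encodeVarint.
std::vector<std::uint32_t> decodeVarints(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint32_t> values;
    std::uint32_t current = 0;
    int shift = 0;
    for (std::uint8_t byte : in) {
        current |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if (byte & 0x80) {
            shift += 7;  // continuation bit set: more bytes follow
        } else {
            values.push_back(current);
            current = 0;
            shift = 0;
        }
    }
    assert(shift == 0 && "buffer ended in the middle of a varint");
    return values;
}

class InvertedIndex {
public:
    // Documents must be added in increasing docId order so gaps stay non-negative.
    void addDocument(std::uint32_t docId, const std::string& text) {
        std::istringstream words(text);
        std::string word;
        while (words >> word) {
            Posting& p = postings_[word];
            if (p.hasAny && p.lastDoc == docId) continue;  // term already recorded for this doc
            std::uint32_t gap = p.hasAny ? docId - p.lastDoc : docId;
            encodeVarint(gap, p.bytes);
            p.lastDoc = docId;
            p.hasAny = true;
        }
    }

    // Return the sorted doc IDs containing `term`, rebuilt by summing the gaps.
    std::vector<std::uint32_t> lookup(const std::string& term) const {
        std::vector<std::uint32_t> docs;
        auto it = postings_.find(term);
        if (it == postings_.end()) return docs;
        std::uint32_t doc = 0;
        for (std::uint32_t gap : decodeVarints(it->second.bytes)) {
            doc += gap;
            docs.push_back(doc);
        }
        return docs;
    }

    // Size in bytes of a term's encoded posting list (0 if the term is absent).
    std::size_t postingBytes(const std::string& term) const {
        auto it = postings_.find(term);
        return it == postings_.end() ? 0 : it->second.bytes.size();
    }

private:
    struct Posting {
        std::vector<std::uint8_t> bytes;  // gap-encoded doc IDs
        std::uint32_t lastDoc = 0;        // last doc ID appended, used to compute the next gap
        bool hasAny = false;
    };
    std::map<std::string, Posting> postings_;
};
```

Storing gaps rather than raw IDs keeps most entries small enough to fit in a single byte. An on-disk variant would typically pair this with a term dictionary that records each posting list's file offset, so that a lookup can read the whole list with roughly one seek.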
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Graduate Projects (Non-thesis) |
Authors: | Li, Hudong |
Pagination: | ix, 73 leaves : ill. ; 29 cm. |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Department (as was): | Department of Computer Science |
Date: | 2003 |
Thesis Supervisor(s): | Desai, Bipin C. |
Identification Number: | QA 76 M26+ 2003 no.11 |
ID Code: | 2065 |
Deposited By: | Concordia University Library |
Deposited On: | 27 Aug 2009 17:24 |
Last Modified: | 20 Oct 2022 20:45 |