Li, Hudong (2003) An inverted index generator for CINDI. Other thesis, Concordia University.
Human-maintained search engines are expensive, slow to update, and cannot cover all web pages. Automated search engines that rely on keyword matching usually return too many low-quality results, and most users look only at the first few tens of results. Because search engine development has taken place largely within companies that publish few technical details, developing a search engine is a challenging task. The use of hypertextual information can help to improve search quality. This report addresses the question of how to build an inverted index for a search system that can use the additional information presented in hypertext to produce better search results. This report is part of the work on the Concordia INdexing and DIscovery (CINDI) Digital Library System. In this report, we summarize the research work done, discuss some implementation issues for the project, and present the data structures that can be used in indexing web pages. The design decisions were driven by the desire for a reasonably compact data structure and the ability to fetch a record in a few disk seeks during a search. The project has been implemented in C++ on the Linux platform.
|Divisions:||Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering|
|Item Type:||Thesis (Other)|
|Pagination:||ix, 73 leaves : ill. ; 29 cm.|
|Degree Name:||Major reports (M.Comp.Sc.)|
|Program:||Computer Science and Software Engineering|
|Thesis Supervisor(s):||Desai, Bipin C.|
|Deposited By:||Concordia University Libraries|
|Deposited On:||27 Aug 2009 17:24|
|Last Modified:||08 Dec 2010 15:24|