An inverted index generator for CINDI


Li, Hudong (2003) An inverted index generator for CINDI. [Graduate Projects (Non-thesis)] (Unpublished)


Human-maintained search engines are expensive, slow to update, and cannot cover all web pages. Automated search engines that rely on keyword matching usually return too many low-quality results, and most users look at only the first few tens of results. Because search engine development has taken place largely at companies that publish few technical details, developing a search engine is a challenging task. The use of hypertextual information can help to improve search quality. This report addresses the question of how to build an inverted index for a search system that can use the additional information presented in hypertext to produce better search results. This report is part of the work on the Concordia INdexing and DIscovery (CINDI) Digital Library System. In this report, we summarize the research work done, discuss some implementation issues for the project, and present the data structures that can be used in indexing web pages. The design decisions were driven by the desire for a reasonably compact data structure and the ability to fetch a record in a few disk seeks during a search. This project has been implemented in C++ on the Linux platform.
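As a rough illustration of the idea in the abstract, the following is a minimal in-memory sketch of an inverted index that records where in the hypertext each term occurred. It is not the report's actual design: the names `Field`, `Posting`, and `InvertedIndex`, the whitespace tokenizer, and the choice of fields are all assumptions made for this example, and the compact on-disk layout the report aims for is not modelled here.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical field tags: where in the hypertext a term occurred.
enum class Field : std::uint8_t { Body, Title, Anchor };

// One posting: a document, the field, and the word position within that text.
struct Posting {
    std::uint32_t docId;
    Field field;
    std::uint32_t position;
};

// Illustrative in-memory inverted index: term -> list of postings.
class InvertedIndex {
public:
    // Split the text on whitespace and record one posting per token.
    void addText(std::uint32_t docId, Field field, const std::string& text) {
        std::istringstream in(text);
        std::string term;
        std::uint32_t pos = 0;
        while (in >> term) {
            postings_[term].push_back(Posting{docId, field, pos++});
        }
    }

    // Return the postings for a term, or an empty list if the term is unknown.
    const std::vector<Posting>& lookup(const std::string& term) const {
        static const std::vector<Posting> empty;
        auto it = postings_.find(term);
        return it == postings_.end() ? empty : it->second;
    }

private:
    std::map<std::string, std::vector<Posting>> postings_;
};
```

A ranking step could, for instance, weight `Field::Title` or `Field::Anchor` postings above `Field::Body` ones; that weighting is likewise an assumption here and not something the abstract specifies.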

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Graduate Projects (Non-thesis)
Authors: Li, Hudong
Pagination: ix, 73 leaves : ill. ; 29 cm.
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Department (as was): Department of Computer Science
Thesis Supervisor(s): Desai, Bipin C.
Identification Number: QA 76 M26+ 2003 no.11
ID Code: 2065
Deposited By: Concordia University Library
Deposited On: 27 Aug 2009 17:24
Last Modified: 20 Oct 2022 20:45
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
