Keighobadi-Lamjiri, Abolfazl (2007) A syntactic candidate ranking method for answering non-copulative questions. PhD thesis, Concordia University.
Text (application/pdf): NR30132.pdf - Accepted Version (5MB)
Abstract
Question answering (QA) is the act of retrieving answers to questions posed in natural language. It is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval such as document retrieval. QA is sometimes regarded as the next step beyond search engines, one that ranks the retrieved candidates. Given a set of candidate sentences that share keywords with the question, deciding which one actually answers the question is a challenge in question answering. In this thesis we propose a linguistic method for measuring the syntactic similarity of each candidate sentence to the question. This candidate scoring method uses the question head as an anchor to narrow down the search to a subtree in the parse tree of a candidate sentence (the target subtree). Semantic similarity of the action in the target subtree to the action asked about in the question is then measured using WordNet::Similarity on their main verbs. To verify the syntactic similarity of this subtree to the question parse tree, syntactic restrictions and lexical measures are used to compute the unifiability of their critical syntactic participants. Finally, when answering a factoid open domain question, the noun phrase of the expected answer type in the target subtree is extracted from the best candidate sentence and returned.

In this thesis, we address both closed and open domain question answering problems. Initially, we propose our syntactic scoring method as a solution for questions in the Telecommunications domain. For our experiments in a closed domain, we build a set of customer service question/answer pairs from Bell Canada's Web pages. We show that the performance of this ranking method depends on the syntactic and lexical similarities in a question/answer pair. We observed that these closed domain questions ask for specific properties, procedures, or conditions about a technical topic. They are sometimes open-ended as well. As a result, detailed understanding of the question and the corpus text is required for answering them.

Unlike closed domain questions, however, open domain questions have no restriction on the topics they can ask about. The standard test bed for open domain question answering consists of the question/answer sets provided each year by NIST through the TREC QA conferences. These are factoid questions that ask about a person, date, time, location, etc. Since our method relies on the semantic similarity of the main verbs as well as the syntactic overlap of counterpart subtrees from the question and the target subtrees, it performs well on questions with a main content verb and a conventional subject-verb-object syntactic structure. The distribution of this type of question versus questions with a 'to be' main verb differs significantly between closed and open domains: around 70% of closed domain questions have a main content verb, while more than 67% of open domain questions have a 'to be' main verb. This verb is very flexible in connecting sentence entities. Therefore, recognizing equivalent syntactic structures between two copula parse trees is very hard. As a result, to better analyze the accuracy of this method, we create a new question categorization based on the question's main verb type: copulative questions ask about a state using a 'to be' verb, while non-copulative questions contain a main non-copula verb indicating an action or event.

Our candidate answer ranking method achieves a precision of 47.0% in our closed domain, and 48% in answering the TREC 2003 to 2006 non-copulative questions. For answering open domain factoid questions, we feed the output of Aranea, a competitive question answering system in TREC 2002, to our linguistic method in order to provide it with Web redundancy statistics. This level of performance confirms our hypothesis of the potential usefulness of syntactic mapping for answering questions with a main content verb.
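The copulative versus non-copulative categorization described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `categorize_question`, the word lists, and the rule for picking the main verb are assumptions made for illustration, and the input is assumed to be a question already tagged with Penn Treebank part-of-speech tags by some external tagger.

```python
# Hypothetical sketch of a main-verb-type categorization for questions.
# Assumption: the question arrives as (token, POS) pairs with Penn Treebank
# tags. The heuristic below is illustrative, not the method from the thesis.

# Forms of 'to be' (copula), including common contractions.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being", "'s", "'re"}

# Verbs that often act as auxiliaries rather than as the main verb.
AUXILIARIES = BE_FORMS | {"do", "does", "did", "have", "has", "had"}


def categorize_question(tagged):
    """Return 'copulative', 'non-copulative', or 'unknown'."""
    # Collect every token tagged as a verb (VB, VBD, VBG, VBN, VBP, VBZ).
    verbs = [tok.lower() for tok, pos in tagged if pos.startswith("VB")]

    # A verb outside the auxiliary list is treated as the main content verb.
    if any(v not in AUXILIARIES for v in verbs):
        return "non-copulative"

    # Only auxiliaries remain: a 'to be' form is taken as the copula.
    if any(v in BE_FORMS for v in verbs):
        return "copulative"

    # Verbs such as 'do' or 'have' used alone act as main non-copula verbs.
    if verbs:
        return "non-copulative"

    return "unknown"


if __name__ == "__main__":
    copula_q = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"),
                ("president", "NN"), ("of", "IN"), ("France", "NNP")]
    action_q = [("When", "WRB"), ("was", "VBD"), ("the", "DT"),
                ("telephone", "NN"), ("invented", "VBN")]
    print(categorize_question(copula_q))
    print(categorize_question(action_q))
```

In this sketch a passive question such as "When was the telephone invented" counts as non-copulative because "invented" is a content verb, while "Who is the president of France" has only a 'to be' form and counts as copulative. A parser-based approach, as the abstract suggests, would identify the main verb more reliably than this tag-based shortcut.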
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Keighobadi-Lamjiri, Abolfazl |
Pagination: | xi, 106 leaves : ill. ; 29 cm. |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Computer Science and Software Engineering |
Date: | 2007 |
Thesis Supervisor(s): | Kosseim, Leila |
Identification Number: | LE 3 C66C67P 2007 K45 |
ID Code: | 975295 |
Deposited By: | Concordia University Library |
Deposited On: | 22 Jan 2013 16:05 |
Last Modified: | 13 Jul 2020 20:07 |