Naderi, Nona (2011) Automated Extraction of Protein Mutation Impacts from the Biomedical Literature. Masters thesis, Concordia University.
- Accepted Version
Mutations, as sources of evolution, have long been a focus of attention in the
biomedical literature. Access to information on mutations and their impacts
on protein properties facilitates research in various domains, such as
enzymology and pharmacology. However, manually reading through the rich and fast-growing repository
of biomedical literature is expensive and time-consuming. A number of manually curated
databases, such as BRENDA (http://www.brenda-enzymes.org), try to index and provide this
information; yet the data they provide appears to be incomplete. Thus, there is a
growing need for automated approaches to extract this information.
In this work, we present a system to automatically extract and summarize
information on the impacts of protein mutations.
The extraction module of our system is split into subtasks: organism analysis,
mutation detection, protein property extraction and impact
analysis. Organisms, as sources of proteins, must be extracted to
help disambiguate genes and proteins. Thus, our system extracts
organisms and grounds them to NCBI. We detect mutation series to correctly ground our detected
impacts. Our system also extracts the affected protein properties as well as the magnitude of the impacts.
The output of our system is used to populate an OWL-DL ontology, which can then be queried to provide structured information. The performance
of the system is evaluated on both external and internal corpora and
databases. The results show the reliability of the approaches. Our organism
extraction system achieves a precision of 95%, a recall of 94%
and a grounding accuracy of 97.5% on the OT corpus. On the manually
annotated Linnaeus-100 corpus, it achieves a precision of 99%, a recall of
97% and a grounding accuracy of 97.4%.
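As an illustration of the mutation detection subtask named above, the following is a minimal sketch of how point-mutation mentions such as "A123T" or "Ala45Gly" might be spotted with a regular expression. It is not the method used in the thesis; the pattern, the constant names and the function `find_point_mutations` are assumptions made for this example, and such a naive pattern will both miss mentions and produce false positives on real text.

```python
import re

# Hypothetical, deliberately naive pattern (not taken from the thesis):
# wild-type residue, sequence position, mutant residue.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
POINT_MUTATION = re.compile(
    rf"\b(?P<wt>[{ONE_LETTER}]|{THREE_LETTER})"
    rf"(?P<pos>[1-9]\d*)"
    rf"(?P<mt>[{ONE_LETTER}]|{THREE_LETTER})\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for each mention in text."""
    return [
        (m["wt"], int(m["pos"]), m["mt"])
        for m in POINT_MUTATION.finditer(text)
    ]
```

For example, `find_point_mutations("The A123T and Ala45Gly variants reduced activity.")` yields one tuple per mention, which a later step could link to a protein and an organism.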
In the impact detection task, our system achieves a precision of
70.4%-71.8% and a recall of 71.2%-71.3% on manually annotated documents. Our system grounds the detected
impacts with an accuracy of 70.1%-71.7% on the manually annotated documents,
and achieves a precision of 57%-57.5% and a recall of 82.5%-84.2% against the BRENDA data.
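For readers less familiar with the measures reported above, precision is the fraction of extracted items that are correct, and recall is the fraction of reference items that were extracted. A minimal sketch of the computation follows; the function name and the counts in the usage note are hypothetical and are not data from the thesis.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute (precision, recall) from raw counts, returning 0.0 when undefined."""
    predicted = true_positives + false_positives  # everything the system extracted
    actual = true_positives + false_negatives     # everything in the reference set
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall
```

With hypothetical counts of 8 correct extractions, 2 spurious ones and 2 missed ones, `precision_recall(8, 2, 2)` gives a precision and a recall of 0.8 each.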
|Divisions:||Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering|
|Item Type:||Thesis (Masters)|
|Degree Name:||M. Comp. Sc.|
|Date:||12 September 2011|
|Thesis Supervisor(s):||Witte, René|
|Deposited By:||NONA NADERI|
|Deposited On:||21 Nov 2011 16:48|
|Last Modified:||21 Nov 2011 16:48|