Understanding and Locating Quality Issues in the Database Access Code of Database-Backed Applications

Liu, Wei ORCID: https://orcid.org/0000-0001-8956-730X (2024) Understanding and Locating Quality Issues in the Database Access Code of Database-Backed Applications. PhD thesis, Concordia University.

Text (application/pdf): Liu_PhD_S2025.pdf - Accepted Version (1MB)
Available under License Spectrum Terms of Access.

Abstract

Database-backed applications interact with a database management system (DBMS), such as MySQL, for persistent data storage. These database accesses play a central role in such applications and are crucial for their maintenance and quality. Developers build database-backed applications that access relational databases using object-oriented programming languages such as Java, Python, C#, PHP, and C++. Because object-oriented programming follows a different paradigm from relational databases, developers use various technologies to ease database access by abstracting persistent data as objects. Specifically, developers often rely on two main access technologies: (i) executing a Structured Query Language (SQL) query and manually converting the results to objects; and (ii) using Object-Relational Mapping (ORM) frameworks, which automatically generate SQL queries and convert the results to objects based on various object-database mapping configurations. However, developers may face different database access challenges when using different technologies. Moreover, the abstraction that ORM frameworks introduce can make database access problems harder to debug. An ORM framework generates SQL queries automatically from its configuration (e.g., the relationships among object types) and the invoked ORM APIs. As a result, developers do not have direct control over how the SQL queries are generated. When a database access issue is associated with a problematic generated SQL query, developers may have difficulty determining how and where in the application code the query is generated.

Motivated by the importance and challenges of database access, in this thesis we first conduct an empirical study of database access bugs in seven large-scale open-source Java applications that use relational database management systems. Specifically, by manually examining bug reports and commit histories spanning 5 to 16 years, we investigate and derive characteristics of database access issues, such as their categories, root causes, impact, and occurrence, when popular database access technologies are used. Our empirical study provides motivation and guidelines for future research to help avoid, detect, and test for database access bugs in database-backed applications.

To assist developers in debugging database access problems, we propose an approach for locating the origin (i.e., the control flow path containing a sequence of method calls) that generates a given SQL query. It achieves state-of-the-art localization accuracy and improves Top@5 accuracy by 225% and 333% over the baseline approach when using SQL session logs and individual query logs, respectively. We also find that our approach can help developers locate database access issues that generate problematic SQL queries (i.e., slow SQL queries and database deadlocks).

In conclusion, this thesis uncovers the root causes of database access issues and demonstrates that leveraging both static analysis and information retrieval techniques can help developers debug database access issues associated with problematic SQL queries. It also paves the way for future research on the development and automatic generation of tests for database access code to improve the quality of database-backed applications.
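
The abstract reports Top@5 accuracy without defining it. As a hedged illustration, the sketch below assumes the common definition (the fraction of queries whose true origin appears among the first k ranked candidates) and assumes the reported percentages are relative improvements over the baseline. The query identifiers, method names, and numbers are hypothetical toy data, not results from the thesis.

```python
def top_k_accuracy(ranked_candidates, ground_truth, k=5):
    """Fraction of queries whose true origin is within the first k candidates."""
    hits = sum(
        1
        for query, ranking in ranked_candidates.items()
        if ground_truth[query] in ranking[:k]
    )
    return hits / len(ranked_candidates)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Hypothetical rankings: for each logged SQL query, candidate origins
# ordered from most to least likely.
ranked = {
    "q1": ["A.load", "B.save", "C.find"],
    "q2": ["B.save", "A.load", "C.find"],
    "q3": ["C.find", "B.save", "A.load"],
}
# Hypothetical ground truth: the method that actually issued each query.
truth = {"q1": "A.load", "q2": "C.find", "q3": "A.load"}

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
```

Under this reading, a relative improvement of 100% means the accuracy doubled, so the reported gains of 225% and 333% would correspond to accuracies several times that of the baseline.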

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Liu, Wei
Institution: Concordia University
Degree Name: Ph. D.
Program: Computer Science
Date: 26 September 2024
Thesis Supervisor(s): Chen, Tse-Hsun (Peter)
ID Code: 994856
Deposited By: wei liu
Deposited On: 17 Jun 2025 14:21
Last Modified: 17 Jun 2025 14:21
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
