Accurate Abstract Syntax Tree Differencing: Language-Aware Design, Benchmarking, and Empirical Assessment

Alikhanifard, Pouria (2025) Accurate Abstract Syntax Tree Differencing: Language-Aware Design, Benchmarking, and Empirical Assessment. PhD thesis, Concordia University.

Text (application/pdf)
Alikhanifard_PhD_S2026.pdf - Accepted Version
Available under License Spectrum Terms of Access.
6MB

Abstract

Software undergoes constant changes to support new requirements, address bugs, enhance performance, and ensure maintainability. As a result, developers spend a large portion of their workday understanding and reviewing code changes. Abstract Syntax Tree (AST) diff tools were developed to overcome the limitations of line-based diff tools, which are still the default for most developers. Despite their advantages in capturing structural changes, existing AST diff tools suffer from serious limitations, such as lacking multi-mapping support, matching semantically incompatible nodes, ignoring language-specific clues, lacking refactoring awareness, and offering no commit-level diff support.

To address these issues, we propose a novel AST diff tool based on RefactoringMiner that resolves all aforementioned limitations. We improve statement mapping accuracy and introduce an algorithm that produces commit-level AST diffs using refactoring instances and matched program elements. Our evaluation demonstrates significant improvements in both precision and recall, while maintaining competitive execution times.

To facilitate objective and reproducible assessment of diff quality, we introduce a benchmarking framework that measures precision and recall across existing tools using a curated ground-truth of AST node mappings. This infrastructure supports rigorous comparisons and enables deeper investigations into the impact of AST representations and algorithm design choices.
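As a rough illustration of the kind of measurement described above, the following minimal sketch computes precision and recall for a tool's reported node mappings against a ground-truth set. It is not the thesis's benchmarking framework: the node labels, the `precision_recall` helper, and the representation of a mapping as a pair of strings are all assumptions made for the example.

```python
from typing import Set, Tuple

# A mapping pairs a node of the old AST with a node of the new AST.
# Nodes are identified by hypothetical string labels (assumption).
Mapping = Tuple[str, str]


def precision_recall(reported: Set[Mapping], truth: Set[Mapping]) -> Tuple[float, float]:
    """Return (precision, recall) of reported mappings against the ground truth."""
    true_positives = len(reported & truth)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Ground truth with a multi-mapping: old node "a3" maps to both "b3" and "b4".
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a3", "b4")}
# A hypothetical tool output: two correct mappings and one incorrect one.
reported = {("a1", "b1"), ("a2", "b2"), ("a3", "b5")}

p, r = precision_recall(reported, truth)
print(f"precision={p:.3f} recall={r:.3f}")
```

Representing mappings as a set of pairs, rather than a one-to-one dictionary, lets a single node take part in several mappings, which is one simple way to accommodate the multi-mapping case mentioned in the abstract.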

Finally, we investigate the relationship between edit script length and diff quality by combining metric-based analysis with human feedback, revealing that minimizing edit length is not a reliable indicator of developer-preferred diffs.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Alikhanifard, Pouria
Institution: Concordia University
Degree Name: Ph. D.
Program: Software Engineering
Date: 5 August 2025
Thesis Supervisor(s): Tsantalis, Nikolaos
ID Code: 996326
Deposited By: Pouria AlikhaniFard
Deposited On: 17 Nov 2025 20:28
Last Modified: 17 Nov 2025 20:28
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
