Non-functional bugs impose a heavy cost on both software developers and end-users. Tools that reduce the occurrence, impact, and repair time of non-functional bugs can therefore provide key assistance to developers racing to fix these issues. Classification models that identify defect-prone commits, referred to as \emph{Just-In-Time (JIT) Quality Assurance}, are known to help developers review risky commits. JIT models, however, rely on the SZZ approach to determine whether or not a past change is bug-inducing. Due to the nature of non-functional bugs, their fixes may be scattered across the source code and located far from the bug-inducing changes. Yet, prior studies that leverage or evaluate the SZZ approach do not consider non-functional bugs, leading to potential bias in their results.

In this thesis, we conduct an empirical study of the results of the SZZ approach on the non-functional bugs in the NFBugs dataset and the performance bugs in Cassandra and Hadoop. We manually examine whether each identified bug-inducing change is indeed correct. Our manual study shows that a large portion of non-functional bugs cannot be properly identified by the SZZ approach. We uncover root causes of false detection that have not been previously reported, and we evaluate the identified bug-inducing changes against criteria from prior research. Our results may assist future research on non-functional bugs and highlight the need to complement SZZ to accommodate the unique characteristics of non-functional bugs.

Furthermore, we conduct an empirical study that evaluates the performance of JIT models when they are used to identify bug-inducing commits for performance-related bugs. Our findings show that JIT defect prediction classifies non-performance bug-inducing commits better than performance bug-inducing commits. However, we find that manually correcting errors in the training data only slightly improves the models. In the absence of a large number of correctly labelled performance bug-inducing commits, our findings show that combining all available training data yields the best classification results.