
Fault Analysis Using Learning Models with Model Interpretation

Whatley, Justin (2021) Fault Analysis Using Learning Models with Model Interpretation. Masters thesis, Concordia University.

Text (application/pdf): Whatley_MCompSc_F2021.pdf - Accepted Version (1MB)
Available under License Spectrum Terms of Access.

Abstract

As machine learning moves from theoretical applications in academia to promising solutions to problems across industry and healthcare, effective interpretability strategies are critically important to adoption. However, interpretability strategies can offer more than validation of a model's predictions. Learned models act as a proxy for the data, capturing relationships between feature inputs and target outcomes in a representation that can itself be analysed. To that end, this work describes a fault analysis system that leverages learned models to characterize faults using SHapley Additive exPlanations (SHAP).
In particular, this fault analysis system was designed for large structured datasets such as those available in telecommunications networks. The strategy forms a learned representation with tree-based models trained by gradient boosting. Once a problematic sample is selected for analysis, the computationally efficient SHAP implementation specialized for tree-based models is used to gauge how much each feature contributes to the performance degradation observed in that sample. This fault analysis strategy thus explains the degradation in a problematic sample through a model-based representation of how input characteristics matter across contexts. An evaluation of the strategy demonstrates its reliability for structured communications data using a 4G LTE dataset.
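
To make the described workflow concrete, the following minimal sketch (not the thesis implementation; the dataset, feature names, and model settings are illustrative assumptions) fits a gradient-boosted tree model on synthetic structured data, selects a problematic sample, and uses SHAP's tree-specialized explainer to rank feature contributions to that sample's predicted degradation:

import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(0)

# Illustrative structured dataset standing in for per-session network KPIs (assumed names).
X = pd.DataFrame({
    "rsrp_dbm": rng.normal(-95, 8, 2000),
    "sinr_db": rng.normal(12, 5, 2000),
    "prb_utilization": rng.uniform(0, 1, 2000),
    "handover_count": rng.poisson(2, 2000).astype(float),
})

# Synthetic "performance degradation" score driven mainly by SINR and load.
y = 5.0 - 0.3 * X["sinr_db"] + 4.0 * X["prb_utilization"] + rng.normal(0, 1, 2000)

# Learned representation: gradient-boosted regression trees.
model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# Select a problematic sample, here the one with the worst predicted degradation.
worst_idx = int(np.argmax(model.predict(X)))

# Tree-specialized SHAP: efficient feature attributions for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[worst_idx]])[0]

# Rank features by the magnitude of their contribution to this sample's score.
contributions = pd.Series(shap_values, index=X.columns)
print("Baseline (expected) prediction:", explainer.expected_value)
print(contributions.reindex(contributions.abs().sort_values(ascending=False).index))

Summing the printed contributions with the baseline reproduces the sample's prediction; this additivity is what makes SHAP attributions suitable for explaining an individual degraded sample.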

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Whatley, Justin
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 6 August 2021
Thesis Supervisor(s): Fevens, Thomas
Keywords: Fault analysis, machine learning, model interpretability, SHAP, telecommunications
ID Code: 988726
Deposited By: Justin Whatley
Deposited On: 01 Dec 2021 14:00
Last Modified: 31 Dec 2021 01:00

All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
