Jung, Sungkyu, Sen, Arusharka and Marron, J.S. (2012) Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA. Journal of Multivariate Analysis, 109. pp. 190–203. ISSN 0047-259X

Full text (PDF, 279 kB): sen2012.pdf, Accepted Version
Official URL: http://dx.doi.org/10.1016/j.jmva.2012.03.005
Abstract
In High Dimension, Low Sample Size (HDLSS) data situations, where the dimension d is much larger than the sample size n, principal component analysis (PCA) plays an important role in statistical analysis. Under which conditions does the sample PCA well reflect the population covariance structure? We answer this question in a relevant asymptotic context where d grows and n is fixed, under a generalized spiked covariance model. Specifically, we assume the largest population eigenvalues to be of the order d^α, where α < 1, α = 1, or α > 1. Earlier results show the conditions for consistency and strong inconsistency of eigenvectors of the sample covariance matrix. In the boundary case, α = 1, where the sample PC directions are neither consistent nor strongly inconsistent, we show that eigenvalues and eigenvectors do not degenerate but have limiting distributions. The result smoothly bridges the phase transition represented by the other two cases, and thus gives a spectrum of limits for the sample PCA in the HDLSS asymptotics. While the results hold under a general situation, the limiting distributions under a Gaussian assumption are illustrated in greater detail. In addition, the geometric representation of HDLSS data is extended to give three different representations that depend on the magnitude of variances in the first few principal components.
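The three regimes described in the abstract can be explored with a small simulation. The Python sketch below is an illustration only, not the authors' code: it assumes NumPy, a single spike with population covariance diag(σ² d^α, 1, ..., 1), Gaussian data and a known zero mean, and the function name `pc1_angle_degrees` is hypothetical. It measures the angle between the first sample PC direction and the first population eigenvector e_1 as d grows while n stays fixed.

```python
import numpy as np


def pc1_angle_degrees(d, n, alpha, sigma2=1.0, rng=None):
    """Angle (degrees) between the first sample PC direction and e_1.

    Hypothetical helper. Assumes a single-spike Gaussian model with
    covariance diag(sigma2 * d**alpha, 1, ..., 1) and known zero mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard deviations of the d coordinates: one spiked, the rest unit variance.
    sd = np.ones(d)
    sd[0] = np.sqrt(sigma2 * d**alpha)
    # d x n data matrix whose n columns are independent observations.
    X = sd[:, None] * rng.standard_normal((d, n))
    # Left singular vectors of X are eigenvectors of X X^T / n, so the
    # d x d sample covariance matrix never has to be formed explicitly.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    # |<u_hat_1, e_1>|; the sign of a singular vector is arbitrary.
    cos_angle = min(1.0, abs(U[0, 0]))
    return float(np.degrees(np.arccos(cos_angle)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10  # sample size held fixed while d grows
    for alpha in (0.5, 1.0, 1.5):
        for d in (100, 1_000, 10_000):
            angles = [pc1_angle_degrees(d, n, alpha, rng=rng) for _ in range(50)]
            print(f"alpha={alpha}, d={d}: median angle {np.median(angles):.1f} deg")
```

Based on the results summarized in the abstract, one would expect the angles to drift toward 90° for α = 0.5 (strong inconsistency), toward 0° for α = 1.5 (consistency), and to settle into a non-degenerate distribution for the boundary case α = 1. The singular value decomposition of the d × n matrix is used instead of an eigendecomposition of the d × d sample covariance purely to keep the computation cheap when d is large.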
Divisions:  Concordia University > Faculty of Arts and Science > Mathematics and Statistics 

Item Type:  Article 
Refereed:  Yes 
Authors:  Jung, Sungkyu and Sen, Arusharka and Marron, J.S. 
Journal or Publication:  Journal of Multivariate Analysis 
Date:  2012 
Digital Object Identifier (DOI):  10.1016/j.jmva.2012.03.005 
Keywords:  Principal component analysis; High Dimension, Low Sample Size; Geometric representation; ρ-mixing; Consistency and strong inconsistency; Spiked covariance model 
ID Code:  976820 
Deposited By:  DANIELLE DENNIE 
Deposited On:  29 Jan 2013 13:28 
Last Modified:  18 Jan 2018 17:43 