Xu, Jijie (2023) The Application of Machine Learning-Based Prediction Models for Cardiometabolic Risk Among a Representative US Adult Population: A Cross-Sectional Study of NHANES 1999-2006. Masters thesis, Concordia University.
Full text (PDF, 1 MB): Xu_MSc_F2023.pdf, Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Introduction: Common measures of adiposity, such as body mass index, are only proxies for body composition. Dual-energy X-ray absorptiometry (DXA), in contrast, measures body composition more precisely. This thesis therefore used an unsupervised machine learning technique to group individuals by the similarity of their DXA-measured fat mass and muscle mass. The associations of these newly derived body-composition phenotypes with cardiometabolic risk were then compared with those of phenotypes defined by a median split.
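As a minimal sketch of the median-split comparator, assuming that the 50th percentile cut-off crosses a fat-mass split with a muscle-mass split to give four phenotypes (the abstract does not spell this out), each person can be labeled as below. The function name and toy values are hypothetical, and the sketch ignores survey weights and any age or sex adjustment.

```python
from statistics import median


def median_split_phenotypes(fat_mass, muscle_mass):
    """Hypothetical helper: label each person by whether fat mass and muscle
    mass lie above the sample median (values equal to the median count as low)."""
    fat_cut = median(fat_mass)
    muscle_cut = median(muscle_mass)
    labels = []
    for fat, muscle in zip(fat_mass, muscle_mass):
        fat_level = "high-fat" if fat > fat_cut else "low-fat"
        muscle_level = "high-muscle" if muscle > muscle_cut else "low-muscle"
        labels.append(f"{fat_level}/{muscle_level}")
    return labels


# Hypothetical toy values in kg; not NHANES data.
fat = [18.0, 25.0, 32.0, 40.0]
muscle = [50.0, 44.0, 58.0, 47.0]
print(median_split_phenotypes(fat, muscle))
# ['low-fat/high-muscle', 'low-fat/low-muscle', 'high-fat/high-muscle', 'high-fat/low-muscle']
```

A single fixed cut-off per dimension is what distinguishes this comparator from clustering: group boundaries are set in advance rather than learned from the joint distribution of fat and muscle mass.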
Methods: Data were drawn from the National Health and Nutrition Examination Survey (NHANES, 1999-2006 cycles; n = 5,566), which is representative of the U.S. population, and were split 70/30 into training and test datasets. K-means cluster phenotypes were identified by partitioning observations on deciles of fat mass and muscle mass adjusted for age and sex. Model fit was assessed with the silhouette and elbow methods. The performance of logistic regression models in identifying unfavorable cardiometabolic risk factors, using either the K-means phenotypes or the 50th percentile cut-off phenotypes, was assessed with the area under the receiver operating characteristic curve (ROC-AUC). Analyses were performed separately for males and females and incorporated survey weights and the complex sampling design.
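The clustering and model-selection step can be sketched in plain Python, assuming each person is represented by a (fat-mass decile, muscle-mass decile) pair that has already been adjusted for age and sex. This is an illustrative Lloyd's-algorithm K-means with random restarts, a within-cluster sum of squares (the quantity an elbow plot charts against k), and a mean silhouette width for choosing k. It is not the thesis's own code, it does not apply NHANES sampling weights, and all function names are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def wss(points, centroids, labels):
    """Within-cluster sum of squares, the quantity an elbow plot charts against k."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; keeps the lowest-WSS solution."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # k observations drawn without replacement
        for _ in range(max_iter):
            labels = assign(points, centroids)
            # Update step: move each centroid to the mean of its members.
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centroids[c])
            if new == centroids:
                break
            centroids = new
        labels = assign(points, centroids)
        score = wss(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best[1], best[2]


def silhouette(points, labels):
    """Mean silhouette width (requires at least two non-empty clusters)."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own:
            continue  # singleton cluster: silhouette taken as 0
        a = sum(own) / len(own)                            # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values=(2, 3, 4)):
    """Pick the k whose clustering has the highest mean silhouette width."""
    scores = {}
    for k in k_values:
        _, labels = kmeans(points, k)
        if len(set(labels)) > 1:
            scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores


# Hypothetical (fat-mass decile, muscle-mass decile) pairs; not NHANES data.
pts = [(1, 2), (2, 1), (2, 2), (2, 3), (9, 8), (8, 9), (9, 9), (10, 9)]
best_k, sil = choose_k(pts)
print(best_k)  # 2
```

The silhouette rewards tight, well-separated clusters, while the WSS always falls as k grows, which is why the elbow method looks for the point where further decreases level off rather than for a minimum.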
Results: The optimal solutions were the 2-means and 4-means cluster models (k = 2 and k = 4). Models using the 2-means cluster phenotypes to identify cardiometabolic risk factors had the lowest predictive power (ROC-AUC 0.52 to 0.63). ROC-AUCs were higher for the 50th percentile cut-off phenotypes (0.56 to 0.66) and the 4-means cluster phenotypes (0.57 to 0.67).
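To interpret these ROC-AUC values, recall that an AUC of 0.5 corresponds to chance discrimination and 1.0 to perfect separation, so values between 0.52 and 0.67 indicate poor discrimination. A minimal sketch of a survey-weighted AUC, assuming each respondent has a predicted probability from a logistic model, a binary risk-factor indicator, and a sampling weight, uses the weighted Mann-Whitney (concordance) form. It gives a point estimate only; variance estimation under the complex design (strata and PSUs) is not handled, and the function name is hypothetical.

```python
def weighted_auc(scores, outcomes, weights=None):
    """Weighted ROC-AUC: the weighted probability that a randomly chosen case
    scores higher than a randomly chosen non-case (ties count one half)."""
    if weights is None:
        weights = [1.0] * len(scores)
    cases = [(s, w) for s, y, w in zip(scores, outcomes, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, outcomes, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for s_case, w_case in cases:
        for s_ctrl, w_ctrl in controls:
            if s_case > s_ctrl:
                concordant += w_case * w_ctrl
            elif s_case == s_ctrl:
                concordant += 0.5 * w_case * w_ctrl
    # Normalize by the total weight of all case/non-case pairs.
    total = sum(w for _, w in cases) * sum(w for _, w in controls)
    return concordant / total


# Hypothetical predicted probabilities and outcomes.
print(weighted_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

With equal weights this reduces to the ordinary ROC-AUC; the pairwise loop is quadratic in sample size, which is acceptable for illustration but not for large survey files.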
Discussion: Although 4-means clustering was superior to the 50th percentile cut-off in predicting cardiometabolic risk, the ROC-AUCs were generally poor. Future work should investigate whether performing K-means clustering separately for each specific age improves prediction.
| Field | Value |
|---|---|
| Divisions | Concordia University > Faculty of Arts and Science > Mathematics and Statistics |
| Item Type | Thesis (Masters) |
| Authors | Xu, Jijie |
| Institution | Concordia University |
| Degree Name | M. Sc. |
| Program | Mathematics |
| Date | April 2023 |
| Thesis Supervisor(s) | Kakinami, Lisa |
| ID Code | 992326 |
| Deposited By | Jijie Xu |
| Deposited On | 16 Nov 2023 20:52 |
| Last Modified | 16 Nov 2023 20:52 |