The Application of Machine Learning-Based Prediction Models for Cardiometabolic Risk Among a Representative US Adult Population: A Cross-Sectional Study of NHANES 1999-2006


Xu, Jijie (2023) The Application of Machine Learning-Based Prediction Models for Cardiometabolic Risk Among a Representative US Adult Population: A Cross-Sectional Study of NHANES 1999-2006. Masters thesis, Concordia University.

Xu_MSc_F2023.pdf - Accepted Version
Available under License Spectrum Terms of Access.

Abstract

Introduction: Common measures of adiposity (such as body mass index) are only proxies. In contrast, dual-energy X-ray absorptiometry (DXA) measures body composition more precisely. This thesis therefore used an unsupervised machine learning technique to group individuals by similarities in their DXA-derived fat mass and muscle mass. Associations of these newly developed body-composition phenotypes with cardiometabolic risk were compared with those of phenotypes defined by a median split.

Methods: Data were drawn from the National Health and Nutrition Examination Survey (NHANES; 1999-2006 cycles; n=5,566; split 70/30 into training and test datasets), a representative sample of the U.S. population. K-means cluster phenotypes were identified by partitioning observations on deciles of fat mass and muscle mass adjusted for age and sex. Model fit was assessed using the silhouette and elbow methods. The performance of logistic regression models in identifying unfavorable cardiometabolic risk, using either the K-means phenotypes or the 50th percentile cut-off phenotypes, was assessed with the area under the receiver operating characteristic curve (ROC-AUC). Analyses were performed separately for males and females and incorporated weighting and the complex sampling design.
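The clustering and elbow steps described above can be illustrated with a minimal sketch. It is not the thesis code: the data are randomly generated (fat decile, muscle decile) pairs, the function names `sq_dist` and `kmeans` are hypothetical, and the age and sex adjustment, the survey weights, and the silhouette criterion are all left out. It runs plain Lloyd's algorithm and prints the within-cluster sum of squares for several values of k, which is the quantity the elbow method inspects.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    # Start from k distinct observed points so no two centroids coincide.
    centroids = rng.sample(sorted(set(points)), k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Inertia: within-cluster sum of squared distances (the elbow quantity).
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical data: 200 random (fat decile, muscle decile) pairs.
rng = random.Random(42)
deciles = [(rng.randint(1, 10), rng.randint(1, 10)) for _ in range(200)]

# Elbow inspection: look for the k after which inertia stops dropping sharply.
for k in range(1, 7):
    _, _, inertia = kmeans(deciles, k)
    print(k, round(inertia, 1))
```

Because the starting centroids are random, a real analysis would normally repeat the fit from several starts and keep the best solution; this sketch uses a single fixed seed for reproducibility.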

Results: The optimal models were the 2-means and 4-means cluster solutions. The 2-means cluster models had the lowest predictive power for identifying cardiometabolic risk factors (ROC-AUC 0.52 to 0.63). The ROC-AUCs from the 50th percentile cut-off phenotypes and the 4-means cluster phenotypes were higher (0.56 to 0.66 and 0.57 to 0.67, respectively).
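To make the ROC-AUC figures above concrete, the following sketch shows how an AUC can be computed when the only predictor is a categorical phenotype. It is an illustration under stated assumptions, not the thesis method: the phenotype labels and outcomes are invented, the helper names `phenotype_prevalence` and `roc_auc` are hypothetical, and survey weights are ignored. It relies on the fact that an unweighted logistic regression with a single categorical predictor reproduces the observed outcome prevalence in each category, so the training prevalence can stand in for the fitted probability.

```python
from collections import defaultdict

def phenotype_prevalence(phenotypes, outcomes):
    """Observed outcome prevalence per phenotype (outcomes coded 0/1)."""
    counts = defaultdict(lambda: [0, 0])  # phenotype -> [cases, total]
    for ph, y in zip(phenotypes, outcomes):
        counts[ph][0] += y
        counts[ph][1] += 1
    return {ph: cases / total for ph, (cases, total) in counts.items()}

def roc_auc(scores, outcomes):
    """ROC-AUC as the share of case/non-case pairs ranked correctly,
    counting tied scores as half (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical training data: phenotype label and a binary risk outcome.
train_ph = ["A", "A", "A", "A", "B", "B", "B", "B"]
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
prevalence = phenotype_prevalence(train_ph, train_y)

# Hypothetical test data scored with the training prevalences.
test_ph = ["A", "A", "B", "B"]
test_y = [0, 0, 0, 1]
test_scores = [prevalence[ph] for ph in test_ph]
print(roc_auc(test_scores, test_y))
```

With only a few phenotype categories, many case/non-case pairs share the same score and count as ties, which pulls the AUC toward 0.5.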

Discussion: Although the 4-means clustering was superior to the 50th percentile cut-off in predicting cardiometabolic risk, the ROC-AUCs were generally poor. Future work should investigate whether performing K-means clustering separately for each age improves prediction.

Divisions:	Concordia University > Faculty of Arts and Science > Mathematics and Statistics
Item Type:	Thesis (Masters)
Authors:	Xu, Jijie
Institution:	Concordia University
Degree Name:	M. Sc.
Program:	Mathematics
Date:	April 2023
Thesis Supervisor(s):	Kakinami, Lisa
ID Code:	992326
Deposited By:	Jijie Xu
Deposited On:	16 Nov 2023 20:52
Last Modified:	16 Nov 2023 20:52
