Ghorbanimehr, Mohammad Soroush
ORCID: https://orcid.org/0000-0002-4196-9358
(2025)
Facial Attractiveness Prediction Using a Single and Multi-Task Vision Transformer Framework.
Masters thesis, Concordia University.
Text (application/pdf): Ghorbanimehr_MSc_F2025.pdf (1MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Facial attractiveness prediction is a challenging and inherently subjective task in computer vision, with applications spanning social media, cosmetic technology, and aesthetic medicine. While convolutional neural networks (CNNs) have driven significant advances in this area, recent developments in transformer-based architectures, such as the Vision Transformer (ViT), offer new opportunities by capturing global feature relationships and long-range dependencies within images. This thesis explores the use of Vision Transformers for predicting facial attractiveness on the SCUT-FBP5500 dataset, where beauty scores are computed from the average ratings of multiple human annotators. The task is formulated as a regression problem to predict continuous attractiveness scores. To enhance the learned feature representations, a multi-task learning framework is introduced, jointly performing gender and ethnicity classification alongside beauty prediction. The methodology includes systematic image preprocessing, transfer learning with a ViT pretrained on large-scale facial recognition data, and fine-tuning for both primary and auxiliary tasks. Model performance is evaluated using Pearson correlation (PC), mean absolute error (MAE), and root mean squared error (RMSE) for regression, and classification accuracy for the auxiliary tasks. Comparative experiments with CNN-based baselines demonstrate that transformer architectures capture more holistic and subtle aesthetic cues, resulting in improved prediction consistency. Experimental results show that the proposed ViT-based approach achieves superior accuracy and robustness compared to conventional CNNs, even with limited training data. These findings highlight the potential of Vision Transformers as an effective and data-efficient alternative for facial aesthetic analysis. The thesis concludes by emphasizing the value of multi-task learning in enriching feature representations and encourages future research toward interpretable and scalable beauty prediction systems.
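For readers unfamiliar with the regression metrics named in the abstract, the following is a minimal sketch of how PC, MAE, and RMSE are commonly computed between averaged human ratings and model predictions. It is not taken from the thesis: the function names and the sample scores are hypothetical, and PC is assumed to denote the Pearson correlation coefficient.

```python
import math


def pearson_correlation(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical values for illustration only, not data from the thesis.
human_scores = [2.0, 3.0, 4.0]   # averaged annotator ratings
model_scores = [2.5, 3.0, 3.5]   # predicted scores

pc = pearson_correlation(human_scores, model_scores)
mae = mean_absolute_error(human_scores, model_scores)
rmse = root_mean_squared_error(human_scores, model_scores)
print(f"PC={pc:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

A higher PC indicates that predictions rank and scale faces consistently with the human ratings, while lower MAE and RMSE indicate smaller absolute errors; RMSE penalizes large individual errors more heavily than MAE.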
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Ghorbanimehr, Mohammad Soroush |
| Institution: | Concordia University |
| Degree Name: | M. Comp. Sc. |
| Program: | Computer Science |
| Date: | 12 September 2025 |
| Thesis Supervisor(s): | Suen, Ching Y |
| ID Code: | 996307 |
| Deposited By: | Mohammad Soroush Ghorbanimehr |
| Deposited On: | 04 Nov 2025 15:37 |
| Last Modified: | 04 Nov 2025 15:37 |