This thesis introduces an improved Convolutional Neural Network (CNN) model that selectively excludes irrelevant elements from input vectors to extract distinctive features. The model uses Convolution (Conv) layers to encode the input, Deconvolution (Deconv) layers to reconstruct the spatial dimensions of the feature maps, and shortcut connections to exploit sparsity and capture both comprehensive and detailed information. The research also addresses a limitation of the traditional SoftMax Loss function, which tends to overfit and misclassify because the features it learns lack discriminability. To this end, it integrates a regularization technique into the conventional Cross-Entropy loss and introduces an adaptive margin into the standard SoftMax function. The adjusted SoftMax Loss is designed to increase the separation between embedding vectors of different classes and to tighten the clusters of embedding vectors of the same class, thereby boosting both inter-class diversity and intra-class similarity. These modifications aim to improve the proposed model's accuracy, and the model is benchmarked against existing methods on well-known datasets such as CIFAR-10, MNIST, and SVHN. Lastly, the model is fine-tuned for facial expression recognition (FER). The results indicate that the model outperforms state-of-the-art FER methods, achieving superior accuracy on three established datasets: FER-2013, RAF-DB, and CK+.
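
To illustrate the margin idea summarized above, the following is a minimal sketch. It is not the thesis's actual loss. It assumes a generic additive margin subtracted from the target-class cosine similarity before a scaled softmax cross-entropy, and it omits the regularization term. The names `margin_softmax_loss`, `margin`, and `scale` and their default values are hypothetical. The margin is supplied by the caller, as a scalar or one value per sample, rather than adapted by the rule the thesis proposes.

```python
import numpy as np

def margin_softmax_loss(embeddings, weights, labels, margin=0.35, scale=30.0):
    """Softmax cross-entropy with an additive margin on the target class.

    Assumed shapes (illustrative): embeddings (batch, dim),
    weights (dim, classes), labels (batch,) of integer class ids.
    `margin` may be a scalar or an array of shape (batch,).
    """
    # L2-normalise so that each logit is a cosine similarity in [-1, 1].
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = e @ w  # (batch, classes)

    # Penalise only the target class: it must beat the other classes
    # by at least `margin` to reach the same loss as plain softmax.
    rows = np.arange(len(labels))
    adjusted = cos.copy()
    adjusted[rows, labels] -= margin

    # Numerically stable log-softmax over the scaled logits.
    logits = scale * adjusted
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Mean negative log-likelihood of the correct classes.
    return -log_probs[rows, labels].mean()
```

With `margin=0` this reduces to ordinary softmax cross-entropy on scaled cosine similarities. A positive margin lowers the target logit and therefore raises the loss for the same inputs, which is the pressure that pushes same-class embeddings together and different-class embeddings apart.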