Over the past decade, 3D objects have gained remarkable importance in everyday applications, and the ability to recognize them has therefore become a vital task in numerous fields. Ever since the emergence of 3D object recognition, each newly proposed model has striven to overcome certain persistent drawbacks. Among these shortcomings are: the inability to capture all critical features of an object, the lack of consideration of spatial attributes, insufficient modeling of visual relationships between semantic features, the need for expensive computational resources, and, consequently, slow processing. Computer vision researchers have achieved excellent performance with multiple models; however, there is still room for improvement. In this thesis, we propose two novel 3D multi-view object classification methodologies inspired by well-known Natural Language Processing (NLP) approaches. This motivation stems from NLP models' impressive capability to capture the underlying characteristics of texts and the semantic feature relationships present in sequential data. The first model, named F-GDA, is a statistical approach that deploys the Generalized Dirichlet (GD) distribution in all of its priors to compose a fully flexible framework; the second, named VAeViT, incorporates two well-established deep learning architectures, the Variational Autoencoder (VAE) and the Vision Transformer (ViT), to form a comprehensive structure. Each model was designed to resolve major limitations confronted by its underlying methodology. Both models were evaluated on benchmark datasets, proved reliably effective in classifying 3D multi-view objects, and outperformed state-of-the-art methodologies in the field.