This thesis reports an application of using two stage optimization framework to solve the clustered asymmetric traveling salesman problem to compete in the 2021 Amazon Last Mile Routing Research Challenge. This approach implicitly leverages tacit knowledge encoded in delivery data to learn a data-driven routing algorithm capable of predicting high quality routes. Given a set of features and a set of decision targets corresponding to routes taken by experienced drivers, the model seeks to learn routing models that provide near-optimal decisions for a set of high quality observed routes. The thesis presents and computationally compares the algorithmic approaches capable of learning feature-dependent cost functions for the base routing model. The results of extensive computational experiments based on real data provided by Amazon involving 6,112 historical routes show that this approach is comparable in score with the prize winners methods.