This thesis reports an application of using two stage optimization framework to solve the clustered
asymmetric traveling salesman problem to compete in the 2021 Amazon Last Mile Routing
Research Challenge. This approach implicitly leverages tacit knowledge encoded in delivery data to
learn a data-driven routing algorithm capable of predicting high quality routes. Given a set of features
and a set of decision targets corresponding to routes taken by experienced drivers, the model
seeks to learn routing models that provide near-optimal decisions for a set of high quality observed
routes. The thesis presents and computationally compares the algorithmic approaches capable of
learning feature-dependent cost functions for the base routing model. The results of extensive computational
experiments based on real data provided by Amazon involving 6,112 historical routes
show that this approach is comparable in score with the prize winners methods.