Forecasting travel demand is a complex problem facing public transit operators. Passenger flow prediction is useful not only to operators, who rely on it for long-term planning and scheduling, but also to transit users, who will soon expect short-term passenger flow prediction as a matter of course. To address this expectation, a Bi-directional Long Short-Term Memory Neural Network model (BDLSTM NN) and a Bi-directional Long Short-Term Memory Neural Network Kalman Filter model (BDLSTM KF) are proposed to predict short-term passenger flow based on the dependencies between passenger count and spatial-temporal features. A comprehensive preprocessing framework is also proposed that leverages historical data and extracts bidirectional features of passenger flow. The proposed model is based on [1] but is adapted, applied, and analysed to produce optimal results for passenger flow forecasting on a bus route. Building on [2], the BDLSTM architecture is then combined with a Kalman filter. The Kalman filter reduces the training and testing complexity required for passenger flow forecasting, and the BDLSTM-based Kalman filter produces predictions with less uncertainty than either method alone. An evaluation of the BDLSTM-based Kalman filter on two months of real-world data, collected one year apart, shows improvements for short-term forecasting in high-complexity bus networks. The results also indicate that the BDLSTM outperforms the traditional machine learning and deep learning techniques used in this context.
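To illustrate why combining a learned forecast with a Kalman filter can lower uncertainty, the following is a minimal sketch of a scalar Kalman-style update that fuses a forecast with a noisy observed passenger count. It is not the model proposed here: the class name `ScalarKalman`, the noise variances, and the choice to use the forecast directly as the prior mean are all assumptions made for illustration, and the way the BDLSTM output actually enters the filter may differ.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Hypothetical one-dimensional filter for a single stop's passenger flow."""
    estimate: float         # current filtered passenger-flow estimate
    variance: float         # variance (uncertainty) of that estimate
    process_var: float      # assumed noise variance added by the forecast step
    measurement_var: float  # assumed noise variance of the observed counts

    def step(self, forecast: float, observed: float) -> float:
        # Prediction: take the external forecast (e.g. a neural network
        # output) as the prior mean and inflate the variance by process noise.
        prior = forecast
        prior_var = self.variance + self.process_var

        # Update: weight the observation by the Kalman gain.
        gain = prior_var / (prior_var + self.measurement_var)
        self.estimate = prior + gain * (observed - prior)
        self.variance = (1.0 - gain) * prior_var
        return self.estimate


# Illustrative values only: a forecast of 10 passengers and an observed count of 14.
kf = ScalarKalman(estimate=0.0, variance=1.0, process_var=1.0, measurement_var=2.0)
fused = kf.step(forecast=10.0, observed=14.0)
```

In this example the prior variance and the measurement variance are both 2.0, so the gain is 0.5, the fused estimate is 12.0, and the posterior variance drops to 1.0, which is lower than the variance of either input on its own.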