Machine learning (ML) is a popular approach to security monitoring and intrusion detection in cyber-physical systems (CPS) such as the smart grid. General ML approaches presume that the training and testing data are independent and drawn from identical or similar distributions. This assumption may not hold in many real-world systems and applications such as CPS, because system and attack dynamics can shift the data distribution and cause trained models to fail. Transfer learning (TL) is a promising solution for addressing the data distribution divergence problem and maintaining performance under system and attack variations. However, two challenges remain in introducing TL into intrusion detection: when to apply TL, and how to extract effective features during TL. To address these two challenges, this research proposes a transferability analysis and domain-adversarial training (TADA) framework. The framework first applies a divergence-based transferability analysis to decide whether to apply TL, then uses a spatial-temporal domain-adversarial (DA) training model to reduce the distribution divergence between the two domains and improve attack detection performance. The main contributions are: (i) a divergence-based transferability analysis that helps evaluate the necessity of TL in security monitoring for CPS, such as intrusion detection in the smart grid; and (ii) a spatial-temporal DA training approach that extracts spatial-temporal domain-invariant features to mitigate the impact of distribution divergence and enhance detection performance. Extensive experiments demonstrate that the transferability analysis is capable of predicting the accuracy drop and determining whether to apply TL. Compared to state-of-the-art models, TADA achieves high and more robust detection performance under system and attack variations.
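
The idea of a divergence-based decision on whether to apply TL can be illustrated with a minimal sketch. The abstract does not state which divergence measure or decision rule TADA uses, so everything here is an assumption made for illustration: the Jensen-Shannon divergence over histograms of a single feature, the hypothetical helper names `histogram`, `js_divergence`, and `should_apply_tl`, and the placeholder `threshold` value.

```python
import math


def histogram(samples, lo, hi, bins):
    """Normalized histogram of `samples` over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp the index so values at the upper edge land in the last bin.
        idx = min(max(int((x - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with zero probability in `a` contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def should_apply_tl(source, target, threshold=0.1, bins=10):
    """Hypothetical rule: flag TL as needed when the divergence exceeds `threshold`.

    `threshold` is an illustrative placeholder, not a value from the paper.
    Returns (decision, divergence).
    """
    lo = min(min(source), min(target))
    hi = max(max(source), max(target))
    if hi == lo:
        # All samples are identical, so the two domains cannot be told apart.
        return False, 0.0
    p = histogram(source, lo, hi, bins)
    q = histogram(target, lo, hi, bins)
    d = js_divergence(p, q)
    return d > threshold, d
```

Under these assumptions, a call such as `should_apply_tl(source_feature, target_feature)` returns a boolean decision together with the measured divergence. A practical analysis would likely work on multivariate spatial-temporal features and calibrate the threshold against the observed accuracy drop.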