Smart infrastructures are increasingly built on cyber-physical systems that connect physical operational technology (OT) devices, networks, and systems over a cyberspace of ubiquitous information technology (IT). A key objective of this interconnection is data coverage that provides comprehensive visibility into dynamic environments and events. The arrival of the Internet of Things, 5G, and beyond in smart infrastructures will enable the collection of unprecedented volumes of data from these sources, giving critical visibility of the entire infrastructure and supporting advanced situational awareness. This data, however, sits in separate silos, and the barriers between them limit the use of advanced machine learning techniques against cyber-physical attacks and damage. To break down these barriers and allow the development of advanced cross-domain awareness models, this thesis develops a modular, complete, and scalable co-simulation platform that generates standardized datasets for research and development in smart distribution grid security. The platform addresses the lack of realistic training and testing data for machine learning models and thereby enables the development of more advanced techniques. Our contributions are as follows. First, a modular platform for generating software-based co-simulation testbeds is developed using the HELICS co-simulation framework. Second, scenarios of instabilities, faults, and cyber-physical attacks are built to allow the generation of a realistic, multi-source dataset. Third, well-defined datasets are generated from the developed scenarios to enable and empower data-driven approaches to smart distribution grid security.