With advances in sensing and communication techniques, data collection in manufacturing processes has become much easier. Machine learning (ML) is a vital tool for manufacturing data analytics because it can extract the information carried by the data. However, the variety of data formats, dimensionalities, and manufacturing process types greatly hinders the learning efficiency of ML methods. Data preparation is therefore critical for realizing the potential of ML in manufacturing problems. This paper investigates
how data preparation affects ML efficacy on manufacturing data. Specifically, we study the influence of data normalization and dimension reduction on ML performance for various types of manufacturing problems. We conduct comparison studies of data with and without pre-processing from different manufacturing processes, such as casting, milling, and additive manufacturing. Experimental results reveal that different pre-processing methods have distinct effects on learning efficiency. Normalization is helpful for both numerical and image data, whereas dimension reduction (this paper uses principal component analysis, PCA) is not useful for low-dimensional numerical manufacturing data. Combining normalization and PCA can significantly enhance the learning efficiency for high-dimensional data.
Finally, we summarize several practical guidelines for preparing manufacturing data for ML, which provide a valuable basis for future manufacturing data analysis with ML approaches.
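The normalization-then-PCA pipeline discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes z-score (standard) normalization and SVD-based PCA, the helper names `zscore_normalize`, `pca_fit`, and `pca_transform` are hypothetical, and the synthetic data (50 features driven by 3 latent factors, with feature scales spanning several orders of magnitude) merely stand in for real high-dimensional manufacturing measurements.

```python
import numpy as np


def zscore_normalize(X: np.ndarray, eps: float = 1e-12):
    """Standardize each feature (column) to zero mean and unit variance.

    Returns the normalized data plus the per-feature mean and std so the
    same transform can be reapplied to held-out data.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant features (std ~ 0) to avoid division by zero.
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std, mean, std


def pca_fit(X: np.ndarray, n_components: int):
    """Fit PCA via the SVD of the centered data matrix.

    Returns the centering mean, the retained principal axes (as rows), and
    the fraction of total variance explained by each retained axis.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2
    explained_ratio = var[:n_components] / var.sum()
    return mean, Vt[:n_components], explained_ratio


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project data onto the retained principal axes."""
    return (X - mean) @ components.T


# Hypothetical synthetic data: 3 latent factors mixed into 50 noisy features.
rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 200, 50, 3
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
# Give features very different scales, mimicking mixed physical units.
X *= rng.uniform(0.01, 1000.0, size=n_features)

# Normalize first, then reduce 50 features to 3 principal components.
X_norm, mu, sigma = zscore_normalize(X)
mean, comps, ratio = pca_fit(X_norm, n_components=3)
Z = pca_transform(X_norm, mean, comps)
print(Z.shape)  # (200, 3)
```

Normalizing first matters because PCA ranks directions by variance: without it, features recorded in large units would dominate the leading components regardless of how informative they are. In practice, the mean and standard deviation should be estimated on the training split only and reused on test data, which is why `zscore_normalize` returns them.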