Abdallah, H., Nguyen, D., Nguyen, K., & Mansour, E. (2021). Demonstration of kgnet: a cogni- tive knowledge graph platform. In O. Seneviratne, C. Pesquita, J. Sequeda, & L. Etcheverry (Eds.), Proceedings of the ISWC 2021 posters, demos and industry tracks: From novel ideas to industrial practice co-located with 20th international semantic web conference (ISWC 2021), virtual conference, october 24-28, 2021 (Vol. 2980). CEUR-WS.org. Retrieved from https://ceur-ws.org/Vol-2980/paper311.pdf
Alghushairy, O., Alsini, R., Soule, T., & Ma, X. (2021). A review of local outlier factor algorithms for outlier detection in big data streams. Big Data and Cognitive Computing, 5(1). Retrieved from https://www.mdpi.com/2504-2289/5/1/1 doi: 10.3390/bdcc5010001
Al-Rfou, R., Alain, G., Almahairi, A., Angermueller, C., Bahdanau, D., Ballas, N., ... others (2016). Theano: A python framework for fast computation of mathematical expressions. arXiv e-prints, arXiv±1605.
Alteryx. (2023). Retrieved from https://www.alteryx.com/
Bauckmann, J., Leser, U., Naumann, F., & Tietz, V. (2007). Efficiently detecting inclusion dependencies. In 2007 ieee 23rd international conference on data engineering (p. 1448-1450). doi:10.1109/ICDE.2007.369032
Bellamy, R. K. E., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., . . . Zhang, Y. (2018, October). AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. Retrieved from https://arxiv.org/abs/1810.01943 Biessmann, F., Rukat, T., Schmidt, P., & et al. (2019). Datawig: Missing value imputation for tables. J. Mach. Learn. Res., 20(175), 1±6.
Bogatu, A., Fernandes, A. A. A., Paton, N. W., & Konstantinou, N. (2020). Dataset discovery in data lakes. In 2020 ieee 36th international conference on data engineering (icde) (p. 709-720). doi: 10.1109/ICDE48307.2020.00067
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., . . . Zhang, Z. (2015). Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.
Dask Development Team. (2016). Dask: Library for dynamic task scheduling [Computer software manual]. Retrieved from https://dask.org
Dua, D., & Graff, C. (2017). UCI machine learning repository.
Einblick. (2023). Retrieved from https://www.einblick.ai/
Feast: Feature Store for Machine Learning. (2022). Retrieved from https://feast.dev/ Fernandez, R. C., Abedjan, Z., Koko, F., Yuan, G., Madden, S., & Stonebraker, M. (2018). Aurum:
A data discovery system. In 34th IEEE international conference on data engineering, ICDE 2018, paris, france, april 16-19, 2018 (pp. 1001±1012). IEEE Computer Society. Retrieved from https://doi.org/10.1109/ICDE.2018.00094 doi: 10.1109/ICDE.2018 .00094
Goikoetxea, J., Agirre, E., & Soroa, A. (2016). Single or multiple? combining word representations independently learned from text and wordnet. In Proceedings of the thirtieth conference on artificial intelligence (AAAI) (pp. 2608±2614). Retrieved from http://www.aaai.org/ ocs/index.php/AAAI/AAAI16/paper/view/11777
Hai, R., Kang, Y., Koutras, C., Ionescu, A., & Katsifodimos, A. (2022). Bridging the gap between data integration and ml systems. arXiv preprint arXiv:2205.09681.
Helal, A., Helali, M., Ammar, K., & Mansour, E. (2021). A demonstration of kglac: A data discovery and enrichment platform for data science. Proceedings of the VLDB Endowment, 14(12), 2675±2678.
Helali, M., Mansour, E., Abdelaziz, I., & et al. (2022). A scalable AutoML approach based on graph neural networks. PVLDB, 15(11).
Helali, M., Vashisth, S., Carrier, P., & et al. (2021). Linked data science powered by knowledge graphs. CoRR, abs/2303.02204.
Kakantousis, T., Kouzoupis, A., Buso, F., & et al. (2019). Horizontally scalable ml pipelines with a feature store. In Sysml.
Kanter, J. M., & Veeramachaneni, K. (2015). Deep feature synthesis: Towards automating data science endeavors. In 2015 ieee international conference on data science and advanced analytics (dsaa) (p. 1-10). doi: 10.1109/DSAA.2015.7344858
Kasneci,E.,Sessler,K.,KuÈchemann,S.,Bannert,M.,Dementieva,D.,Fischer,F.,...Kasneci,G. (2023). Chatgpt for good? on opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. Retrieved from https://www .sciencedirect.com/science/article/pii/S1041608023000195 doi:https://doi.org/10.1016/j.lindif.2023.102274
Katz, G., Shin, E. C. R., & Song, D. (2016). Explorekit: Automatic feature generation and selection. In 2016 ieee 16th international conference on data mining (icdm) (p. 979-984). doi: 10.1109/ ICDM.2016.0123
Kaul, A., Maheshwary, S., & Pudi, V. (2017). Autolearn - automated feature generation and selection. In Icdm (pp. 217±226).
Khatiwada, A., Fan, G., Shraga, R., Chen, Z., Gatterbauer, W., Miller, R. J., & Riedewald, M. (2023, may). Santos: Relationship-based semantic table union search. Proc. ACM Manag. Data, 1(1). Retrieved from https://doi.org/10.1145/3588689 doi: 10.1145/3588689
Kumar, V., & Minz, S. (2014). Feature selection: a literature review. SmartCR, 4(3), 211±229. Lam, H. T., Thiebaut, J.-M., Sinn, M., Chen, B., Mai, T., & Alkan, O. (2017). One button machine for automating feature engineering in relational databases. arXiv preprint arXiv:1706.00327. Mansour, E., Srinivas, K., & Hose, K. (2021). Federated data science to break down silos [vision].
SIGMOD Rec., 50(4).
Mueller, J., & Smola, A. (2019). Recognizing variables from their data via deep embeddings of distributions. In International conference on data mining (ICDM) (pp. 1264±1269). Nargesian, F., Asudeh, A., & Jagadish, H. V. (2021, jul). Tailoring data source distribu- tions for fairness-aware data integration. Proc. VLDB Endow., 14(11), 2519±2532. Re- trieved from https://doi.org/10.14778/3476249.3476299 doi:10.14778/3476249.3476299
Nargesian, F., Samulowitz, H., Khurana, U., & et al. (2017). Learning feature engineering for classification. In Ijcai.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., . . . Lerer, A. (2017). Auto- matic differentiation in pytorch.
Peng, J., Wu, W., Lockhart, B., & et al. (2021). Dataprep.eda: Task-centric exploratory data analysis for statistical modeling in python. In Sigmod (pp. 2271±2280).
Raju, V. G., Lakshmi, K. P., Jain, V. M., Kalidindi, A., & Padma, V. (2020). Study the influence of normalization/transformation process on the accuracy of supervised classification. In 2020 third international conference on smart systems and inventive technology (icssit) (pp. 729± 735).
Raju, V. N. G., Lakshmi, K. P., Jain, V. M., Kalidindi, A., & Padma, V. (2020). Study the influence of normalization/transformation process on the accuracy of supervised classification. In Icssit (p. 729-735).
Rekatsinas, T., Chu, X., Ilyas, I. F., & et al. (2017). Holoclean: Holistic data repairs with proba- bilistic inference. PVLDB, 10(11).
Rezig, E. K., Bhandari, A., Fariha, A., & et al. (2021). DICE: data discovery by example. PVLDB, 14(12).
Rostin, A., Albrecht, O., Bauckmann, J., Naumann, F., & Leser, U. (2009). A machine learning approach to foreign key discovery. In 12th international workshop on the web and databases, webdb 2009, providence, rhode island, usa, june 28, 2009.
Samala, R. K., Chan, H.-P., Hadjiiski, L., & Koneru, S. (2020). Hazards of data leakage in machine learning: a study on classification of breast cancer using deep neural networks. In Medical imaging 2020: Computer-aided diagnosis (Vol. 11314, pp. 279±284).
Tensorflow: Large-scale machine learning on heterogeneous systems. (n.d.). Retrieved from https://www.tensorflow.org/ (Software available from tensorflow.org)
Trifacta. (2023). Retrieved from https://www.trifacta.com/
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in r. Journal of Statistical Software, 45(3), 1±67. Retrieved from https://www .jstatsoft.org/index.php/jss/article/view/v045i03 doi: 10.18637/jss.v045.i03
Waring, J., Lindvall, C., & Umeton, R. (2020). Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artificial Intelligence in Medicine, 104, 101822. Retrieved from https://www.sciencedirect.com/science/article/pii/ S0933365719310437 doi: https://doi.org/10.1016/j.artmed.2020.101822
Xu, S., Lu, B., Baldea, M., Edgar, T. F., Wojsznis, W., Blevins, T., & Nixon, M. (2015). Data cleaning in the process industries. Reviews in Chemical Engineering, 31(5), 453±490. Re- trieved 2023-07-11, from https://doi.org/10.1515/revce-2015-0022 doi: doi:10.1515/revce-2015-0022
Yan, C., & He, Y. (2020). Auto-Suggest: Learning-to-recommend data preparation steps using data science notebooks. In SIGMOD (pp. 1539±1554).
Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., . . . Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57-81. Retrieved from https://www.sciencedirect.com/science/article/pii/S2666651021000012 doi: https://doi.org/10.1016/j.aiopen.2021.01.001