TY - JOUR
T1 - Big data mining and classification of intelligent material science data using machine learning
AU - Chittam, Swetha
AU - Gokaraju, Balakrishna
AU - Xu, Zhigang
AU - Sankar, Jagannathan
AU - Roy, Kaushik
N1 - Publisher Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/9
Y1 - 2021/9
N2 - The material science community has a strong need for a big data repository of material compositions and the derived analytics of metal strength. Currently, many researchers maintain their own Excel sheets, prepared manually by their teams by tabulating experimental data collected from scientific journals, and analyze the data by performing manual calculations with formulas to determine the strength of the material. In this study, we propose a big data store for material science data and its processing parameters to address the laborious process of tabulating data from scientific articles, data mining techniques to retrieve information from the databases for big data analytics, and a machine learning prediction model to provide insights into material strength. Three models are proposed, based on the logistic regression, support vector machine (SVM), and random forest algorithms. These models are trained and tested using a 10‐fold cross‐validation approach. The random forest classification model performed best on the independent dataset, with 87% accuracy, compared with 72% and 78% for logistic regression and SVM, respectively.
AB - The material science community has a strong need for a big data repository of material compositions and the derived analytics of metal strength. Currently, many researchers maintain their own Excel sheets, prepared manually by their teams by tabulating experimental data collected from scientific journals, and analyze the data by performing manual calculations with formulas to determine the strength of the material. In this study, we propose a big data store for material science data and its processing parameters to address the laborious process of tabulating data from scientific articles, data mining techniques to retrieve information from the databases for big data analytics, and a machine learning prediction model to provide insights into material strength. Three models are proposed, based on the logistic regression, support vector machine (SVM), and random forest algorithms. These models are trained and tested using a 10‐fold cross‐validation approach. The random forest classification model performed best on the independent dataset, with 87% accuracy, compared with 72% and 78% for logistic regression and SVM, respectively.
KW - Classification algorithms
KW - Data mining
KW - Logistic regression
KW - MongoDB
KW - No‐SQL database
KW - Random forest
KW - Support vector machine (SVM)
UR - https://www.scopus.com/pages/publications/85115210388
U2 - 10.3390/app11188596
DO - 10.3390/app11188596
M3 - Article
SN - 2076-3417
VL - 11
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 18
M1 - 8596
ER -