TY - JOUR
T1 - A dataset of deep learning performance from cross-base data encoding on MNIST and MNIST-C
AU - McKnight, Lawrence
AU - Jaiswal, Chandra
AU - AlHmoud, Issa
AU - Gokaraju, Balakrishna
PY - 2024
Y1 - 2024
AB - Effective data representation is paramount in machine learning and deep learning. For an algorithm or neural network to capture patterns in data and make reliable predictions, the data must appropriately describe the problem domain. Although much literature exists on data preprocessing for machine learning and data science applications, novel data representation methods for enhancing machine learning model performance remain largely absent from the literature. This dataset compiles the performance of convolutional neural network models trained and tested on a wide range of numerical base representations of the MNIST and MNIST-C datasets. The research community can analyse this performance data further to uncover trends in model performance against the numerical base of the model's data. The dataset can also support further research of the same nature, testing cross-base data encoding on machine learning training and testing data for a wide range of real-world applications.
UR - https://dx.doi.org/10.1016/j.dib.2024.111194
U2 - 10.1016/j.dib.2024.111194
DO - 10.1016/j.dib.2024.111194
M3 - Article
VL - 57
JO - Data in Brief
JF - Data in Brief
ER -