TY - JOUR
T1 - Protein fold classification with genetic algorithms and feature selection
AU - Chen, Peng
AU - Liu, Chunmei
AU - Burge, Legand
AU - Mahmood, Mohammad
AU - Southerland, William
AU - Gloster, Clay S
PY - 2009/10/15
Y1 - 2009/10/15
AB - Protein fold classification is a key step in predicting protein tertiary structures. This paper proposes a novel approach to classifying protein folds based on genetic algorithms and feature selection. Our dataset is divided into a training dataset and a test dataset. Each individual in the genetic algorithms represents a selection function over the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate its fitness value (the fold classification rate). The aim of the genetic algorithms is to search for the best individual, that is, the one that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset, and a support vector machine is built to classify protein folds based on the selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors. © 2009 Imperial College Press.
KW - Feature selection
KW - Fold classification
KW - Genetic algorithms
KW - Support vector machine
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=70349830371&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=70349830371&origin=inward
U2 - 10.1142/S0219720009004321
DO - 10.1142/S0219720009004321
M3 - Article
C2 - 19785045
SN - 0219-7200
VL - 7
SP - 773
EP - 788
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
IS - 5
ER -