Protein fold classification with genetic algorithms and feature selection

  • Peng Chen
  • Chunmei Liu
  • Legand Burge
  • Mohammad Mahmood
  • William Southerland
  • Clay S Gloster

Research output: Contribution to journal › Article › peer-review

Abstract

Protein fold classification is a key step in predicting protein tertiary structures. This paper proposes a novel approach to classifying protein folds based on genetic algorithms and feature selection. Our dataset is divided into a training dataset and a test dataset. Each individual in the genetic algorithms represents a selection function over the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate its fitness value (fold classification rate). The aim of the genetic algorithms is to search for the individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset, and a support vector machine is built to classify protein folds based on the selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors. © 2009 Imperial College Press.
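
The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: individuals are encoded as binary feature masks, the fitness classifier is a nearest-centroid stand-in rather than the support vector machine used in the paper, and the data, the helper names (`make_data`, `nearest_centroid_accuracy`, `evolve_mask`) and all parameter values are hypothetical.

```python
import random


def nearest_centroid_accuracy(train, test, mask):
    """Accuracy of a nearest-centroid classifier restricted to the masked features.

    Stand-in for the paper's support vector machine (an assumption made so the
    sketch needs only the standard library).
    """
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an individual that selects no features cannot classify
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(idx))
        for k, i in enumerate(idx):
            s[k] += x[i]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def sq_dist(x, c):
        return sum((x[i] - c[k]) ** 2 for k, i in enumerate(idx))

    correct = sum(
        min(centroids, key=lambda label: sq_dist(x, centroids[label])) == y
        for x, y in test
    )
    return correct / len(test)


def evolve_mask(fitness, n_features, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Simple generational GA over binary masks (hypothetical settings)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability p_mut
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        population = children
    return max(population, key=fitness)


def make_data(n, rng, n_noise=6):
    """Hypothetical toy data: 2 informative features followed by noise features."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * y, 0.5), rng.gauss(-2.0 * y, 0.5)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        rows.append((informative + noise, y))
    return rows


rng = random.Random(1)
train_fit, train_val, test = make_data(60, rng), make_data(40, rng), make_data(40, rng)

# Fitness is measured on training data only; the test set is touched once at the end.
best_mask = evolve_mask(
    lambda mask: nearest_centroid_accuracy(train_fit, train_val, mask), n_features=8
)
test_accuracy = nearest_centroid_accuracy(train_fit + train_val, test, best_mask)
print(best_mask, test_accuracy)
```

Splitting the training data into a fitting part and a validation part is one way to score an individual without looking at the test set; the abstract does not say how the fitness is estimated, so that choice is an assumption here.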
Original language: English
Pages (from-to): 773-788
Number of pages: 16
Journal: Journal of Bioinformatics and Computational Biology
Volume: 7
Issue number: 5
State: Published - Oct 15 2009

Keywords

  • Feature selection
  • Fold classification
  • Genetic algorithms
  • Support vector machine
