TY - JOUR
T1 - FEPS: A Tool for Feature Extraction from Protein Sequence
AU - Ismail, Hamid
AU - White, Clarence
AU - AL-Barakati, Hussam
AU - Newman, Robert H
AU - Kc, Dukka B.
PY - 2022/1/1
Y1 - 2022/1/1
AB - Machine learning has become one of the most popular choices for developing computational approaches in protein structural bioinformatics. Extracting features from protein sequence or structure is often a crucial step in developing machine learning-based approaches. Over the years, various sequence, structural, and physicochemical descriptors have been developed for proteins and used to address a range of bioinformatics problems. Accordingly, several feature extraction tools have been developed to help researchers generate numeric features from protein sequences. Most of these tools are limited in the number of sequences they can handle, and the features they generate require further preprocessing before they can be fed to machine learning methods. Here, we present Feature Extraction from Protein Sequences (FEPS), a toolkit for feature extraction. FEPS is a versatile software package that generates various descriptors from protein sequences and can handle multiple sequences, the number of which is limited only by the available computational resources. In addition, the features extracted by FEPS require no subsequent processing and are ready to be fed to machine learning techniques, because FEPS provides various output formats as well as the ability to concatenate the generated features. FEPS is freely available via an online web server as well as a stand-alone toolkit. As a comprehensive toolkit for feature extraction, FEPS will help spur the development of machine learning-based models for various bioinformatics problems.
KW - Feature extraction
KW - Machine learning
KW - Posttranslational modifications
KW - Protein descriptors
KW - Sequence-based features
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85132869111&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85132869111&origin=inward
U2 - 10.1007/978-1-0716-2317-6_3
DO - 10.1007/978-1-0716-2317-6_3
M3 - Article
C2 - 35696075
SN - 1064-3745
VL - 2499
SP - 65
EP - 104
JO - Methods in Molecular Biology
JF - Methods in Molecular Biology
ER -