
Large Language Model (LLM)-Based Advances in Prediction of Post-translational Modification Sites in Proteins

  • Pawel Pratyush
  • Suresh Pokharel
  • Stefan Schulze
  • Lisa Bramer
  • Robert H. Newman
  • Dukka B. KC

  • Golisano College of Computing and Information Sciences, Rochester Institute of Technology
  • Pacific Northwest National Laboratory
  • Department of Computer Science

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Post-translational modifications (PTMs) are vital regulators of protein function, influencing a myriad of cellular processes and disease mechanisms. Traditional experimental methods for PTM identification are both costly and labor-intensive, underscoring the pressing need for efficient computational approaches. Early computational strategies relied predominantly on primary amino acid sequences and handcrafted features, which often lacked the contextual and structural understanding necessary for precise PTM site prediction. The emergence of transformer-based large language models (LLMs), particularly protein language models (pLMs), has revolutionized PTM prediction by producing context-aware embeddings that capture functional and structural intra-sequence dependencies. In this chapter, we provide a comprehensive review of recent advances in leveraging LLMs (specifically, pLMs) for PTM site prediction, an important residue-level task in protein research. We identify emerging trends in the field, including the application of fine-tuning techniques, the integration of embeddings from multiple pLMs, and the incorporation of multiple modalities such as codon-aware embeddings, 3D structural data, and conventional representations. Additionally, we discuss tools that employ graph-based representations, the Mamba architecture, and contrastive learning paradigms to further refine pLM-powered PTM site prediction models. Finally, we explore the interpretability and explainability of the embeddings used in various tools. Despite significant progress, persistent limitations remain; we outline these challenges and propose directions for future research.
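To make the workflow the abstract describes concrete, below is a minimal sketch (not taken from the chapter itself) of residue-level PTM site scoring with pLM embeddings: a frozen protein language model produces a context-aware embedding per residue, and a classifier head turns each embedding into a site probability. It assumes the HuggingFace transformers and torch libraries and the publicly available ESM-2 checkpoint facebook/esm2_t6_8M_UR50D; the linear head, the toy sequence, and the untrained output probabilities are hypothetical illustrations, not any specific tool reviewed in the chapter.

```python
# Minimal sketch: per-residue embeddings from a protein language model (ESM-2)
# feeding a simple binary head for PTM site prediction.
# Assumptions: `transformers` and `torch` are installed; the checkpoint name is
# real, but the head, sequence, and labels are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "facebook/esm2_t6_8M_UR50D"  # small public ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
plm = AutoModel.from_pretrained(MODEL_ID)
plm.eval()  # frozen feature extractor; fine-tuning would instead unfreeze these weights

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy protein sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state: (batch, tokens, hidden); tokens include special
    # start/end tokens added by the ESM tokenizer
    embeddings = plm(**inputs).last_hidden_state

# Strip the special tokens so positions align one-to-one with residues
residue_embeddings = embeddings[0, 1:-1]  # (seq_len, hidden)

# Hypothetical residue-level head: one logit per residue ("PTM site or not")
head = torch.nn.Linear(plm.config.hidden_size, 1)
logits = head(residue_embeddings).squeeze(-1)  # (seq_len,)
site_probs = torch.sigmoid(logits)             # per-residue probabilities

# In a real tool, the head (and optionally the pLM) would be trained against
# experimentally annotated PTM sites (e.g., phosphorylation) with a
# binary cross-entropy loss; here the outputs are untrained and meaningless.
print(site_probs.shape)  # torch.Size([len(sequence)])
```

The trends the abstract surveys are variations on this skeleton: fine-tuning unfreezes the pLM weights, multi-pLM integration concatenates embeddings from several models before the head, and multimodal approaches append codon-aware or 3D-structural features to each residue vector.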
Original language: English
Pages (from-to): 313-355
Number of pages: 43
Journal: Methods in Molecular Biology
Volume: 2941
DOIs
State: Published - Jan 1 2025

Keywords

  • AlphaFold
  • Contrastive learning
  • Explainability
  • Fine-tuning
  • GPT
  • Graph
  • Large language model
  • Mamba
  • Post-translational modification
  • Protein language model
