Published in: Annals of Data Science 3/2024

25 April 2024

Predicting the Functional Changes in Protein Mutations Through the Application of BiLSTM and the Self-Attention Mechanism

Authors: Zixuan Fan, Yan Xu

Abstract

Changes in protein function are driven mainly by mutations. Accurately predicting these changes can deepen our understanding of evolutionary mechanisms, advance protein engineering, and accelerate medical research. In this study, we introduce two models, one based on bidirectional long short-term memory (BiLSTM) and the other on self-attention, and integrate them with a weighted fusion method to predict the functional changes associated with mutation sites. The results indicate that the fused model's predictive performance and generalization ability are comparable to those of current models. The ensemble also outperforms each single model, which highlights the value of combining their complementary strengths. This approach may improve the accuracy of predicting mutation-associated functional changes and has potential applications in protein engineering and drug research. We also evaluated the models under different scenarios by comparing predictions across different numbers of mutation sites. Prediction accuracy decreases significantly as the number of mutation sites increases, which exposes an inherent limitation of these models for variants with many mutation sites.
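
The weighted fusion step can be illustrated with a minimal sketch. It assumes each model emits one numeric score per variant and that the fused score is a convex combination of the two. The function names, the grid search over the weight, and the sample numbers are illustrative assumptions; the abstract does not state how the weights were chosen.

```python
from typing import List, Sequence


def weighted_fusion(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Blend two models' predictions as w * a + (1 - w) * b."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and measured values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pick_weight(pred_a, pred_b, target, steps: int = 20) -> float:
    """Grid-search w on held-out data (an assumed selection rule, not the paper's)."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(weighted_fusion(pred_a, pred_b, w), target))


# Made-up validation scores, for illustration only.
bilstm_val = [0.9, 0.2, 0.6]      # hypothetical BiLSTM outputs
attention_val = [0.7, 0.4, 0.4]   # hypothetical self-attention outputs
measured_val = [0.8, 0.3, 0.5]    # hypothetical measured functional scores

w = pick_weight(bilstm_val, attention_val, measured_val)
fused = weighted_fusion(bilstm_val, attention_val, w)
```

In this toy case the two models err in opposite directions, so the blend has a lower error than either model alone; that is the situation in which an ensemble can outperform its components.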

Metadata
Title
Predicting the Functional Changes in Protein Mutations Through the Application of BiLSTM and the Self-Attention Mechanism
Authors
Zixuan Fan
Yan Xu
Publication date
25 April 2024
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science, Issue 3/2024
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-024-00530-7
