The peptides and small proteins encoded by the symbiotic microbiomes of humans, animals, and plants are regarded as a vast class of microbial “dark matter”, and their functional diversity leaves enormous room for discovery.
Antimicrobial peptides are one example of this “dark matter”. Previous studies have concluded that antimicrobial peptides are highly promising drugs or drug precursors for treating drug-resistant bacteria, and that bacteria do not easily develop strong resistance to them, which helps address the growing problem of drug-resistant bacterial infections.
Mining and studying the vast number of peptides in the symbiotic microbiome is therefore of great significance.
Recently, a team from the Institute of Microbiology, Chinese Academy of Sciences combined several natural language processing neural network models, including LSTM, Attention, and BERT, to build a unified pipeline for identifying candidate antimicrobial peptides (AMPs) from human gut microbiome data. Of the 2349 peptide sequences identified as candidate AMPs, 216 were chemically synthesized, and 181 of those (about 84%) showed antibacterial activity. Most of these peptides share less than 40% sequence homology with the AMPs in the training set.
The related paper was published in Nature Biotechnology under the title “Identification of antimicrobial peptides from the human gut microbiome using deep learning”. Wang Jun, a researcher and doctoral supervisor at the Institute of Microbiology, Chinese Academy of Sciences, is the final corresponding author.
Reviewers commented on the study, “From computational predictions to animal models with very good results, this study summarizes an impressive array of work, including some candidate peptides for further study. Using machine learning to discover new antimicrobial peptides and then verifying their efficacy through detailed microbiological testing is very interesting and may have a positive impact on the field.”
Molecules that function in microorganisms and other organisms include not only the small molecules produced by various metabolic pathways but also a range of biological macromolecules. Some of these macromolecules are the products of biochemical reactions, such as peptidoglycan in the bacterial cell wall and lipopolysaccharide on the cell surface; others are encoded directly in the genome, including peptides and small RNAs.
Taking antimicrobial peptides as an example, thousands are known in nature, from sources ranging from the most primitive bacteria to higher organisms. These peptides are components of innate immunity in humans and amphibians, are used by bacteria to compete with one another and maintain community structure, and also have functions such as anticancer activity, immune regulation, and metabolic improvement.
However, for these diverse biological macromolecules, with their low sequence similarity and varied functions, no method currently links sequence directly to function.
Because these sequences are relatively short and their overall similarity is very low, traditional methods that mine by sequence similarity face great difficulty.
Wang Jun said, “Identifying these very short, low-similarity peptide sequences more accurately and efficiently is the core starting point of our research.”
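To make the similarity problem concrete, the following is a minimal sketch of how identity between two short peptides might be estimated. It is an illustration only, not the team's method: it uses the longest common subsequence as a crude proxy for alignment-based identity, and the two peptide strings are made-up examples rather than real antimicrobial peptides.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    """Crude identity proxy: LCS length divided by the longer sequence length."""
    if not a or not b:
        return 0.0
    return 100.0 * lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical peptides, invented for illustration only.
    known = "GIGKFLKKAKKF"
    candidate = "RWCVYARRPWLQ"
    print(f"identity proxy: {percent_identity(known, candidate):.1f}%")
```

With sequences this short, a handful of mismatches is enough to push the score far below typical homology-search cutoffs, which is why similarity-based mining tends to miss novel peptides.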
Schematic diagram of the team’s research workflow
It is understood that Wang Jun’s team applied the latest natural language processing methods from artificial intelligence to genome sequences, in particular to predicting the function of the small proteins they encode. Using the thousands of known antimicrobial peptides, the team built an analysis pipeline that integrates multiple neural network models and achieved a classification accuracy of more than 90%.
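One way several models can be integrated into a single decision is a consensus rule, sketched below. This is a hedged illustration and not the team's pipeline: the three scorer functions are hypothetical toy stand-ins for trained networks such as LSTM, Attention, and BERT models, and the "all models must agree" rule and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import Callable, Sequence

# A scorer maps a peptide sequence to a score in [0, 1].
Scorer = Callable[[str], float]


def charge_scorer(seq: str) -> float:
    """Toy stand-in model: scaled fraction of positively charged residues (K, R)."""
    if not seq:
        return 0.0
    frac = sum(seq.count(r) for r in "KR") / len(seq)
    return min(1.0, frac * 3)


def hydrophobic_scorer(seq: str) -> float:
    """Toy stand-in model: scaled fraction of hydrophobic residues."""
    if not seq:
        return 0.0
    frac = sum(seq.count(r) for r in "AILMFVW") / len(seq)
    return min(1.0, frac * 2)


def length_scorer(seq: str) -> float:
    """Toy stand-in model: accepts only short peptides (5 to 50 residues)."""
    return 1.0 if 5 <= len(seq) <= 50 else 0.0


def consensus_predict(seq: str, scorers: Sequence[Scorer], threshold: float = 0.5) -> bool:
    """Call a sequence a candidate only if every model scores it at or above the threshold."""
    return all(scorer(seq) >= threshold for scorer in scorers)


if __name__ == "__main__":
    models = [charge_scorer, hydrophobic_scorer, length_scorer]
    for peptide in ["KKLLKKLLKKLL", "DDDDEEEEDDDD"]:  # made-up sequences
        print(peptide, consensus_predict(peptide, models))
```

Requiring agreement across models trades recall for precision, which suits a setting where each candidate must later be chemically synthesized and tested.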
Next, they took advantage of the vast amount of data now accumulated on the healthy human microbiome. Its enormous coding potential implies that it contains many types of antimicrobial and other peptides, and that these peptides may play a very important role in competition among microbes and in interactions with the host.
The team believes that peptides expressed in the gut should have a better safety profile for eukaryotic cells. They therefore screened data from more than 10,000 microbiomes layer by layer to progressively reduce false positives. Of the more than 200 peptides finally synthesized, more than 180 showed clear antibacterial activity, which verified the reliability of the method.
In addition, the study shows that artificial intelligence can be used to mine large-scale genomic and metagenomic data directly and identify functional molecules of specific classes, which can then go through high-throughput screening and validation followed by studies of mechanism, efficacy, and in vivo activity.
This approach, which Wang Jun calls “from hard disk to drug”, can greatly improve the speed and yield of research on promising drugs.
Wang Jun said that the original idea for the study came from clinical collaborations. Through several earlier collaborations, his team gradually realized that the disease- and health-related molecules in the gut flora are not limited to the commonly studied small molecules; a range of proteins can also interact with the host and help regulate immunity and metabolism.
Mining candidate antimicrobial peptides from metagenomic data
For example, bacterial peptides can mimic the sequences of human self-proteins, forming “mimetic epitope” antigens that induce a significant inflammatory response and bind autoimmune antibodies. In other words, macromolecules such as peptides encoded directly by microbial genomes can also act as functional molecules with pathogenic or therapeutic roles.
The team believes that although it is not yet possible to infer effectively from large volumes of metagenomic data which small molecules metabolism produces, many specific functional proteins encoded by open reading frames can be mined directly. The question is how to mine such short sequences.
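As a rough illustration of what mining short open reading frames involves, the sketch below scans a DNA string for small ORFs and translates them into peptides. It is a simplified example under stated assumptions, not the team's procedure: it reads only the three forward frames, uses the standard genetic code with ATG as the sole start codon, and the 5 to 50 residue length window and the input sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG codon order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_small_orfs(dna: str, min_aa: int = 5, max_aa: int = 50) -> list[str]:
    """Return peptides from ATG-to-stop ORFs in the three forward reading frames.

    Only ORFs whose peptide length (start Met included, stop excluded)
    falls within [min_aa, max_aa] are reported.
    """
    dna = dna.upper()
    peptides = []
    for frame in range(3):
        current = None  # residues of the ORF being read, or None when outside an ORF
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for ambiguous codons
            if current is None:
                if aa == "M":
                    current = ["M"]
            elif aa == "*":
                if min_aa <= len(current) <= max_aa:
                    peptides.append("".join(current))
                current = None
            else:
                current.append(aa)
    return peptides


if __name__ == "__main__":
    # Made-up contig fragment containing one small ORF in the third frame.
    contig = "CCATGAAAAAACTGCTGCTGTAAGG"
    print(find_small_orfs(contig))
```

A real metagenomic workflow would also scan the reverse strand and handle alternative start codons; those are omitted here to keep the example short.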
Wang Jun
To this end, they drew on their computational expertise to build a predictive model based on artificial intelligence, adapting many natural language processing methods to genome mining.
After a period of training, the model’s accuracy reached a reliable level. The team then tested ten short peptides that the model predicted to be antimicrobial in eukaryotic data and found that 8 of them were active.
Then, they began to use the large amount of publicly available metagenomic data for peptide mining and inference, integrating additional information to make the mining more effective.
Finally, the team studied the mechanism and safety of the synthetic peptides and tested them in animals. They concluded that peptides with no obvious toxicity to eukaryotic cells can reduce the load of infectious bacteria in animals and effectively treat infection caused by Klebsiella pneumoniae.
Wang Jun also thanked Chen Yihua’s research group at the Institute of Microbiology, Chinese Academy of Sciences for its strong support of this research. It is understood that the two groups worked together to analyze the structure and mechanism of several promising peptides and confirmed that these peptides are highly diverse in both respects.
The study shows that the method not only discovers relatively novel peptides but also has no particular preference or limitation with respect to mechanism or structure.
It is worth noting that this research has very broad application prospects. On the one hand, it expands the translation and output of microbiome and other genomic data and presents the many macromolecules encoded there directly to researchers, which aids the mining of peptide and RNA drugs. On the other hand, as sequencing methods advance and data grow rapidly, more peptides may emerge that can treat autoimmune diseases, metabolic diseases, and tumors.
In addition, researchers can chemically modify existing peptides to improve stability, extend half-life, and improve safety, which is an indispensable step before clinical use.
Wang Jun said, “The peptides we discovered are expected to quickly enter clinical use to help solve the problem of drug-resistant bacterial infections and more major non-infectious chronic diseases.”
At present, Wang Jun mainly works on in-depth mining and analysis of biological data. He combines statistics and bioinformatics to analyze the role of the gut microbiota in relation to the genome and disease in humans and animals.
To date, he has published more than 60 SCI papers, undertaken 5 major fund projects, and applied for 5 patents.
Regarding this research, Wang Jun said that the team will continue to broaden the applications of the mined macromolecules, gradually extending microbial functional macromolecules from anti-infection to the treatment of metabolic and immune diseases. He said, “We also plan to carry out preclinical optimization of the current peptides, gradually improve their druggability and antibacterial spectrum, and further optimize them for the treatment of Gram-positive bacteria and fungi.”
With further progress and accumulated knowledge, the team may eventually be able to design from scratch a series of macromolecules that do not exist in nature today.