AlphaFold2, a Nobel Prize-winning protein structure predictor, was first unveiled in November 2020. It was the culmination of years of work and of earlier versions that used AI to predict protein structure.
The tool has been important to life science research, as it has predicted the structures of 200 million proteins, according to the Nobel Foundation. These predicted structures can be used for understanding diseases, discovering drugs, engineering new proteins and studying protein-protein interactions, according to Dr. Ann Cheung in a video posted by the New England Journal of Medicine.
AlphaFold2 works primarily by using a neural network called the Evoformer. The Evoformer takes in two pieces of information and refines them, and the result is used to predict the structure of the protein of interest.
The pieces of information are the multiple sequence alignment and the pair representation.
The multiple sequence alignment lines up the sequence of the protein in question with the sequences of similar proteins. This can show which amino acids stay the same across the related proteins and which amino acids tend to change together, a hint that those amino acids are related to one another in the structure.
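To make the idea of comparing aligned sequences concrete, here is a minimal Python sketch. It is only an illustration, not AlphaFold2's method: the sequences are made up, and the helper name `column_conservation` is hypothetical. It reports, for each position, the most common amino acid and the share of sequences that have it.

```python
from collections import Counter

# Toy alignment (made-up data): the first row stands for the protein of
# interest, the others for similar proteins. Real alignments are far larger.
msa = [
    "MKTAY",
    "MKSAY",
    "MRTAY",
    "MKTGY",
]

def column_conservation(alignment):
    """Return, for each column, the most common residue and its frequency."""
    result = []
    for column in zip(*alignment):  # zip(*rows) walks the alignment column by column
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(alignment)))
    return result

conservation = column_conservation(msa)
for position, (residue, fraction) in enumerate(conservation, start=1):
    print(position, residue, f"{fraction:.2f}")
```

Positions where the fraction is 1.00 are the same in every sequence of this toy alignment, which is the kind of signal a multiple sequence alignment makes visible.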
The pair representation describes the relationship between each pair of amino acids in the protein. It does this while also taking into account the triangle inequality, which states that one side of a triangle can’t be longer than the sum of the other two sides. In other words, any three amino acids form a triangle, so if two of them are each close to the third, those two can’t be very far from each other.
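The triangle inequality can be checked directly on a table of distances. The sketch below is an assumption-laden illustration: AlphaFold2 does not run an explicit check like this, the distance values are invented, and the function name `triangle_violations` is hypothetical. It simply lists every trio of points whose distances could not form a real triangle.

```python
from itertools import permutations

def triangle_violations(dist, tol=1e-9):
    """Return triples (i, j, k) where dist[i][k] > dist[i][j] + dist[j][k]."""
    n = len(dist)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if dist[i][k] > dist[i][j] + dist[j][k] + tol
    ]

# Made-up distances between three amino acids (arbitrary units).
consistent = [
    [0, 3, 4],
    [3, 0, 5],
    [4, 5, 0],
]
# Here points 0 and 2 are each close to point 1 but claimed to be 20 apart.
inconsistent = [
    [0, 3, 20],
    [3, 0, 5],
    [20, 5, 0],
]

print(triangle_violations(consistent))
print(triangle_violations(inconsistent))
```

The first table produces no violations, while the second flags the impossible long side in both directions.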
These two pieces of information are run through the Evoformer, which is composed of 48 blocks, or steps. Throughout this process, the two can communicate with one another.
The Evoformer also uses attention, which means looking at things in context. For example, if someone said only the letter T, it would make little sense, but if they added Boston, you might recognize that they mean the city’s public transit system. Attention helps the network recognize patterns and look at the whole picture.
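For readers who want to see the arithmetic, here is a minimal sketch of scaled dot-product attention, the general technique behind the word. It is not the Evoformer's actual code: the vectors are invented, and the names `softmax` and `attend` are illustrative. A query is compared with each piece of context, the comparisons become weights that sum to 1, and the output is a weighted blend of the context.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Hypothetical two-number vectors standing in for three pieces of context.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]  # a query that matches the first feature only

weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(o, 3) for o in output])
```

The second piece of context shares nothing with the query, so it receives the smallest weight and contributes least to the output.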
This data is put into a graph-like structure and fed to the structure module, which has eight blocks, or steps. This module uses chemical and physical constraints as well as the previous information to represent each amino acid’s backbone atoms as a triangle and figure out the triangles’ positions relative to one another.
AlphaFold2 then repeats these steps three more times, feeding its prediction back in each time, to obtain the final structure.
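The repetition can be pictured as a loop in which each pass starts from the previous pass's output. The sketch below is a toy under loud assumptions: `toy_predict` is a hypothetical stand-in that moves a guess halfway toward a known answer, something the real network does not have access to. Only the loop shape, one pass plus three repeats, reflects the description above.

```python
def toy_predict(target, previous):
    """Hypothetical stand-in for one full pass: move the guess halfway to target."""
    if previous is None:
        previous = [0.0] * len(target)  # the first pass starts from nothing
    return [p + 0.5 * (t - p) for p, t in zip(previous, target)]

def predict_with_repeats(target, extra_passes=3):
    """Run one pass, then repeat it, feeding each output back in as input."""
    prediction = None
    history = []
    for _ in range(1 + extra_passes):
        prediction = toy_predict(target, prediction)
        history.append(prediction)
    return history

history = predict_with_repeats([8.0, 16.0])
for pass_number, guess in enumerate(history, start=1):
    print(pass_number, guess)
```

Each of the four passes lands closer to the toy answer than the one before, which is the intuition behind refining a prediction by repeating the process.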
This produces highly accurate protein structures. These are predictions, however, and experimentally determined structures are more accurate, according to the European Bioinformatics Institute. AlphaFold2 also has trouble predicting regions of a protein that have little fixed structure.
AlphaFold2 can affect daily life in many ways. It can be used for drug discovery, protein engineering and understanding diseases. This can lead to treatments for diseases and advance the understanding needed for further treatment research.
AlphaFold3 was unveiled in May 2024, and its code was released in November 2024. It removed most of the multiple sequence alignment processing and folded what remained into the pair representation. It can also predict structures that involve DNA, RNA and small molecules that bind to proteins.