An Artificial Intelligence Approach to Genetic Diseases
20.04.2022 / Publications

In this study, Assoc. Prof. Dr. Tunca Doğan, a faculty member of the Department of Computer Engineering at Hacettepe University, and collaborators examined deep learning-based language models, which have driven significant advances in natural language processing, as tools for studying proteins. In their approach, the amino acid sequence of a protein is treated as a sentence in a text, and the molecular function of that protein is treated as the meaning of the sentence.

Large-scale unsupervised (or self-supervised) deep learning models automatically learn the complex functional information hidden in amino acid sequence patterns (Unsal et al., 2022).
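
To make the sentence analogy concrete, the following is a minimal sketch of how a self-supervised training example could be built from a protein sequence: each residue becomes one token, some tokens are hidden, and a model would be trained to recover them from context. This is an illustration only and not the authors' pipeline; the function names, the `<mask>` token, the 15% masking rate, and the toy sequence are all assumptions.

```python
import random

# The 20 standard amino acids, one letter each; these play the role of "words".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # assumed placeholder token, not taken from the study


def tokenize(sequence):
    """Split a protein sequence into one token per residue."""
    tokens = list(sequence.upper())
    unknown = sorted({t for t in tokens if t not in AMINO_ACIDS})
    if unknown:
        raise ValueError(f"unexpected residue symbols: {unknown}")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked token list and a dict mapping each hidden
    position to the residue a model would have to predict there.
    """
    if not tokens:
        raise ValueError("cannot mask an empty sequence")
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        masked[i] = MASK
    return masked, targets


# Hypothetical toy sequence, not a real protein.
example = "MKTAYIAKQRQISFVKSHFSRQ"
tokens = tokenize(example)
masked, targets = mask_tokens(tokens)
print(" ".join(masked))
print(targets)
```

No labels are needed to build such examples, which is why the training is called unsupervised or self-supervised: the sequence itself supplies both the input and the answer.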

The authors described their most important finding as "the ability of these deep protein language models, each of which can contain billions of parameters, to successfully learn complex biological mechanisms using protein sequence data alone." The authors also reported that these results are consistent with the findings of methods such as DeepMind's AlphaFold2 and RoseTTAFold, which likewise take sequence data as input and predict three-dimensional protein monomer structures with extremely high performance. Looking ahead, the authors aim to use such artificial intelligence models in the molecular analysis of genetic diseases and in the discovery of new drugs that specifically target them, with the goal of developing new personalized treatment options in health care.

For detailed information and the full online PDF version of the study, see:

Unsal, S., Atas, H., Albayrak, M., Turhan, K., Acar, A. C., & Doğan, T. (2022). Learning Functional Properties of Proteins with Language Models. Nature Machine Intelligence, 4, 227–245 [DOI:10.1038/s42256-022-00457-9].