Poster Presentation 50th Lorne Proteins Conference 2025

Harnessing AlphaFold and explainable AI to better characterise human missense variants and diseases (#115)

Qisheng Pan 1, Stephanie Portelli 1, Thanh Binh Nguyen 1, David Ascher 1
  1. University of Queensland and Baker Institute, Melbourne, VIC, Australia

The revolutionary success of AlphaFold has produced a wealth of three-dimensional protein structures. The method, however, struggles to model single amino acid substitutions, which makes it difficult to understand how such substitutions contribute to disease at the molecular level. Here we present a pipeline that uses explainable artificial intelligence (AI) to study human missense variants and their pathogenic phenotypes in a structural context.

We first systematically evaluated various computational biophysical tools on AlphaFold2 and homology models. We found that tools that predict the effects of mutations on protein stability maintained robust performance even when the homology models were generated at sequence identities as low as 40%. Likewise, using AlphaFold2 models as inputs gave performance consistent with that obtained using experimentally determined structures. Although AlphaFold2 could produce high-quality models with low RMSD (< 2 Å), using these models to study ligand interactions via docking resulted in a 10-20% deterioration in performance, comparable to that seen with traditional homology modelling.
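For readers less familiar with the RMSD criterion quoted above, the following is a minimal sketch of how the value is computed over matched atoms. It assumes the two coordinate sets are already superposed and listed in the same atom order; the function name `rmsd` and the toy coordinates are hypothetical and are not taken from this work.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two matched coordinate lists.

    Assumes both structures are already superposed and atoms are paired by index.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between each pair of matched atoms.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy coordinates for illustration only: every atom is shifted by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

value = rmsd(model, reference)
print(f"RMSD = {value:.2f} Å, below 2 Å threshold: {value < 2.0}")
```

In practice a superposition step (for example the Kabsch algorithm) would normally precede this calculation; it is omitted here to keep the sketch short.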

Given these insights into performance variability, we then used AlphaFold2 models to investigate how variants are associated with disease. We first observed that pathogenic variants are enriched in key functional regions and usually have unfavourable effects. For instance, oncogenic variants in the tumour suppressor p53 tend to destabilise both the monomeric and tetrameric structures. We then leveraged machine learning to identify disease-causing variants in cancer and in Alzheimer’s disease (AD). Our models for both p53 and AD-related proteins achieved state-of-the-art performance, with Matthews correlation coefficients (MCC) above 0.79. More importantly, our methods allow users to visualise mutations in their structural environment, including interactions, and provide biological interpretation of the features, revealing pathogenic mechanisms at the molecular level.
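As a reminder of what the MCC figure above measures, the sketch below computes it from the four confusion-matrix counts of a binary classifier (pathogenic versus benign). The function name `mcc_from_counts` and the counts are hypothetical and chosen only for illustration; they are not results from this work.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts for illustration only.
mcc = mcc_from_counts(tp=45, tn=45, fp=5, fn=5)
print(f"MCC = {mcc:.2f}")
```

Because MCC uses all four counts, it remains informative when pathogenic and benign variants are imbalanced, which is one reason it is often preferred over accuracy for variant classification.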

In summary, our work highlights how the power of AlphaFold2 can be leveraged to characterise missense variants and their phenotypes, despite its poor performance in modelling mutants. The resulting resources could prove invaluable for personalised treatment.