Poster Presentation 50th Lorne Proteins Conference 2025

Exploring the Dark Spots in Protein Data Bank (#313)

Junjie Xu 1 2 , Ashar Malik 1 2 , David Ascher 1 2
  1. School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
  2. Baker Heart and Diabetes Institute, Melbourne, VIC, Australia

A comprehensive knowledge of protein structures is essential for understanding biochemical functions and interactions. Several experimental techniques can determine the three-dimensional structure of proteins, with X-ray crystallography the most frequently used. Despite its widespread use, X-ray crystallography struggles to resolve flexible regions and frequently yields incomplete structures. The resulting missing regions in PDB entries, termed "Dark Spots", present a significant obstacle for structural biology. In recent years, several computational methods utilising artificial intelligence have been developed to predict protein structures. While these tools can generate high-quality ab initio models, comparing predicted structures with experimentally determined ones reveals discrepancies in both backbone and side-chain positions. Using the homology modelling tools MODELLER and Rosetta, this study resolves "Dark Spots" in protein structures by generating structural fragments that bridge the gaps while preserving the high-quality, experimentally determined regions. Beyond creating a comprehensive database of complete protein structures, the investigation also explores how these completions influence the internal characteristics of proteins and potentially affect their interactions with ligands. PDB-NaQ is an open-source digital repository offering multiple reference conformations for each structure, along with detailed structural analysis metrics. It provides a more extensive collection of reference conformations for in-depth investigations of protein structure and function, and a more convenient starting point for downstream analyses that depend on intact protein structures.
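
As an illustration only, the following minimal Python sketch shows one simple way candidate "Dark Spots" could be located in a PDB-format file: it scans the C-alpha ATOM records of the first model and reports breaks in residue numbering for each chain. It is not the pipeline used in this study. The function name `find_numbering_gaps` is hypothetical, and the approach rests on the assumption that a numbering break corresponds to unmodelled residues, which does not always hold (for example with insertion codes or non-sequential numbering schemes). It also cannot detect residues missing at chain termini.

```python
from collections import defaultdict


def find_numbering_gaps(pdb_text):
    """Return {chain_id: [(first_missing, last_missing), ...]}.

    Heuristic: uses only C-alpha ATOM records from the first model and
    treats any break in residue numbering as a candidate missing region.
    """
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        # Fixed-column PDB format: atom name in columns 13-16,
        # chain ID in column 22, residue number in columns 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues[line[21]].add(int(line[22:26]))

    gaps = {}
    for chain, numbers in residues.items():
        ordered = sorted(numbers)
        chain_gaps = [
            (prev + 1, nxt - 1)
            for prev, nxt in zip(ordered, ordered[1:])
            if nxt - prev > 1
        ]
        if chain_gaps:
            gaps[chain] = chain_gaps
    return gaps
```

Each returned tuple gives the first and last residue number of a numbering break, which could then be passed to a loop-modelling tool as the region to rebuild while the experimentally determined coordinates are left fixed.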