Poster Presentation 50th Lorne Proteins Conference 2025

Haplotypes and Human Diversity in Proteomics (#307)

Jakub Vasicek 1,2, Dafni Skiadopoulou 1,2, Ksenia G Kuznetsova 1,2, Pål R Njølstad 1,3, Stefan Johansson 1,4, Stefan Bruckner 5, Lukas Käll 6, Marc Vaudel 1,2,7
  1. Department of Clinical Science, University of Bergen, Bergen, Norway
  2. Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
  3. Children and Youth Clinic, Haukeland University Hospital, Bergen, Norway
  4. Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
  5. Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
  6. Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden
  7. Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo, Norway

Genomic research has long benefited from diverse population panels, which increase the statistical power of association studies for participants from admixed populations [1–3]. However, mass spectrometry-based proteomic workflows often project all data onto a single set of reference protein sequences. Consequently, we obscure a portion of the proteome, restricting our ability to fully analyze complex samples [4–6]. Moreover, we risk introducing a bias against populations with a different haplotypic structure.

Alleles co-occurring in the protein-coding regions of the same gene produce a unique protein sequence, termed a protein haplotype [5]. These haplotypes are present in biological samples and are detectable by mass spectrometry. We have demonstrated that thousands of amino acid substitutions can be discovered in a single sample, sometimes featuring alleles in linkage disequilibrium within the same peptide after tryptic digestion of the protein [6]. We have recently released ProHap, a bioinformatic pipeline for building proteomic databases from genetic reference panels [7].
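
To illustrate how co-occurring alleles can end up within the same tryptic peptide, the following toy sketch applies two amino acid substitutions to a short reference sequence and performs an in silico tryptic digestion. It is not ProHap's implementation: the sequence, the substitution positions, and the function names are hypothetical, and the digestion uses the simple rule of cleaving after K or R unless followed by P, with no missed cleavages.

```python
import re


def apply_substitutions(reference, substitutions):
    """Build a protein haplotype from a reference sequence.

    `substitutions` holds (position, ref_aa, alt_aa) tuples with 1-based
    positions; all of them are applied to the same sequence copy.
    """
    residues = list(reference)
    for position, ref_aa, alt_aa in substitutions:
        if residues[position - 1] != ref_aa:
            raise ValueError(f"Expected {ref_aa} at position {position}")
        residues[position - 1] = alt_aa
    return "".join(residues)


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Hypothetical reference protein and two co-occurring substitutions.
reference = "MASGKLTEVDRQAAGSWK"
substitutions = [(7, "T", "A"), (10, "D", "E")]

haplotype = apply_substitutions(reference, substitutions)

# Peptides of the haplotype that do not occur in the reference digest.
reference_peptides = set(tryptic_digest(reference))
variant_peptides = [p for p in tryptic_digest(haplotype) if p not in reference_peptides]

print(haplotype)
print(variant_peptides)
```

In this toy case both substitutions fall between the same pair of cleavage sites, so a single peptide carries both alleles. A search against the reference sequence alone would not contain that peptide.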

We generated proteomic databases from the 1000 Genomes Project [1] and showed that participants of the African superpopulation diverge from the reference proteome more than others, while all the included ancestry groups show notable differences from the reference proteome [7]. ProHap alleviates this bias by creating databases that capture the diversity of human proteomes, allowing fair competition between protein haplotypes during proteomic searches. The pipeline can be run on public as well as local reference panels, with great flexibility in the types of genetic variants and the haplotype frequencies considered, empowering researchers to tailor their proteomic studies to specific populations.

To provide rapid insight into the complexity of such proteogenomic datasets, we have developed ProHap Explorer, a web-based visual interface mapping identified peptides to genes, haplotypes, and spliced transcripts. ProHap Explorer allows researchers to browse the influence of common haplotypes on any gene of interest and to view the coverage of the resulting proteoforms in public mass spectrometry datasets.

  1. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015; doi: 10.1038/nature15393.
  2. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016; doi: 10.1038/ng.3643.
  3. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019; doi: 10.1038/s41588-019-0379-x.
  4. Fujimoto GM, Monroe ME, Rodriguez L, Wu C, MacLean B, Smith RD, et al. Accounting for Population Variation in Targeted Proteomics. J Proteome Res. 2014; doi: 10.1021/pr4011052.
  5. Spooner W, McLaren W, Slidel T, Finch DK, Butler R, Campbell J, et al. Haplosaurus computes protein haplotypes for use in precision drug design. Nat Commun. 2018; doi: 10.1038/s41467-018-06542-1.
  6. Vašíček J, Skiadopoulou D, Kuznetsova KG, Wen B, Johansson S, Njølstad PR, et al. Finding haplotypic signatures in proteins. GigaScience. 2023; doi: 10.1093/gigascience/giad093.
  7. Vašíček J, Kuznetsova KG, Skiadopoulou D, Njølstad PR, Johansson S, Bruckner S, et al. ProHap enables proteomic database generation accounting for population diversity. bioRxiv. 2024; doi: 10.1101/2023.12.24.572591.