Leveraging artificial intelligence and deep learning to generate proteins de novo (a.k.a. ‘synthetic proteins’) has unlocked new frontiers of protein design. Despite being trained on biological sequences and experimental structures, these deep learning models can sample new sequences and structural landscapes unexplored by evolution. This approach can be used to create new protein-protein interactions and bespoke binders that target specific proteins and domains. The current published state of the art, RFdiffusion, has successfully generated novel protein binders. However, it faces practical limitations in computational scalability and experimental success rates, which may stem from its reliance on structural data alone for training. Reinforcement learning, which excels at learning to navigate complex and sparse action spaces such as that of protein design, may improve RFdiffusion’s robustness for protein design.
First, we present an implementation of a synthetic protein binder design workflow that is deployable on HPC systems using the Nextflow orchestration domain-specific language and Singularity containerisation, and that automatically streams and parallelises the workload across available CPUs and GPUs. Second, we present the ProteinDesignGym, a novel reinforcement learning environment for improving binding affinity, which will be trained on deep mutational scanning data. The ProteinDesignGym will be used to facilitate the implementation of DeltaDiffusion, a novel tool that will aim to improve protein binder design by iteratively refining RFdiffusion's putative protein designs using reinforcement learning. Together, these tools represent an innovative way of tackling de novo protein binder design.
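To make the idea of a reinforcement learning environment for binder refinement concrete, the following is a minimal sketch only, not the ProteinDesignGym or DeltaDiffusion implementation. It assumes a Gymnasium-style `reset`/`step` interface, treats an action as a single point mutation, and defines the reward as the change in a score. The names `ToyBinderEnv`, `toy_score`, and `random_rollout`, and the hidden target sequence, are hypothetical. The scoring function is a toy stand-in for a model that would be trained on deep mutational scanning data, and the random policy stands in for a learned one.

```python
import random
from typing import Callable, Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Action = Tuple[int, str]  # (position to mutate, new residue)


class ToyBinderEnv:
    """Hypothetical Gymnasium-style environment (illustration only)."""

    def __init__(self, start_seq: str, score_fn: Callable[[str], float],
                 max_steps: int = 50) -> None:
        self.start_seq = start_seq
        self.score_fn = score_fn
        self.max_steps = max_steps
        self.seq = start_seq
        self.steps = 0

    def reset(self) -> Tuple[str, Dict[str, float]]:
        # Start every episode from the initial design (e.g. a putative binder).
        self.seq = self.start_seq
        self.steps = 0
        return self.seq, {"score": self.score_fn(self.seq)}

    def step(self, action: Action):
        pos, residue = action
        if not 0 <= pos < len(self.seq) or residue not in AMINO_ACIDS:
            raise ValueError(f"invalid action: {action!r}")
        old_score = self.score_fn(self.seq)
        # Apply the point mutation.
        self.seq = self.seq[:pos] + residue + self.seq[pos + 1:]
        new_score = self.score_fn(self.seq)
        self.steps += 1
        reward = new_score - old_score  # reward = improvement in score
        truncated = self.steps >= self.max_steps
        # Tuple order: observation, reward, terminated, truncated, info.
        return self.seq, reward, False, truncated, {"score": new_score}


def toy_score(seq: str, target: str = "MKTAYIAKQR") -> float:
    # Assumption: placeholder score, the fraction of positions that match
    # an arbitrary target. A real scorer would predict binding affinity.
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def random_rollout(env: ToyBinderEnv, rng: random.Random) -> Tuple[str, float]:
    # Random-policy baseline: propose random mutations, remember the best design.
    seq, info = env.reset()
    best_seq, best_score = seq, info["score"]
    truncated = False
    while not truncated:
        action = (rng.randrange(len(seq)), rng.choice(AMINO_ACIDS))
        seq, _reward, _terminated, truncated, info = env.step(action)
        if info["score"] > best_score:
            best_seq, best_score = seq, info["score"]
    return best_seq, best_score


if __name__ == "__main__":
    env = ToyBinderEnv("AAAAAAAAAA", toy_score, max_steps=200)
    print(random_rollout(env, random.Random(0)))
```

In this sketch, a learned policy would replace the random choice of mutation, and the placeholder scorer would be replaced by a predictor of binding affinity.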