AI Model Generates Novel Protein Structures and Sequences
A new multimodal generative model, trained on large sequence datasets, generates protein sequences and their corresponding 3D structures at the same time. The model runs a diffusion process in the latent space of a protein folding model, and its generated samples show improved diversity. Because training requires only sequence data, the model is not limited to the much smaller set of proteins in structural databases.
Repurposing Protein Folding Models for Generation with Latent Diffusion
PLAID is a multimodal generative model that addresses the co-generation problem: it produces both a discrete 1D protein sequence and continuous all-atom 3D structural coordinates in a single pass. It does so by learning a diffusion model over the latent space of a protein folding model, such as ESMFold, so training the generative model requires only sequences. This gives PLAID access to sequence databases that are 2-4 orders of magnitude larger than structure databases, and enables compositional prompting by function and organism. CHEAP compresses the joint embedding of protein sequence and structure. PLAID samples demonstrate better diversity.
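The sampling flow described above (denoise a latent under function and organism prompts, then decode that one latent into both a sequence and a structure) can be illustrated with a toy sketch. This is not PLAID's implementation: the denoiser and both decoders are hypothetical stubs, the linear noise schedule and the DDPM-style update are assumed as a common choice, and names such as `toy_denoiser`, `function_id`, and `organism_id` are invented for illustration. The stub structure decoder also emits one point per residue rather than all-atom coordinates.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_denoiser(x, t, function_id, organism_id):
    """Hypothetical stand-in for a trained noise-prediction network.

    A real model would condition on the timestep and both prompts;
    this stub ignores them and returns a scaled copy of its input.
    """
    return [[0.1 * v for v in row] for row in x]


def sample_latent(length, dim, num_steps, function_id, organism_id, rng):
    """DDPM-style reverse process over a (length x dim) latent."""
    betas = make_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from pure Gaussian noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]
    for t in reversed(range(num_steps)):
        eps = toy_denoiser(x, t, function_id, organism_id)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on last step
        x = [
            [(v - coef * e) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for v, e in zip(row, eps_row)]
            for row, eps_row in zip(x, eps)
        ]
    return x


def toy_sequence_decoder(latent):
    """Hypothetical: map each latent row to one amino-acid letter."""
    return "".join(
        AMINO_ACIDS[int(abs(row[0]) * 1000) % len(AMINO_ACIDS)] for row in latent
    )


def toy_structure_decoder(latent):
    """Hypothetical: read one (x, y, z) point per residue from the latent."""
    return [tuple(row[:3]) for row in latent]


def generate(length, function_id, organism_id, seed=0, latent_dim=8, num_steps=50):
    """Sample one latent, then decode it into both modalities."""
    rng = random.Random(seed)
    latent = sample_latent(length, latent_dim, num_steps,
                           function_id, organism_id, rng)
    return toy_sequence_decoder(latent), toy_structure_decoder(latent)
```

Calling `generate(length=12, function_id=3, organism_id=7)` returns a 12-letter string and a list of 12 coordinate triples. The point of the sketch is that both outputs are decoded from the same sampled latent, which is what makes the generation joint rather than a sequence step followed by a separate folding step.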