-
ProtDescribe
The ProtDescribe dataset used for pretraining the AMMA model, consisting of 553k sequence and function description pairs.
-
AlphaFoldDB
The dataset used in the paper for secondary structure-guided novel protein sequence generation with latent graph diffusion.