- Gene Ontology dataset
The Gene Ontology dataset contains protein functions in the form of Gene Ontology terms.
- UniProt dataset
The UniProt dataset is a comprehensive protein dataset. We download the reviewed protein sequences (550k) and keep those within the length limit of 100 as D_r (57k examples). Then we use a...
- ProtDescribe
The ProtDescribe dataset is used for pretraining the AMMA model and consists of 553k sequence and function description pairs.
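
The length filter described for the UniProt dataset can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the reviewed sequences are available as FASTA text, that "length" means the number of residues, and that the limit of 100 is an inclusive upper bound. The names `parse_fasta`, `filter_by_length`, and `MAX_LEN`, as well as the two example records, are hypothetical and introduced only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: the limit of 100 is an inclusive upper bound on residue count.
MAX_LEN = 100


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], max_len: int = MAX_LEN
) -> List[Tuple[str, str]]:
    """Keep only the records whose sequence has at most max_len residues."""
    return [(h, s) for h, s in records if len(s) <= max_len]


# Hypothetical two-record input: one short sequence, one of 150 residues.
example = (
    ">P1 hypothetical short protein\n"
    "MKT\n"
    "AYIAK\n"
    ">P2 hypothetical long protein\n" + "A" * 150
)

# d_r plays the role of D_r: the length-filtered subset.
d_r = filter_by_length(parse_fasta(example.splitlines()))
print(len(d_r), "record(s) kept")
```

Streaming the records through a generator avoids holding the full 550k-sequence file in memory before filtering. Only the retained subset is materialized as a list.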