3 datasets found

  • Gene Ontology dataset

    The Gene Ontology dataset contains protein functions in the form of Gene Ontology terms.
  • UniProt dataset

    The UniProt dataset is a comprehensive protein dataset. We download the reviewed protein sequences (550k) and apply a length limit of 100 to obtain D_r (57k examples). Then we use a...
  • ProtDescribe

    The ProtDescribe dataset is used for pretraining the AMMA model and consists of 553k pairs of sequences and function descriptions.