UniProtKB/Swiss-Prot
The UniProtKB/Swiss-Prot database contains manually reviewed protein sequence information.
UniProt dataset
The UniProt dataset is a comprehensive protein dataset. We download the reviewed protein sequences (550k) and keep only those with a length of at most 100, which gives D_r (57k examples). Then we use a...
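The length filter used to build D_r can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the reviewed sequences are stored in FASTA format and reads "a length of at most 100" as keeping sequences with 100 or fewer residues. The helper names `read_fasta` and `filter_by_length` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], max_len: int = 100) -> List[Record]:
    """Keep records whose sequence has at most max_len residues (assumed cutoff)."""
    return [(h, s) for h, s in records if len(s) <= max_len]


# Toy input: one short record (8 residues) and one long record (150 residues).
example = ">sp|P1|short\nMKT\nAYIAK\n>sp|P2|long\n" + "A" * 150 + "\n"
d_r = filter_by_length(read_fasta(example.splitlines()))
print([h for h, _ in d_r])  # ['sp|P1|short']
```

To apply it to a downloaded file, pass an open text file handle to `read_fasta` instead of the example lines; the generator keeps memory use low for a file with hundreds of thousands of entries.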
DeepSF dataset
The DeepSF dataset is a benchmark for protein sequence analysis.
Pfam protein families database
The 2019 version of the Pfam protein families database. It is used for protein sequence analysis and contains 31 million protein domains.
AMMA dataset
The dataset used in the paper for protein representation learning. It consists of 120k (sequence, structure, function) triplets.