Physicochemical Properties of Protein Tertiary Structure Data Set
Physicochemical Properties of Protein Tertiary Structure Data Set. UCI Machine Learning Repository, https://doi.org/10.24432/C5QW3H.
GOTaxon: Representing the evolution of biological functions in the Gene Ontology
The Gene Ontology aims to define the universe of functions known for gene products, at the molecular, cellular and organism levels.
CASP dataset
The CASP dataset was used for testing. The dataset contains 96 template-free proteins and 90 template-based proteins.
SCOP 2.06 dataset
The SCOP 2.06 dataset was used for testing. The dataset contains 4,188 domains, covering 550 folds.
SCOP 1.75 dataset
The SCOP 1.75 dataset was used for training and validation. The dataset contains 16,712 proteins covering 7 major structural classes, with a total of 1,195 identified folds.
DeepSF: deep convolutional neural network for mapping protein sequences to folds
Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly...
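A convolutional fold classifier has to turn a variable-length sequence into a fixed-length feature vector before predicting a fold. The sketch below shows one generic way to do that: one-hot encoding, a 1D convolution with ReLU, and global max pooling. It is an illustrative assumption, not DeepSF's published architecture; the names `one_hot` and `conv1d_maxpool` and the hand-set filter are hypothetical, and a real model would learn its filters and add further layers and a softmax over folds.

```python
from typing import List

# Hypothetical alphabet ordering for one-hot encoding (20 standard amino acids).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a protein sequence as an L x 20 matrix; unknown residues become all-zero rows."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        j = AMINO_ACIDS.find(aa)
        if j >= 0:
            row[j] = 1.0
        rows.append(row)
    return rows


def conv1d_maxpool(x: List[List[float]], filters: List[List[List[float]]]) -> List[float]:
    """Valid 1D convolution + ReLU + global max pooling.

    filters[f][w][c] is the weight of filter f at window offset w for input channel c.
    The output has one value per filter, independent of the sequence length.
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU outputs are >= 0, so 0 is the pooled value if no window fits
        for start in range(len(x) - width + 1):
            act = sum(
                filt[w][c] * x[start + w][c]
                for w in range(width)
                for c in range(len(x[0]))
            )
            best = max(best, max(act, 0.0))  # ReLU, then running max over positions
        features.append(best)
    return features


# A hand-set width-2 filter that responds to the dipeptide "KR".
kr_filter = [[0.0] * 20, [0.0] * 20]
kr_filter[0][AMINO_ACIDS.index("K")] = 1.0
kr_filter[1][AMINO_ACIDS.index("R")] = 1.0

print(conv1d_maxpool(one_hot("AKRA"), [kr_filter]))       # [2.0]
print(conv1d_maxpool(one_hot("AAAAAAAA"), [kr_filter]))   # [0.0]
```

Global max pooling is what makes sequences of any length map to the same number of features; the pooled vector is what a dense classification layer would consume.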
COBS: a Compact Bit-Sliced Signature Index
COBS is a compact bit-sliced signature index for approximate pattern matching on large q-gram datasets.
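To make "bit-sliced signature index" concrete, here is a minimal sketch, assuming a single Bloom-filter size shared by all documents (a simplification; COBS's own compact layout and hashing differ). Each document's q-grams are hashed into a Bloom filter, and the filters are stored transposed: row `i` is a bitmask whose bit `d` is set when document `d` has bit `i` set. A query then ANDs a few rows per q-gram instead of scanning every document. The class `BitSlicedIndex` and its parameters are hypothetical.

```python
import hashlib
from typing import List, Set


def qgrams(text: str, q: int) -> Set[str]:
    """All distinct substrings of length q."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def _hashes(term: str, num_hashes: int, size: int) -> List[int]:
    """num_hashes independent bit positions in [0, size) for one q-gram."""
    out = []
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "little") % size)
    return out


class BitSlicedIndex:
    def __init__(self, size: int, num_hashes: int, q: int):
        self.size = size
        self.num_hashes = num_hashes
        self.q = q
        self.rows = [0] * size  # rows[bit] is a bitmask over document ids
        self.num_docs = 0

    def add(self, text: str) -> int:
        """Insert a document's q-grams; returns its document id."""
        doc = self.num_docs
        self.num_docs += 1
        for g in qgrams(text, self.q):
            for pos in _hashes(g, self.num_hashes, self.size):
                self.rows[pos] |= 1 << doc
        return doc

    def query(self, pattern: str, threshold: float = 1.0) -> List[int]:
        """Documents whose filters contain at least threshold * (#q-grams) of the pattern's q-grams."""
        grams = qgrams(pattern, self.q)
        if not grams:
            return []
        scores = [0] * self.num_docs
        for g in grams:
            mask = (1 << self.num_docs) - 1  # start with all documents
            for pos in _hashes(g, self.num_hashes, self.size):
                mask &= self.rows[pos]  # keep documents with this bit set
            for doc in range(self.num_docs):
                if (mask >> doc) & 1:
                    scores[doc] += 1
        needed = threshold * len(grams)
        return [d for d, s in enumerate(scores) if s >= needed]


idx = BitSlicedIndex(size=1024, num_hashes=3, q=3)
a = idx.add("ACGTACGTGG")
b = idx.add("TTTTCCCCAA")
print(a in idx.query("ACGTAC"))  # True
```

Because Bloom filters have no false negatives, a document that truly contains every q-gram of the pattern is always reported; lowering `threshold` below 1.0 turns this into approximate matching, at the cost of possible false positives.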
scX: A user-friendly tool for scRNA-seq exploration
Single-cell RNA sequencing (scRNA-seq) has transformed our ability to explore biological systems. Nevertheless, proficient expertise is essential for handling and interpreting...
MEGADOCK-GUI
MEGADOCK-GUI is a GUI-based complete cross-docking tool for exploring protein-protein interactions. It can automatically perform complete cross-docking of M vs. N proteins.
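"Complete cross-docking of M vs. N proteins" means every one of the M proteins is docked against every one of the N proteins, i.e. M x N docking jobs. The sketch below only illustrates that job layout; `dock_pair` is a hypothetical scoring callback standing in for a real docking run, and nothing here reflects MEGADOCK-GUI's actual interface.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def complete_cross_docking(
    receptors: List[str],
    ligands: List[str],
    dock_pair: Callable[[str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Run the (hypothetical) dock_pair callback on all M x N receptor-ligand pairs."""
    return {(r, l): dock_pair(r, l) for r, l in product(receptors, ligands)}


def best_partner(scores: Dict[Tuple[str, str], float], receptor: str) -> str:
    """Ligand with the highest score for one receptor (higher = better, by assumption)."""
    return max((l for (r, l) in scores if r == receptor), key=lambda l: scores[(receptor, l)])


# Toy scorer: pretend docking score is just the combined name length.
toy_score = lambda r, l: float(len(r) + len(l))
results = complete_cross_docking(["P1", "P22"], ["L1", "L22", "L333"], toy_score)
print(len(results))  # 6 jobs for M = 2, N = 3
```

In practice each job is an independent docking run, so the M x N grid parallelizes trivially; the quadratic job count is what makes a dedicated tool worthwhile.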
Protein Folding
The dataset used in the paper for protein folding, a bioinformatics problem.
TCR alpha chain rearrangement distribution
The dataset used in the paper is a collection of non-productive sequences of T cell receptor (TCR) genes, including alpha and beta chains.
Conditional Random Fields
CRFs have been applied to a variety of domains, including text processing, computer vision, and bioinformatics.
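For sequence labeling (e.g. tagging residues or tokens), a linear-chain CRF scores a label sequence as the sum of per-position emission scores and label-to-label transition scores, and decoding picks the highest-scoring sequence with the Viterbi algorithm. The sketch below shows only that decoding step, assuming the log-potentials are already given as plain lists; training (estimating the potentials) is not shown, and the function name `viterbi` is our own.

```python
from typing import List, Tuple


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Best label path for a linear-chain CRF.

    emissions[t][y]   : score of label y at position t
    transitions[p][y] : score of moving from label p to label y
    Returns (best path, its total score).
    """
    n_steps = len(emissions)
    n_labels = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label at t = 0
    backptr: List[List[int]] = []
    for t in range(1, n_steps):
        new_score, ptrs = [], []
        for y in range(n_labels):
            # Best previous label for ending in y at step t.
            best_prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            ptrs.append(best_prev)
        score = new_score
        backptr.append(ptrs)
    last = max(range(n_labels), key=lambda y: score[y])
    best_score = score[last]
    # Follow back-pointers from the final label to recover the path.
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best_score


em = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sticky = [[0.0, -5.0], [-5.0, 0.0]]  # strongly penalize switching labels
print(viterbi(em, sticky))  # ([0, 0, 0], 3.0)
```

With the switching penalty removed (all transitions 0), the same emissions decode to `[0, 1, 1]` with score 5.0, which shows how transition scores let a CRF trade local evidence against label consistency.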
Table 5: Maximum accuracy rates of sequence-by-sequence recognition with SVM ...
The dataset is a database of 12,390 marker genes, represented at different dimensions (k-bp).
Table 4: Maximum accuracy rates of dimension-by-dimension recognition with SV...
The dataset is a database of 12,390 marker genes, represented at different dimensions (k-bp).
Table 3: Maximum accuracy rates of sequence-by-sequence recognition with SVM ...
The dataset is a database of 12,390 marker genes, represented at different dimensions (k-bp).
Table 2: Maximum accuracy rates of dimension-by-dimension recognition with SV...
The dataset is a database of 12,390 marker genes, represented at different dimensions (k-bp).
Random Fragments Classification of Microbial Marker Clades with Multi-class SV...
Modeling microbial clades is a challenging problem in biology based on microarray genome sequences, especially for discovering and categorizing gene isolates from new species. Marker family...
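The entries above describe classifying random fragments of marker genes with a multi-class SVM at different "dimensions (k-bp)". Assuming that dimension refers to the k-mer length, so each fragment becomes a vector of 4^k k-mer frequencies (our interpretation, not stated in the source), the feature-extraction step could look like the sketch below. `random_fragments` and `kmer_vector` are hypothetical helpers; the resulting vectors would then be passed to an SVM classifier, which is not shown.

```python
import random
from itertools import product
from typing import Dict, List


def random_fragments(seq: str, length: int, n: int, seed: int = 0) -> List[str]:
    """Sample n random substrings of the given length (hypothetical sampling scheme)."""
    if length > len(seq):
        raise ValueError("fragment length exceeds sequence length")
    rng = random.Random(seed)
    starts = [rng.randrange(len(seq) - length + 1) for _ in range(n)]
    return [seq[s:s + length] for s in starts]


def kmer_vector(seq: str, k: int) -> List[float]:
    """Normalized k-mer frequency vector of dimension 4**k over the alphabet ACGT."""
    index: Dict[str, int] = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other ambiguous bases
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


gene = "ATGCGTACGTTAGCCGATCGATCGGCTAACGTTAGC"
frags = random_fragments(gene, length=12, n=3, seed=42)
features = [kmer_vector(f, k=3) for f in frags]
print(len(features[0]))  # 64 = 4**3
```

Normalizing counts to frequencies keeps fragments of different lengths comparable; larger k gives sparser but more specific features, which is one reason accuracy would be reported per dimension.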
Diabetes and Asia datasets
The Diabetes and Asia datasets were used for the experiments.