The SCRATCH servers: quick help and references

The servers: quick description

SSpro v 2.0
SSpro is a server for protein secondary structure prediction based on an ensemble of
11 BRNNs (bidirectional recurrent neural networks). For a detailed explanation of the
methods, see the references.

SSpro8
SSpro8 is an experimental extension to SSpro. Instead of using three classes (helix, strand and the rest) to assign the secondary structure of a protein, SSpro8 adopts the full DSSP 8-class output classification: H (alpha-helix), G (3-10 helix), I (pi-helix), E (extended strand), B (beta-bridge), T (turn), S (bend) and C (the rest).
For a detailed description of the tests performed on SSpro8, see
the references. The overall performance
(Q8) of the system currently online (based on PSI-BLAST profiles) is approximately 63%.

CONpro
CONpro is a server that predicts whether the number of contacts of each residue in a protein
is above or below the average for that residue. The prediction of CONpro is based on BRNNs,
adopting as input a multiple alignment of homologues generated by PSI-BLAST.

ACCpro
ACCpro is a server for the prediction of the relative solvent accessibility of protein residues.
The prediction of ACCpro is based on BRNNs, adopting as input a multiple alignment of homologues
generated by PSI-BLAST.

CMAPpro
CMAPpro is a server for the prediction of maps of contacts between protein residues.
The prediction of CMAPpro is based on ensembles of Generalised Recurrent Neural Networks for the
translation of matrices. The input of the system consists of two-dimensional profiles
extracted from multiple alignments of homologues generated by PSI-BLAST, and of
secondary structure and solvent accessibility predictions obtained respectively
from SSpro and ACCpro.

CCMAPpro
CCMAPpro is a server for the prediction of maps of contacts between regular secondary
structure elements (Helices, Strands).
The prediction of CCMAPpro is based on ensembles of Generalised Recurrent Neural Networks for the
translation of sequences into matrices. The input of the system consists of profiles
extracted from multiple alignments of homologues generated by PSI-BLAST, and of
secondary structure and solvent accessibility predictions obtained respectively
from SSpro and ACCpro. Location and length of
secondary structure elements are extracted from SSpro predictions.
Segments shorter than 2 residues are not considered.

CMAP23Dpro
CMAP23Dpro is an experimental server for the reconstruction of protein backbone coordinates
from predicted contact maps obtained from CMAPpro.
The reconstruction is based on a stochastic search in the space of configurations, driven
by a potential based on the predicted contact maps, and on soft geometrical constraints
(C-α distances, helical chirality, hard-core repulsion between amino acids).
We plan to release a paper on CMAP23Dpro shortly.
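To make the idea of a search driven by a contact-map potential more concrete, here is a minimal Python sketch. It is not the CMAP23Dpro potential, which is not described on this page. The function names and the mismatch score are assumptions chosen for illustration. The sketch derives a contact map from candidate C-α coordinates and measures how far it is from a predicted map.

```python
import math

def contact_map(coords, threshold):
    """Return an N x N 0/1 matrix; 1 means two C-alpha atoms are closer than threshold (in Å)."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) < threshold else 0
             for j in range(n)] for i in range(n)]

def mismatch(coords, predicted, threshold):
    """Hypothetical score: sum over pairs i < j of |predicted probability - observed contact|."""
    observed = contact_map(coords, threshold)
    n = len(coords)
    return sum(abs(predicted[i][j] - observed[i][j])
               for i in range(n) for j in range(i + 1, n))

# Toy configuration: three C-alpha atoms on a line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# Made-up predicted contact probabilities at 8 Å.
predicted = [[1.0, 0.9, 0.2],
             [0.9, 1.0, 0.9],
             [0.2, 0.9, 1.0]]
score = mismatch(coords, predicted, threshold=8.0)
```

A stochastic search would propose changes to `coords` and prefer configurations with a lower score. A real potential would also include the geometrical constraints listed above.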
Input formats
Your email address, where the predictions will be delivered.

Query name
An optional name for your query. We strongly suggest that you use one,
especially if sending more than one query. The order in which you send your queries
may not correspond to the order in which you receive the answers.

Predictions
The predictions you want to receive. SSpro 2 (secondary structure),
SSpro8 (8-class secondary structure), CONpro (number of residue contacts),
ACCpro (relative solvent accessibility), CMAPpro (residue contact maps, at
8, 10 and 12 Å), CCMAPpro (contact maps between secondary structure elements at 12 Å),
CMAP23Dpro (backbone 3D coordinates reconstructed from contact maps).
If ACCpro is selected, the threshold of solvent accessibility
used to define a residue as buried or exposed can be chosen. Available options are: any single
threshold between 0% and 95% in 5% steps, all 19 thresholds (19 prediction lines on output),
or the 10%, 20%, 25%, 30% and 40% thresholds (5 prediction lines on output, default).
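The role of the threshold can be illustrated with a short Python sketch. ACCpro predicts the buried/exposed class directly. The code below only shows what the definition means when relative accessibility values are already known. The function names, the `e`/`-` symbols and the use of a strict comparison are assumptions made for this illustration.

```python
DEFAULT_THRESHOLDS = (10, 20, 25, 30, 40)  # default set listed above, in percent

def classify(rel_acc, threshold, exposed="e", buried="-"):
    """Build one line with one symbol per residue.

    rel_acc: relative solvent accessibility per residue, in percent.
    A residue is called exposed when its value is above the threshold
    (strict comparison is an assumption of this sketch).
    """
    return "".join(exposed if a > threshold else buried for a in rel_acc)

def all_lines(rel_acc, thresholds=DEFAULT_THRESHOLDS):
    """One line per threshold, in the order given."""
    return [classify(rel_acc, t) for t in thresholds]

# Made-up accessibility values for a five-residue fragment.
rel_acc = [5.0, 12.0, 22.0, 35.0, 60.0]
lines = all_lines(rel_acc)
```

With the default set this yields five lines, one per threshold, and fewer residues are marked exposed as the threshold rises.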
For CMAPpro, the 8, 10 and 12 Å maps can be selected independently.

Input sequence
The sequence of amino acids:

Output format
Replies are sent by email. SSpro, SSpro8,
ACCpro and CONpro replies come as text,
embedded in the body of the email.

KKGHQDFVWVLSRSKVLTGEAKTAVENYLIGSPVVDSQKLVYSDFSEAACKVN
CCCCCCEEEEEECCCCCCHHHHHHHHHHHHHCCCCCHHHHEECCCCHHHHCCC
TCCCCEEEEEEESCTTCCHHHHHHHHHHHHTCTTCCHHHEEECCHHHHHHHCC
-----+++++++--------+++++++++--+-------------+++++---
---++++++++++-------++--++++++++---+---------++-++---
---+++++++++++------++--++--+++++--++------++++-+----
---++++++++++-------++--++--+++++--+-------++++-+----
eee----------eee-eee-eee-ee--e--ee-e-ee-eeee-eee--eee

The 8 lines have the following meaning:
CMAPpro predictions come as attached raw files, with extension conpro08,
conpro10 and conpro12, for thresholds 8, 10 and 12 Å respectively. If the query is N amino acids
long the files are composed of N lines, each containing N space-separated real numbers.
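A file in this layout can be read with a few lines of Python. The sketch below is only an illustration: the function names and the 0.5 cutoff are assumptions, not part of the server. It parses the text into a square matrix and lists the pairs whose value reaches the cutoff, numbering residues from 1.

```python
def parse_contact_map(text):
    """Parse N lines of N space-separated real numbers into a list of rows."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("expected N lines of N numbers each")
    return rows

def pairs_above(matrix, cutoff):
    """Return (i, j, value) for i < j with value >= cutoff, using 1-based indices."""
    n = len(matrix)
    return [(i + 1, j + 1, matrix[i][j])
            for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] >= cutoff]

# Made-up 3 x 3 example in the same layout as the attached files.
example = """\
1.0 0.8 0.1
0.8 1.0 0.7
0.1 0.7 1.0
"""
matrix = parse_contact_map(example)
pairs = pairs_above(matrix, cutoff=0.5)
```

For a real reply, the text would come from the attachment itself, for example the contents of a file ending in conpro08.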
The j-th number on the i-th line represents the estimated probability that amino acids i and j are
in contact (i.e. that their C-αs are closer than the threshold).

CCMAPpro predictions come as attached raw files,
with extension ccmap. If SSpro predicts the presence of M regular (Helix or Strand)
secondary structure elements of length at least 2 residues,
the file is composed of M lines, each containing M space-separated real numbers.
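The value of M can be recovered from a 3-class SSpro string by collecting runs of H or E that are at least 2 residues long. The Python sketch below shows one way to do this. The function name, the 0-based start positions and the choice to treat adjacent H and E runs as separate elements are assumptions of this illustration, not a description of the server's code.

```python
from itertools import groupby

def regular_segments(ss, min_len=2):
    """Return (type, start, length) for each run of H or E with length >= min_len.

    ss: 3-class secondary structure string (H, E, C).
    start is 0-based (an assumption of this sketch).
    """
    segments = []
    pos = 0
    for symbol, run in groupby(ss):
        length = len(list(run))
        if symbol in "HE" and length >= min_len:
            segments.append((symbol, pos, length))
        pos += length
    return segments

# Made-up prediction string; the single H at position 7 is too short to count.
ss = "CCEEEECHCHHHHEECC"
segments = regular_segments(ss)
M = len(segments)  # number of rows and columns expected in the ccmap file
```

Row and column i of the ccmap matrix would then correspond to the i-th entry of `segments`, assuming the elements are ordered along the sequence.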
The j-th number on the i-th line represents the estimated probability that structures i and j are
in contact (i.e. that their centres are closer than 12 Å).

CMAP23Dpro predictions come as an attached PDB file containing
xyz coordinates of the protein C-αs.

References
For a general overview see:
J. Cheng, A. Randall, M. Sweredoski, P. Baldi, "SCRATCH: a Protein Structure and Structural Feature Prediction Server", Nucleic Acids Research, Web Server Issue, vol. 33, W72-76, 2005.
P. Baldi, G. Pollastri, "The Principled Design of Large-Scale Recursive Neural Network Architectures-DAG-RNNs and the Protein Structure Prediction Problem", Journal of Machine Learning Research, 4, 575-603, 2003.
P. Baldi, G. Pollastri, "Machine Learning Structural and Functional Proteomics", IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.
For an explanation of the methods used in SSpro and SSpro8 see:
A more detailed description of BRNNs (bidirectional recurrent neural networks) can be found here:
For an explanation of the methods used in ACCpro and CONpro see:
G. Pollastri, P. Baldi, P. Fariselli, R. Casadio, "Prediction of Coordination Number and
Relative Solvent Accessibility in Proteins", Proteins, 47, 142-153, 2002.
For an explanation of the methods used in CMAPpro and CMAP23Dpro see:
Jianlin Cheng, jianlinc@ics.uci.edu, Pierre Baldi, pfbaldi@ics.uci.edu
Institute for Genomics and Bioinformatics, University of California, Irvine