The SCRATCH servers: quick help and references

The servers: quick description

SSpro v 2.0

SSpro is a server for protein secondary structure prediction based on an ensemble of 11 BRNNs (bidirectional recurrent neural networks). For a detailed explanation of the methods, see the references.
SSpro version 1.0 went online on March 13, 2000. Within one year it handled more than 10,000 queries from 60 domains in at least 50 countries. From the start, SSpro 1.0 was evaluated by the independent assessor EVA, where it consistently classified more than 76% of residues correctly on structures with no homologues in the PDB, ranking first among the servers tested.
SSpro 2.0 builds multiple alignments of homologous sequences with an improved method based on PSI-BLAST instead of BLAST. Experiments on an independent test set show that it correctly classifies more than 78% of residues under the CASP-like assignment of secondary structure into 3 classes; more lenient assignments yield 80% or better.
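The accuracy figures above, and the Q8 figure quoted for SSpro8, are percentages of correctly classified residues. As a minimal sketch, assuming the predicted and observed (for example DSSP-derived) assignments are strings of equal length, the score can be computed as follows; the function name q_score is hypothetical.

```python
def q_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted class equals the observed class.

    The same function gives Q3 for H/E/C strings and Q8 for DSSP 8-class strings.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty prediction")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


# Hypothetical 10-residue example in which 8 positions agree.
print(q_score("CCHHHHHCCE", "CCHHHHCCEE"))  # 80.0
```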

SSpro8

SSpro8 is an experimental extension to SSpro. Instead of using three classes (helix, strand and the rest) to assign the secondary structure of a protein, SSpro8 adopts the full DSSP 8-class output classification:

  1. H: alpha-helix
  2. G: 3₁₀-helix
  3. I: pi-helix (extremely rare)
  4. E: extended strand
  5. B: beta-bridge
  6. T: turn
  7. S: bend
  8. C: the rest

For a detailed description of the tests performed on SSpro8, see the references. The overall performance (Q8) of the system currently online (based on PSI-BLAST profiles) is approximately 63%.
NOTE: SSpro8 is a completely separate system from SSpro, so their predictions may not agree.
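Because the two systems are independent, it can be useful to collapse an SSpro8 line into three classes and check where it disagrees with the SSpro line. The sketch below assumes one commonly used reduction (H, G and I to H; E and B to E; everything else to C). This page does not say which 3-class reduction SSpro uses, so treat the mapping as an assumption; the function names are hypothetical.

```python
# Assumed 8-to-3 reduction: helices (H, G, I) -> H, strands (E, B) -> E,
# everything else (T, S, C) -> C. Not necessarily the rule used by SSpro.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three(ss8: str) -> str:
    """Collapse a DSSP 8-class string into H/E/C."""
    return "".join(EIGHT_TO_THREE.get(c, "C") for c in ss8)


def agreement(ss3: str, ss8: str) -> float:
    """Percentage of residues where an SSpro line and a reduced SSpro8 line agree."""
    reduced = reduce_to_three(ss8)
    if len(reduced) != len(ss3):
        raise ValueError("lines must have the same length")
    same = sum(a == b for a, b in zip(ss3, reduced))
    return 100.0 * same / len(ss3)


print(reduce_to_three("TGGGHEBSIC"))  # CHHHHEECHC
```

Positions where agreement falls short are the residues on which the two servers give different answers.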

CONpro

CONpro is a server that predicts whether the number of contacts of each residue in a protein is above or below the average for that residue. The prediction of CONpro is based on BRNNs, adopting as input a multiple alignment of homologues generated by PSI-BLAST.
Four different radii were used to define a contact (6, 8, 10 and 12 Å), leading to 4 different systems with accuracies between 71% and 73%, significantly better than any previously described system. The complete system is composed of 84 BRNNs.
For a more detailed explanation, see the references.
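The quantity CONpro predicts can be illustrated from known coordinates. The sketch below is an illustration under stated assumptions, not the CONpro training procedure: it counts, for each residue, the other residues whose C-α lies within the chosen radius, and compares that count with a per-amino-acid average supplied in a hypothetical averages table. How contacts and averages were actually defined is described in the references.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_counts(ca: Sequence[Coord], radius: float) -> List[int]:
    """For each residue, the number of other residues whose C-alpha lies within `radius` A."""
    return [
        sum(1 for j, b in enumerate(ca) if j != i and math.dist(a, b) < radius)
        for i, a in enumerate(ca)
    ]


def above_below(sequence: str, ca: Sequence[Coord], radius: float,
                averages: Dict[str, float]) -> str:
    """'+' where a residue has more contacts than the (assumed) average for its
    amino acid type, '-' otherwise (ties count as '-' in this sketch)."""
    counts = contact_counts(ca, radius)
    return "".join("+" if n > averages[aa] else "-" for aa, n in zip(sequence, counts))
```

Running above_below once per radius (6, 8, 10 and 12 Å) gives four lines in the same +/- alphabet as the CONpro output.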

ACCpro

ACCpro is a server for the prediction of the relative solvent accessibility of protein residues. The prediction of ACCpro is based on BRNNs, adopting as input a multiple alignment of homologues generated by PSI-BLAST.
Each residue in a protein is predicted to be buried or exposed, i.e. less or more accessible than a specified threshold. All thresholds between 0% and 95% in steps of 5% are available. At the 25% threshold, the 'hard' case in which buried and exposed residues are almost equally numerous, ACCpro correctly classifies 77.2% of residues, better than any previously described system.
For a more detailed explanation, see the references.
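Turning relative solvent accessibility (RSA, the accessible surface area of a residue as a percentage of its maximum) into a buried/exposed line is a simple thresholding step. The minimal sketch below assumes the RSA values are already available, for example from a known structure. It writes '-' for buried and 'e' for exposed, matching the output symbols used later in this page. Whether a residue lying exactly on the threshold counts as buried or exposed is an assumption here.

```python
from typing import Sequence


def buried_exposed(rsa: Sequence[float], threshold: float) -> str:
    """Encode RSA values (in %) as '-' (buried) or 'e' (exposed).

    Assumption: RSA strictly below the threshold is buried, anything else exposed.
    """
    if not 0 <= threshold <= 95 or threshold % 5 != 0:
        raise ValueError("threshold must be a multiple of 5 between 0 and 95")
    return "".join("-" if value < threshold else "e" for value in rsa)


print(buried_exposed([3.0, 24.9, 25.0, 60.0], 25))  # --ee
```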

CMAPpro

CMAPpro is a server for the prediction of maps of contacts between protein residues. The prediction of CMAPpro is based on ensembles of Generalised Recurrent Neural Networks for the translation of matrices. The input of the system consists of two-dimensional profiles extracted from multiple alignments of homologues generated by PSI-BLAST, and of secondary structure and solvent accessibility predictions obtained respectively from SSpro and ACCpro.
Maps at 8, 10 and 12 Å are available, meaning that two amino acids are defined as being in contact if their C-α are closer than 8, 10 and 12 Å respectively. For a description of the methods and of the tests performed on CMAPpro, see the references.
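The contact definition above can be made concrete for a protein of known structure. This minimal sketch assumes the C-α coordinates are given as (x, y, z) tuples in Å; it builds the binary map that a CMAPpro probability map estimates. Residues are counted as in contact with themselves on the diagonal, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca: Sequence[Coord], threshold: float) -> List[List[int]]:
    """N x N map: 1 where the C-alphas of residues i and j are closer than `threshold` A."""
    n = len(ca)
    return [[1 if math.dist(ca[i], ca[j]) < threshold else 0 for j in range(n)]
            for i in range(n)]


# One map per available threshold.
ca = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
maps = {t: contact_map(ca, t) for t in (8.0, 10.0, 12.0)}
```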

CCMAPpro

CCMAPpro is a server for the prediction of contact maps between regular secondary structure elements (helices and strands). The prediction of CCMAPpro is based on ensembles of Generalised Recurrent Neural Networks for the translation of sequences into matrices. The input of the system consists of profiles extracted from multiple alignments of homologues generated by PSI-BLAST, and of secondary structure and solvent accessibility predictions obtained respectively from SSpro and ACCpro. The location and length of the secondary structure elements are extracted from SSpro predictions; segments shorter than 2 residues are not considered.
The contact threshold is 12 Å: two elements are defined as being in contact if their centres (averages of their C-α positions) are closer than 12 Å.
We plan to release a paper describing the methods and performance of CCMAPpro in detail shortly.
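The element-level contact definition combines two steps: finding the helix and strand segments in an SSpro line and averaging C-α positions within each segment. The sketch below implements those two steps, assuming known C-α coordinates and an H/E/C string of the same length. It illustrates the definition only, not the CCMAPpro predictor, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def segments(ss: str, min_length: int = 2) -> List[Tuple[int, int, str]]:
    """(start, end_exclusive, kind) for each run of H or E at least `min_length` long."""
    result = []
    i = 0
    while i < len(ss):
        j = i
        while j < len(ss) and ss[j] == ss[i]:
            j += 1  # extend the run of identical symbols
        if ss[i] in "HE" and j - i >= min_length:
            result.append((i, j, ss[i]))
        i = j
    return result


def centre(ca: Sequence[Coord], start: int, end: int) -> Coord:
    """Average of the C-alpha positions of residues start .. end-1."""
    pts = ca[start:end]
    k = len(pts)
    return (sum(p[0] for p in pts) / k,
            sum(p[1] for p in pts) / k,
            sum(p[2] for p in pts) / k)


def element_contact_map(ss: str, ca: Sequence[Coord],
                        threshold: float = 12.0) -> List[List[int]]:
    """M x M map: 1 where the centres of elements i and j are closer than `threshold` A."""
    centres = [centre(ca, s, e) for s, e, _ in segments(ss)]
    return [[1 if math.dist(a, b) < threshold else 0 for b in centres] for a in centres]
```

The single-residue strand at position 8 of a string such as "CHHHCEECEC" is dropped by segments, in line with the rule that segments shorter than 2 residues are not considered.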

CMAP23Dpro

CMAP23Dpro is an experimental server for the reconstruction of protein backbone coordinates from the contact maps predicted by CMAPpro. The reconstruction is based on a stochastic search in the space of configurations, driven by a potential derived from the predicted contact maps and from soft geometrical constraints (C-α distances, helical chirality, hard-core repulsion between amino acids). We plan to release a paper on CMAP23Dpro shortly.
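To show what a contact-map-driven stochastic search can look like, here is a toy Metropolis search with a slowly decreasing temperature. It is not the CMAP23Dpro algorithm. The potential has three hypothetical terms: disagreement with a predicted 8 Å probability map, deviation of consecutive C-α distances from the typical 3.8 Å, and a soft repulsion below an assumed 4.0 Å. The helical chirality term is omitted. All parameters and names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

# Illustrative parameters (assumptions, not CMAP23Dpro settings).
CA_CA = 3.8       # typical distance between consecutive C-alpha atoms, in angstroms
HARD_CORE = 4.0   # hypothetical minimum distance between non-consecutive C-alphas
CONTACT = 8.0     # contact threshold, matching the 8 A CMAPpro map


def energy(xyz: Sequence[Sequence[float]], prob: Sequence[Sequence[float]]) -> float:
    """Toy potential: chain geometry + agreement with a contact map + hard-core repulsion."""
    n = len(xyz)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(xyz[i], xyz[j])
            if j == i + 1:
                e += (d - CA_CA) ** 2  # keep consecutive C-alphas about 3.8 A apart
                continue
            in_contact = 1.0 if d < CONTACT else 0.0
            e += (prob[i][j] - in_contact) ** 2  # penalise disagreement with the map
            if d < HARD_CORE:
                e += (HARD_CORE - d) ** 2  # soft repulsion between clashing residues
    return e


def anneal(prob: Sequence[Sequence[float]], steps: int = 2000,
           temperature: float = 1.0, seed: int = 0) -> Tuple[List[List[float]], float]:
    """Metropolis search with a linearly decreasing temperature, one residue moved per step."""
    rng = random.Random(seed)
    n = len(prob)
    xyz = [[i * CA_CA, 0.0, 0.0] for i in range(n)]  # start from an extended chain
    current = energy(xyz, prob)
    for step in range(steps):
        t = temperature * (1.0 - step / steps) + 1e-3
        k = rng.randrange(n)
        old = xyz[k]
        xyz[k] = [c + rng.gauss(0.0, 0.5) for c in old]
        proposed = energy(xyz, prob)
        if proposed <= current or rng.random() < math.exp((current - proposed) / t):
            current = proposed  # accept the move
        else:
            xyz[k] = old  # reject: restore the previous position
    return xyz, current
```

The full energy is recomputed at every step for clarity, which costs O(N²) per move. A practical implementation would update only the terms involving the moved residue.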

Input formats



Email

Your email address, the place where the prediction will be delivered.
NOTE: Check that you typed your address correctly. Approximately 5% of the queries handled by SSpro 1.0 never received an answer because the address was mistyped.

Query name

An optional name for your query. We strongly suggest that you use one, especially if sending more than one query. The order in which you send your queries may not correspond to the order in which you receive the answers.

Predictions

The predictions you want to receive: SSpro 2 (secondary structure), SSpro8 (8-class secondary structure), CONpro (number of residue contacts), ACCpro (relative solvent accessibility), CMAPpro (residue contact maps, at 8, 10 and 12 Å), CCMAPpro (contact maps between secondary structure elements at 12 Å), CMAP23Dpro (backbone 3D coordinates reconstructed from contact maps). If ACCpro is selected, you can choose the solvent accessibility threshold used to classify a residue as buried or exposed. The available options are any single threshold between 0% and 95% in 5% steps, all 19 thresholds (19 prediction lines in the output), or the 10%, 20%, 25%, 30% and 40% thresholds (5 prediction lines in the output, the default). For CMAPpro, the 8, 10 and 12 Å maps can be selected independently.
Note: Because of their heavy computational requirements, CMAPpro processes only proteins of at most 200 amino acids, and CMAP23Dpro only proteins of at most 100 amino acids.

Input sequence

The amino acid sequence:

  • Only a bare sequence is accepted; FASTA format is not.
  • Spaces, newlines and tabs are ignored, so feel free to include them in your query.
  • Letters that do not correspond to any amino acid are treated as X.
  • Non-alphabetic characters cause the query to be rejected.
  • Only the one-letter amino acid code is accepted. Do not send nucleotide sequences: if you do, A will be treated as alanine, C as cysteine, and so on.
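The rules above can be checked locally before submitting a query. The sketch below applies them in order. Two points are assumptions, not stated on this page: lower-case letters are accepted and upper-cased, and every letter outside the 20 standard one-letter codes counts as "not corresponding to any amino acid". The function name is hypothetical.

```python
import string

# The 20 standard one-letter amino acid codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Apply the input rules above to a raw query string."""
    residues = []
    for ch in raw:
        if ch.isspace():
            continue  # spaces, newlines and tabs are ignored
        if ch not in string.ascii_letters:
            # Digits, punctuation and FASTA '>' headers all end up here.
            raise ValueError(f"non-alphabetic character {ch!r}: query rejected")
        ch = ch.upper()  # assumption: lower-case letters are accepted
        residues.append(ch if ch in STANDARD else "X")
    return "".join(residues)


print(clean_sequence("mkv\nlA bJ\t"))  # MKVLAXX
```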


Output format

Replies are sent by email. SSpro, SSpro8, ACCpro and CONpro replies come as text, embedded in the body of the email.
Here is an example prediction:

KKGHQDFVWVLSRSKVLTGEAKTAVENYLIGSPVVDSQKLVYSDFSEAACKVN
CCCCCCEEEEEECCCCCCHHHHHHHHHHHHHCCCCCHHHHEECCCCHHHHCCC
TCCCCEEEEEEESCTTCCHHHHHHHHHHHHTCTTCCHHHEEECCHHHHHHHCC
-----+++++++--------+++++++++--+-------------+++++---
---++++++++++-------++--++++++++---+---------++-++---
---+++++++++++------++--++--+++++--++------++++-+----
---++++++++++-------++--++--+++++--+-------++++-+----
eee----------eee-eee-eee-ee--e--ee-e-ee-eeee-eee--eee

The 8 lines have the following meaning:

  • Line 1: The 1-letter code of your protein primary sequence. This line is always present.
  • Line 2: Secondary structure prediction:
    • H = helix
    • E = strand
    • C = the rest
    This line is present if you requested an SSpro prediction.
  • Line 3: 8-class secondary structure prediction:
    • H: alpha-helix
    • G: 3₁₀-helix
    • I: pi-helix (extremely rare)
    • E: extended strand
    • B: beta-bridge
    • T: turn
    • S: bend
    • C: the rest
    This line is present if you requested an SSpro8 prediction.
  • Lines 4-7: Predictions of number of residue contacts at 6, 8, 10 and 12 Å:
    • + : the residue has more contacts than its average
    • - : the residue has fewer contacts than its average
    The 4 lines are present if you requested a CONpro prediction.
  • Line 8: Prediction of relative solvent accessibility:
    • - : the residue is buried
    • e : the residue is exposed
    The solvent accessibility threshold used to discriminate between the buried and exposed states is reported right after the prediction.
    This line (or these lines, if multiple thresholds are selected) is present if you requested an ACCpro prediction. The threshold is the one you selected, or the default set of 10%, 20%, 25%, 30% and 40%.
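Because which lines appear depends on what was requested, a reply has to be split using the same order as the list above. The minimal sketch below assumes the caller has already cut the prediction block out of the email body and knows which predictions were requested. The example reply does not show how the ACCpro threshold is printed, so ACCpro lines are kept whole. Function and key names are hypothetical.

```python
from typing import Dict, Iterator, List


def _take(lines: Iterator[str], min_length: int, what: str) -> str:
    """Next line of the reply, checked to cover the whole sequence."""
    try:
        line = next(lines)
    except StopIteration:
        raise ValueError(f"reply ends before the {what} line") from None
    if len(line) < min_length:
        raise ValueError(f"{what} line is shorter than the sequence")
    return line


def parse_reply(lines: List[str], sspro: bool, sspro8: bool, conpro: bool,
                accpro_lines: int) -> Dict[str, object]:
    """Split the prediction block into named tracks, following the documented line order."""
    it = iter(lines)
    sequence = _take(it, 1, "sequence")
    n = len(sequence)
    result: Dict[str, object] = {"sequence": sequence}
    if sspro:
        result["ss3"] = _take(it, n, "SSpro")
    if sspro8:
        result["ss8"] = _take(it, n, "SSpro8")
    if conpro:
        result["contacts"] = {r: _take(it, n, f"CONpro {r} A") for r in (6, 8, 10, 12)}
    if accpro_lines:
        # ACCpro lines are kept whole: the reported threshold may follow the prediction.
        result["accessibility"] = [_take(it, n, "ACCpro") for _ in range(accpro_lines)]
    return result
```

For a reply like the example above, with all predictions requested and a single ACCpro threshold, accpro_lines would be 1; with the default thresholds it would be 5.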


CMAPpro predictions come as attached raw files with the extensions conpro08, conpro10 and conpro12, for the 8, 10 and 12 Å thresholds respectively. If the query is N amino acids long, each file consists of N lines, each containing N space-separated real numbers. The j-th number on the i-th line is the estimated probability that amino acids i and j are in contact (i.e. that their C-αs are closer than the threshold).
Note: Since CMAPpro predictions are computationally intensive, only proteins of at most 200 amino acids are accepted if CMAPpro predictions are selected.
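A saved attachment can be loaded and turned into a ranked list of likely contacts. The sketch below reads the N x N layout described above. The cutoff of 0.5 and the minimum sequence separation of 6 residues (which skips trivially close neighbours) are illustrative choices, not values prescribed by CMAPpro. Function names are hypothetical.

```python
from typing import List, Tuple


def parse_contact_map(text: str) -> List[List[float]]:
    """N x N probabilities: one row per line, space-separated real numbers."""
    rows = [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("contact map is not square")
    return rows


def read_contact_map(path: str) -> List[List[float]]:
    """Read a CMAPpro attachment (e.g. a .conpro08 file) saved to disk."""
    with open(path) as handle:
        return parse_contact_map(handle.read())


def likely_contacts(prob: List[List[float]], min_separation: int = 6,
                    cutoff: float = 0.5) -> List[Tuple[int, int, float]]:
    """1-based pairs (i, j) with j - i >= min_separation and probability >= cutoff, best first."""
    n = len(prob)
    pairs = [(i + 1, j + 1, prob[i][j])
             for i in range(n) for j in range(i + min_separation, n)
             if prob[i][j] >= cutoff]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

The same reader works for .ccmap files, which use the same layout with one row per secondary structure element.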

CCMAPpro predictions come as attached raw files with the extension ccmap. If SSpro predicts M regular (helix or strand) secondary structure elements at least 2 residues long, the file consists of M lines, each containing M space-separated real numbers. The j-th number on the i-th line is the estimated probability that elements i and j are in contact (i.e. that their centres are closer than 12 Å).

CMAP23Dpro predictions come as an attached PDB file containing the x, y, z coordinates of the protein C-αs.
Note: Since CMAP23Dpro predictions (and the CMAPpro contact map predictions on which CMAP23Dpro relies) are computationally intensive, only proteins of at most 100 amino acids are accepted if CMAP23Dpro prediction is selected.
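To compare a reconstructed model with a predicted contact map, the C-α coordinates must first be read back from the PDB attachment. This sketch assumes the file uses standard fixed-column PDB ATOM records: atom name in columns 13-16, x, y and z in columns 31-38, 39-46 and 47-54. If the attachment deviates from that layout, the column slices would need adjusting. The function name is hypothetical.

```python
from typing import List, Tuple


def ca_coordinates(pdb_text: str) -> List[Tuple[float, float, float]]:
    """C-alpha (x, y, z) tuples from ATOM records, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 (0-based slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords
```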

References

For a general overview see:

J. Cheng, A. Randall, M. Sweredoski, P. Baldi, "SCRATCH: a Protein Structure and Structural Feature Prediction Server", Nucleic Acids Research, Web Server Issue, vol. 33, w72-76, 2005

P. Baldi and G. Pollastri, "The Principled Design of Large-Scale Recursive Neural Network Architectures-DAG-RNNs and the Protein Structure Prediction Problem", Journal of Machine Learning Research, 4, 575-603, (2003).

P. Baldi, G. Pollastri, "Machine Learning Structural and Functional Proteomics", IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.

For an explanation of the methods used in SSpro and SSpro8 see:

G. Pollastri, D. Przybylski, B. Rost, P. Baldi, "Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles", Proteins, 47, 228-235, 2002.

Or:
P. Baldi, S. Brunak, P. Frasconi, G. Pollastri, and G. Soda, "Exploiting the Past and the Future in Protein Secondary Structure Prediction", Bioinformatics, 15, 937-946, (1999).

Or (quick abstract):
G. Pollastri, P. Baldi, "SSpro, a web server for protein secondary structure prediction based on recurrent neural networks", Proceedings of CASP2000, Asilomar, CA.

A more detailed description of BRNNs (bidirectional recurrent neural networks) can be found here:
P. Baldi, S. Brunak, P. Frasconi, G. Pollastri, and G. Soda, "Bidirectional Dynamics for Protein Secondary Structure Prediction", in Sequence Learning: Paradigms, Algorithms, and Applications, R. Sun and L. Giles, Editors, Springer Verlag, (2000).

For an explanation of the methods used in ACCpro and CONpro see:

P. Baldi and G. Pollastri. "The Principled Design of Large-Scale Recursive Neural Network Architectures—DAG-RNNs and the Protein Structure Prediction Problem", Journal of Machine Learning Research, 4, 575-602, 2003.

G. Pollastri, P. Baldi, P. Fariselli, R. Casadio, "Prediction of Coordination Number and Relative Solvent Accessibility in Proteins", Proteins, 47, 142-153, 2002.

Or:
G. Pollastri, P. Baldi, P. Fariselli, R. Casadio, "Improved Prediction of the Number of Residue Contacts in Proteins by Recurrent Neural Networks", Bioinformatics, 17 Suppl 1, S234-S242 (2001).

For an explanation of the methods used in CMAPpro and CMAP23Dpro see:

G. Pollastri, P. Baldi, "Prediction of Contact Maps by Recurrent Neural Network Architectures and Hidden Context Propagation from All Four Cardinal Corners", Bioinformatics, 18 Suppl 1, S62-S70 (2002).

And:
P. Baldi, G. Pollastri, "Machine Learning Structural and Functional Proteomics", IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.


Jianlin Cheng, jianlinc@ics.uci.edu, Pierre Baldi, pfbaldi@ics.uci.edu
Institute for Genomics and Bioinformatics
University of California Irvine