RAPPER (CCP4: Supported Program)

NAME

RAPPER – conformer modelling through sampling of residue-specific phi / psi propensity tables and rotameric states, given a set of restraints.

SYNOPSIS

rapper params.xml Mode Keyword_input

DESCRIPTION

RAPPER is a program that generates protein conformers by sampling residue-specific phi / psi propensity tables and rotamer libraries, subject to a set of constraints (i.e. ideal bond angles and lengths) and given restraints. Restraints include positional restraints based on atom positions, framework restraints from known structure, secondary structure restraints and experimental data (specifically electron density from X-ray crystallography) [References 1-9]. Please note that this program was developed outside of the CCP4 framework and thus has various non-CCP4-standard features.

USAGE

The generalised nature of the sampling and restraint generation has allowed RAPPER to be applied to a number of conformer modelling problems. These include:

  1. ab initio loop modelling

  2. C-alpha tracing

  3. Conformer fitting to electron density (both high and low resolution)

  4. Comparative modelling

In addition, there are a number of utility modes, including joining PDB files together, RMSD calculations and structure superimposition. Documentation on these modes can be found on the main RAPPER website (http://mordred.bioc.cam.ac.uk/~rapper). The documentation below refers to the modes of operation as implemented in the CCP4i interface.

Documentation for RAPPER in CCP4i

The modes of operation implemented in CCP4i are:

  1. loop modelling – This is to generate a short section of structure that is currently not modelled. This can be done both with and without using electron density as a restraint.

  2. loop modelling from PDB – This is to re-generate a short section of structure that is currently modelled and present in the input PDB file. This DOES NOT use any positional information from the current section to be re-modelled but just takes the sequence information from the file. This can be done both with and without using electron density as a restraint.

  3. Ca-trace – This generates the structure using the C-alpha positions as points to sample around. The sequence is taken from the input PDB file. Either a short section or the entire structure can be generated. This can be done both with and without using electron density as a restraint.

  4. Rebuild bad fitting residues – This assesses the quality of fit of the current model to a map and identifies regions that should be rebuilt. These regions are rebuilt using the sequence as given in the input PDB file and the input map. The map can be any type (Fo-Fc, 2Fo-Fc, etc.), though a Sigma-A weighted OMIT map (as can be generated by CNS) has been shown to work well.

If only one or more small sections are being built, the generated fragments can be integrated back into the rest of the model by RAPPER using

      rapper params.xml joinpdb --pdb1 fragment.pdb --pdb2 framework.pdb --pdb-out out.pdb
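When the join step is driven from a script, the command above can be assembled as an argument list. The following is a minimal sketch, not part of RAPPER itself: the helper name `joinpdb_command` is hypothetical, and only the mode and keywords shown in the command above are used.

```python
import shlex

def joinpdb_command(fragment, framework, output, params="params.xml"):
    """Assemble the joinpdb command line shown above as an argument list.

    Hypothetical helper: only the keywords from the documented command are used.
    """
    return [
        "rapper", params, "joinpdb",
        "--pdb1", fragment,
        "--pdb2", framework,
        "--pdb-out", output,
    ]

cmd = joinpdb_command("fragment.pdb", "framework.pdb", "out.pdb")
print(shlex.join(cmd))
# To run it for real (requires a RAPPER installation on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting problems when file names contain spaces.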

INPUT AND OUTPUT FILES

Note that these are given as keyword inputs.

Input

--pdb [synonym --pdb1] <value FILE>

Input PDB file in standard PDB format. If the model is fragmented and a C-alpha trace is being run, each fragment should be given a separate chain ID; otherwise RAPPER will try to join the fragments, as it does not consider residue numbering.

--map <value FILE>

This can be any type of map, though Sigma-A weighted OMIT maps have been shown to work well. Both CCP4 and CNS maps are supported.

Output

--pdb-out <value FILE>

Output PDB file. This is only used when joining modelled fragments back to the rest of the structure. By default a number of files are generated:

  1. loop.pdb / trace.pdb / looptest-best.pdb - depending on the mode one of these files will be produced which contains the first model generated.

  2. multiloop.pdb / multitrace.pdb / looptest.pdb - depending on the mode one of these files will be produced which contains all the models requested.

  3. native.pdb - the input PDB file.

  4. framework.pdb - the input PDB file with the section(s) being rebuilt removed.

  5. models.dat - some statistical information about the models compared to the input structure (if present).

  6. benchmark.dat - some statistical information about the models compared to the input structure (if present) and runtime information.

  7. run-parameters.xml - an XML formatted file of all the parameters and the values used for modelling (extensive).

      These files are generated automatically in a new subdirectory of the current working directory called TESTRUNS. To direct the output to a specific folder, use the --runs-dir keyword (see below). To tag the beginning of each file name with a specific name, use the --use-CCP4i-file-name and --ccp4i-file-name keywords (see below).

--runs-dir <value FILE>

The directory in which to place the output files.

--use-CCP4i-file-name <value BOOLEAN><default false>

Use a specific file tag to prepend to the output files.

--ccp4i-file-name <value STRING>

The tag name that will be prepended to the output files. To be used with the --use-CCP4i-file-name keyword.

KEYWORDS

RAPPER has a large number of keyword controls. Just those essential for running the modes of operation available in the CCP4i interface are given below. A full list of keywords can be found by calling:

      rapper params.xml help

or by scrutinising the params.xml file. Note that logical values are given as 'true' or 'false'. Restraints can often be switched on using a logical control, in which case they take default values. To alter a default value, the restraint has to be switched on AND the value altered; just altering the value will not automatically turn on the restraint.

All input values are checked for validity of type and quality. Keyword commands are also spell checked: if the spelling does not match any entry in the keyword database, the most similar command is suggested. All keywords are prefixed with a double dash '--'.
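The rule that a restraint needs both its logical switch and its value can be illustrated with a small sketch. The helper `keyword_args` is hypothetical, the threshold of 1.5 is purely illustrative, and the '--keyword value' layout is assumed from the joinpdb command shown earlier.

```python
def keyword_args(settings):
    """Turn a dict of keyword -> value into '--keyword value' pairs.

    Hypothetical helper; booleans are written as 'true' / 'false'.
    """
    args = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--" + key, str(value)]
    return args

# Changing only the threshold would leave the restraint switched off,
# so the logical switch is set alongside the altered value.
ca_restraint = {
    "enforce-mainchain-restraints": True,
    "mainchain-restraint-threshold": 1.5,  # illustrative value
}
print(keyword_args(ca_restraint))
```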

--start <value INT>

Residue number to start building from.

--stop <value INT>

Residue number to stop building.

--chain-id <value CHAR>

Chain ID of the section to be built. To build all chains, use '*'.

--models <value INT><default 1>

Number of models to be built.

--cryst-d-high <value FLOAT>

Resolution of the map in Angstroms. To be used in conjunction with --map and --edm-fit.

--edm-fit <value BOOLEAN><default false>

Use electron density map as a restraint.

--enforce-mainchain-restraints <value BOOLEAN><default false>

Use C-alpha atoms as positional restraints.

--mainchain-restraint-threshold <value FLOAT><default 2.0>

Size of restraint sphere to be sampled around the C-alpha atom position.

--sidechain-mode <value STRING><default none>

Side chain building mode. To build side chains, use 'smart'.

--sidechain-radius-reduction <value FLOAT><default 1.00>

The factor by which we reduce the radii of hard-sphere excluded volume interactions when at least one atom is from a side chain. That is, if this parameter is 0.5, then a side chain atom can approach up to twice as close to any other atom as normal. An appropriate value to use is 0.75.
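The effect of the reduction factor can be worked through numerically. This is a sketch under stated assumptions, not RAPPER's internal code: the hard-sphere contact distance is taken to be the sum of the two atomic radii, the radii of 1.5 Angstroms are illustrative, and the function name is hypothetical.

```python
def min_allowed_distance(radius_a, radius_b, a_is_sidechain, b_is_sidechain,
                         reduction=1.0):
    """Closest allowed approach of two atoms in a hard-sphere model.

    Assumption: the unreduced contact distance is radius_a + radius_b.
    The reduction factor applies when at least one atom is a side chain atom.
    """
    contact = radius_a + radius_b
    if a_is_sidechain or b_is_sidechain:
        contact *= reduction
    return contact

# Two atoms with illustrative radii of 1.5 Angstroms each.
print(min_allowed_distance(1.5, 1.5, False, False, reduction=0.5))  # main chain pair, unchanged
print(min_allowed_distance(1.5, 1.5, True, False, reduction=0.5))   # twice as close
print(min_allowed_distance(1.5, 1.5, True, False, reduction=0.75))  # suggested value
```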

--enforce-sidechain-centroid-restraints <value BOOLEAN><default false>

Use the virtual side chain centroid as a point to sample around.

--sidechain-centroid-restraint-threshold <value FLOAT><default 2.0>

Size of restraint sphere to be sampled around the virtual side chain centroid position.

--sidechain-library <value FILE><default RAPPER-DIR/data/richardson.lib>

The rotamer library to be used. A number of files are distributed with RAPPER in the data directory. If the data directory is not in the default RAPPER-DIR path then RAPPER-DIR should be set to point to the new location (see --rapper-dir).

--rapper-dir <value FILE><default ..>

The location of the RAPPER root installation.

--edm-rebuild-poor-regions <value BOOLEAN><default false>

To rebuild regions of poor fit to an electron density map. Residues to be rebuilt are identified using a real space scoring function, the cut off for which is set using --edm-poor-region-threshold.

--edm-poor-region-threshold <value FLOAT><default 2.00>

Regions with fits worse than this number of standard deviations are considered 'poor'. Typical value is 0.80.

--edm-poor-region-buffer-size <value INT><default 1>

If a region fits poorly, the entire region plus this number of residues on either side are flagged for rebuilding.
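The buffer can be pictured as widening each poorly fitting region before rebuilding. The following is a minimal sketch of that idea only; the function name is hypothetical, and clipping at the ends of the chain is an assumption rather than documented behaviour.

```python
def expand_poor_regions(poor_residues, first_residue, last_residue, buffer_size=1):
    """Return the residues flagged for rebuilding once the buffer is added.

    poor_residues: residue numbers already judged to fit poorly.
    Assumption: the buffer is clipped to the first and last residue of the chain.
    """
    flagged = set()
    for res in poor_residues:
        low = max(first_residue, res - buffer_size)
        high = min(last_residue, res + buffer_size)
        flagged.update(range(low, high + 1))
    return sorted(flagged)

# Residues 12-13 and 40 fit poorly in a chain numbered 1 to 50.
print(expand_poor_regions([12, 13, 40], 1, 50))
```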

--divide-and-conquer <value BOOLEAN><default true>

Divide the sequence randomly into fragments and build the fragments in random order. This optimises the time spent sampling regions with rare phi / psi states.

--divide-and-conquer-ignore-chain-breaks <value BOOLEAN><default false>

Allow bands to cross chain breaks.

--default-mainchain-b-factor <value FLOAT><default 20.00>

The b-factor assigned to the newly built main chain region.

--default-sidechain-b-factor <value FLOAT><default 30.00>

The b-factor assigned to the newly built side chain region.

--models-get-native-bfactors <value BOOLEAN><default true>

Take the b-factors from the section that is being rebuilt.

--error-for-no-models <value BOOLEAN><default true>

If true, an error is signalled when no models can be found during the conformational search. If false, RAPPER does not treat this as an error and empty PDB files are generated.

--fix-mislabeled-atoms <value BOOLEAN><default true>

Attempt to fix mislabelled atoms when reading PDB files.

--write-user-remarks <value BOOLEAN><default true>

Copy the remark lines from the input PDB file into the model output file.

--use-edm-filters <value BOOLEAN><default false>

Filter the models to provide an enriched solution set. This is computationally very expensive, so only use it if you have a relatively good map and some CPU time to spare.

--restraints-are-pass-optional <value BOOLEAN><default false>

Turn off restraints in the 11th and later passes. This option is dangerous and should be used with care, as it will produce models that violate both restraints and constraints. In particular, clash restraints are often violated, which leads to unrealistic geometry, although a model will usually be generated.

--optional-edm-mainchain-restraints <value BOOLEAN><default false>

If true, then the 0.0 (positive density) mainchain restraint will be made optional. If false, then the main chain will be unconditionally forced to lie in positive density. This is primarily useful when tracing through a structure with regions in very poor (non-existent) density.

--enforce-mainchain-min-sigma-restraints <value BOOLEAN><default false>

If true, then main chain electron density map restraints (see --edm-mainchain-min-sigma) will be added if a map file is given.

--edm-mainchain-min-sigma <value FLOAT><default 0.5>

Only main chain atoms at positions where the map density is greater than this number of standard deviations (sigma) are considered to satisfy the electron density map restraint.

--enforce-sidechain-min-sigma-restraints <value BOOLEAN><default false>

If true, then side chain electron density map restraints (see --edm-sidechain-min-sigma) will be added if a map file is given.

--edm-sidechain-min-sigma <value FLOAT><default 0.5>

Only side chain atoms at positions where the map density is greater than this number of standard deviations (sigma) are considered to satisfy the electron density map restraint.
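The minimum sigma test can be sketched as a simple threshold on density expressed in sigma units. This is an illustration under assumptions, not RAPPER's implementation: the density at the atom position is assumed to be normalised as (density - map mean) / map standard deviation, and the function name and numbers are hypothetical.

```python
def satisfies_min_sigma(density, map_mean, map_sigma, min_sigma=0.5):
    """Return True if the density at an atom position exceeds min_sigma.

    Assumption: the sigma level is (density - map_mean) / map_sigma.
    """
    level = (density - map_mean) / map_sigma
    return level > min_sigma

# Illustrative map with mean 0.0 and standard deviation 0.4.
print(satisfies_min_sigma(0.5, 0.0, 0.4))  # well above the 0.5 sigma default
print(satisfies_min_sigma(0.1, 0.0, 0.4))  # below the 0.5 sigma default
```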

--strict-anchors <value BOOLEAN><default false>

If true, then RAPPER will be very particular about the quality of the N and C terminal anchor residues. This is primarily useful for loop modelling where the two anchors play a major role in model accuracy.

--enforce-strict-anchor-geometry <value BOOLEAN><default false>

If true, then restraints are added to ensure high-quality C terminal anchor geometry. This option, when enabled, can be expensive computationally, and may in fact cause the modelling to fail. On the other hand, models produced with this enabled will have better anchor geometries.

--use-contact-filters <value BOOLEAN><default false>

If true, then contact filters will be added.

EXAMPLES

Below are examples for each of the four modes of modelling used in the CCP4i interface:

NOTE: From the command line the params.xml location needs to be given. In the CCP4 distribution this is located in ccp4-x-x.x/share/rapper/params.xml.

  1. Loop modelling with section not currently modelled in input PDB:

  2. Loop modelling with section already modelled in input PDB file using electron density:

  3. Tracing a complete model based on the C-alpha positions as restraints:

  4. Identify and rebuild residues that fit poorly into an electron density map:
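As a rough illustration of how the keywords documented above might be combined for the last case, the sketch below assembles an argument list. It is not an official example: the mode name is not given on this page and is left as the synopsis placeholder `Mode`, the file names and resolution are hypothetical, and the helper name is invented for illustration.

```python
import shlex

def rebuild_poor_regions_command(pdb, density_map, resolution,
                                 params="params.xml", mode="Mode"):
    """Assemble keywords for rebuilding poorly fitting regions.

    'Mode' is the placeholder from the SYNOPSIS, not a real mode name.
    Only keywords documented in the KEYWORDS section are used.
    """
    return [
        "rapper", params, mode,
        "--pdb", pdb,
        "--map", density_map,
        "--cryst-d-high", str(resolution),
        "--edm-fit", "true",
        "--edm-rebuild-poor-regions", "true",
        "--edm-poor-region-threshold", "0.80",  # the typical value quoted above
        "--sidechain-mode", "smart",
    ]

cmd = rebuild_poor_regions_command("model.pdb", "omit.map", 2.0)
print(shlex.join(cmd))
```

The switch --edm-rebuild-poor-regions is set together with its threshold, since altering a value alone does not turn a restraint on.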

TIPS

  1. When building loops, set the --enforce-strict-anchor-geometry and --use-contact-filters keywords to true.

  2. If building is unsuccessful the first time, try moving the start and stop residues of the loop or partial Ca-trace.

  3. Make sure that the number of residues given in the sequence matches the range defined by the start and stop points, i.e. the number should equal (stop - start) + 1, as the start residue is included in the loop.

  4. Note that the sequence is case sensitive and uses the single letter code.

  5. If building into density, the model should be refined before it is assessed relative to a map.

  6. Loops generated may not 'look' perfect in a viewer but can be easily fixed using the tools in COOT.
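The residue count rule in tip 3 can be checked before a run with a few lines of code. This is a minimal sketch; the function names, residue numbers and sequence are hypothetical.

```python
def residues_in_range(start, stop):
    """Number of residues from start to stop, counting both end points."""
    return stop - start + 1

def sequence_fits_range(sequence, start, stop):
    """True if a single letter code sequence has one letter per residue in the range."""
    return len(sequence) == residues_in_range(start, stop)

# A loop from residue 45 to residue 52 spans 8 residues, not 7.
print(residues_in_range(45, 52))
print(sequence_fits_range("GSTDNKAL", 45, 52))
print(sequence_fits_range("GSTDNKA", 45, 52))
```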

REFERENCES

  1. P.I.W. de Bakker, M.A. DePristo, D.F. Burke, T.L. Blundell (2002) Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins Struct. Funct. Genet. 51 21-40.

  2. S.C. Lovell, I.W. Davis, W.B. Arendall III, P.I.W. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson, D.C. Richardson (2003) Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins Struct. Funct. Genet. 50 437-450.

  3. M.A. DePristo, P.I.W. de Bakker, S.C. Lovell, T.L. Blundell (2002) Ab initio construction of polypeptide fragments: Efficient generation of accurate, representative ensembles. Proteins Struct. Funct. Genet. 51 41-55.

  4. M.A. DePristo, P.I.W. de Bakker, R.P. Shetty, T.L. Blundell (2003) Discrete restraint-based protein modeling and the Cα-trace problem. Protein Science 12 2032-2046.

  5. R.P. Shetty, P.I.W. de Bakker, M.A. DePristo, T.L. Blundell (2003) The advantages of fine-grained side chain conformer libraries. Protein Engineering 16 963-969.

  6. M.A. DePristo, P.I.W. de Bakker, T.L. Blundell (2004) Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. Structure (Camb.) 12 831-838.

  7. M.A. DePristo, P.I.W. de Bakker, R.J. Johnson, T.L. Blundell (2005) Crystallographic refinement by knowledge-based exploration of complex energy landscapes. Structure 13 (9) 1311-1319.

  8. N. Furnham, T.L. Blundell, M.A. DePristo, T.C. Terwilliger (2006) Is one Solution Good Enough? Nature Structural & Molecular Biology 13 (3) 184-185.

  9. N. Furnham, A.S. Dore, D.Y. Chirgadze, P.I.W. de Bakker, M.A. DePristo, T.L. Blundell (2006) Knowledge-Based Real-Space Explorations for Low-Resolution Structure Determination. Structure 14 (8) 1313-1320.

AUTHORS

Nicholas Furnham, Paul de Bakker, Mark DePristo, Reshma Shetty, Swanand Gore and Tom Blundell.

SEE ALSO

RAMPAGE