Crank version 1.5.x Documentation

NAME

crank - automated structure solution pipeline for SAD/MAD or SIRAS data.

DESCRIPTION

Crank [1] is a program to automate macromolecular structure determination for single- or multiple-wavelength anomalous diffraction (SAD/MAD) or single isomorphous replacement with anomalous scattering (SIRAS) experiments. Crank interfaces with various crystallographic programs and is designed both to automate the structure determination process and to allow the user to re-run steps and optimize results, if necessary. Users can start at either the substructure detection or the substructure phasing step and can end at any stage after the initial step.

This version of Crank has interfaces to the programs CRUNCH2 [2] and SHELXD [3] for substructure detection, BP3 [4], [5] for substructure phasing, SOLOMON [6], DM [7], SHELXE [8], PARROT, PIRATE [9] and RESOLVE [23] for density modification, and RESOLVE [24], BUCCANEER [25] and ARP/wARP [10] for automated model building. ARP/wARP uses REFMAC [11] for iterative refinement. Within REFMAC, either the likelihood function restraining phases via Hendrickson-Lattman coefficients [12] or a multivariate likelihood SAD function [13] is used. To calculate the FA values needed for substructure detection, crank interfaces with the programs SHELXC [14] or AFRO [15]. For setting up and preparing files, crank uses programs from the CCP4 [16] suite, including SFTOOLS [17] and TRUNCATE [18]. Crank also uses the Kantardjieff-Rupp algorithm [19], which performs a probabilistic Matthews coefficient [20] calculation to estimate the number of monomers in the asymmetric unit. To visualize the results produced by crank, an interface to COOT [26] is also available.

Crank can be run using its CCP4i [21] interface or via script using the program GCX [22]. Crank's only dependency to produce a density modified map is a licenced CCP4 version 5.99.x or later. If you would like to use the SHELX [14] programs, ARP/wARP [10], RESOLVE [23], [24] and/or BUCCANEER [25] within crank, you must have them installed on your system with the appropriate licence. If these programs are not found in your path, they will not appear as options in the CCP4i interface.
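As a rough way to check in advance which optional programs would be found in your path, the following Python sketch looks up a set of executables. The executable names in the table are assumptions made for illustration and may differ on your installation; the function name is hypothetical and is not part of crank.

```python
import shutil

# Assumed executable names, for illustration only; adjust them to match
# the names used by your local installation.
OPTIONAL_PROGRAMS = {
    "SHELXC": "shelxc",
    "SHELXD": "shelxd",
    "SHELXE": "shelxe",
    "RESOLVE": "resolve",
    "BUCCANEER": "cbuccaneer",
}


def find_optional_programs(programs=OPTIONAL_PROGRAMS, which=shutil.which):
    """Map each program label to its resolved path, or None if not on PATH."""
    return {label: which(exe) for label, exe in programs.items()}


if __name__ == "__main__":
    for label, path in find_optional_programs().items():
        print(f"{label:10s} {path or 'not found on PATH'}")
```

A program reported as not found here would be expected to be missing from the interface options as well.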

RUNNING CRANK

Crank can be run either through its CCP4i interface or via script using the program GCX. Currently, the CCP4i interface has more options available. To see how to run crank via GCX, please consult the program's documentation, available in the crank distribution subdirectory programs/gcx/doc. To start the Crank CCP4i interface, type the command:

ccp4i

Then, using the main CCP4i menu on the far left hand side of the interface, select "Experimental Phasing", then select "Automated Search & Phasing" and select "Crank - automated EP pipeline" (or, within the "Program List", scroll down to "Crank" and click it).

Below, descriptions of the crank CCP4i fields are given.

CRANK CCP4i FIELDS

Title

A short descriptive title for the experiment, to appear in the CCP4i task window.

Type of experiment

Select your experiment type: single-wavelength anomalous diffraction (SAD), single isomorphous replacement with anomalous scattering (SIRAS), two-, three- or four-wavelength MAD (2W-MAD, 3W-MAD, 4W-MAD), or two-, three- or four-wavelength MAD with native (2W-MADN, 3W-MADN, 4W-MADN).

Input protein sequence

Click this button if you wish to input the protein sequence in pir format. The sequence is used in automated model building and refinement and in estimating the solvent content. Crank will then display the total number of amino acid residues. If you do not wish to input the protein sequence, unclick the button and input the number of protein residues per monomer.

DNA/RNA present

If you have DNA and/or RNA, click on the DNA/RNA button and input the number of nucleotides per monomer.

MTZ in

The name of your input MTZ file. At the moment, this must contain merged intensities or structure factor amplitudes from your experiment.

Input (Intensities/Amplitudes)

Choose whether you wish to input Intensities or Structure factor amplitudes.

Input R-free flag

By default, CRANK will create an R-free flag. Alternatively, you can specify an existing R-free column label present in the MTZ IN file.

MTZ out

The name of the MTZ file that will be output. The intermediate steps run by crank may also output MTZ files. See the section on INTERPRETING RESULTS for more information.

Crystal/Data set parameters

You will now have to input information on your substructure atom and the mtz columns for your data.

Substructure atom

Give your anomalous or heavy substructure atom. The name must correspond to an atom in CCP4's library file ($CLIBD/atomsf.lib).

Number of substructure atoms per monomer

Give the number of anomalous/heavy atoms expected per monomer. The total number of substructure atoms looked for (in the asymmetric unit) will be this number multiplied by the number given or obtained in the "number of monomers in the asu" field in the Required parameters section.

Data collected at CuKalpha wavelength (1.54A)

If your data was collected at CuKalpha wavelength, click this box and the f' and f" values for your atom will be obtained automatically. If your data was not collected at CuKalpha wavelength, input the f' and f" values. To get the best possible results, please give reasonable values. If you only have the wavelength and did not measure the values by a fluorescence scan, you can use the CCP4 program CROSSEC to get an estimate.

IP+, SIGIP+, IP-, SIGIP- or IP, SIGIP

Input the mtz columns corresponding to your merged mtz file. If you have a significant anomalous signal, input the anomalous intensities.

(DERIVED PARAMETERS) Guess Overall B, solvent content (and number of monomers if no input is given)

Once the number of protein residues and/or nucleotides is given, crank will attempt to guess the solvent content, overall B-factor and number of monomers in the asymmetric unit. Crank obtains a first guess for the solvent content and number of monomers in the asymmetric unit by using the functional form proposed by Kantardjieff and Rupp [19]; these values will be filled in automatically. If you would like to see the Matthews coefficient [20], Kantardjieff-Rupp probability and solvent content corresponding to a different number of monomers, simply input the number of monomers in the box and re-click on the "Guess Overall B, solvent content..." button, and the updated parameters will be shown.
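To illustrate how the Matthews coefficient and solvent content change with the assumed number of monomers, here is a minimal Python sketch. It is not crank's implementation: it assumes a protein-only crystal, an average residue mass of 110 Da and the conventional approximation solvent fraction = 1 - 1.23/V_M, and it does not compute the Kantardjieff-Rupp probability. All function and parameter names are hypothetical.

```python
def matthews_coefficient(cell_volume, n_symops, n_monomers, monomer_mass_da):
    """Return V_M in cubic Angstroms per dalton."""
    return cell_volume / (n_symops * n_monomers * monomer_mass_da)


def solvent_fraction(v_m, protein_factor=1.23):
    """Approximate solvent fraction for a protein crystal: 1 - 1.23 / V_M."""
    return 1.0 - protein_factor / v_m


def scan_monomers(cell_volume, n_symops, n_residues,
                  max_monomers=4, residue_mass_da=110.0):
    """List (n_monomers, V_M, solvent fraction) for 1..max_monomers.

    residue_mass_da is an assumed average mass per amino acid residue.
    """
    monomer_mass = n_residues * residue_mass_da
    rows = []
    for n in range(1, max_monomers + 1):
        v_m = matthews_coefficient(cell_volume, n_symops, n, monomer_mass)
        rows.append((n, v_m, solvent_fraction(v_m)))
    return rows


if __name__ == "__main__":
    # Made-up cell volume (cubic Angstroms), 4 symmetry operators, 200 residues.
    for n, v_m, solvent in scan_monomers(264000.0, 4, 200):
        print(f"{n} monomer(s): V_M = {v_m:.2f}, solvent = {solvent:.2f}")
```

Monomer counts that give a negative or very small solvent fraction are physically implausible and can be ruled out by inspection.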

Experimental pipeline

Start the pipeline with ... and end with ...

This option allows you to start or end the pipeline at a certain step.

If you choose to start at a step that requires inputting a substructure, and you would like to input a substructure in pdb format, the format of that pdb file should be the following:

HETATM    1  SE  HAT     1      25.284  28.195  17.180  1.00 33.96

OR

ATOM      1  SE  HAT     1      25.284  28.195  17.180  1.00 33.96

The fixed format for the columns agrees with the pdb format, but column 3 has to be the name of your substructure atom and must match an atom in $CLIBD/atomsf.lib. See the file gere.pdb in the test sub-directory of the main crank directory for an example.
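If you need to generate such a file from a list of coordinates, a record in this fixed-column layout could be written with a short Python sketch like the one below. The function name is hypothetical; the column positions follow the standard pdb ATOM/HETATM layout, with the atom name placed as in the example lines above and the residue name "HAT" taken from them.

```python
def format_substructure_atom(serial, atom, x, y, z, occupancy, b_factor,
                             record="HETATM", residue="HAT", res_seq=1):
    """Format one substructure atom as a fixed-column pdb record.

    The atom name is written starting at column 14, as in the example
    records, and should match an entry in $CLIBD/atomsf.lib.
    """
    name_field = " " + atom.upper().ljust(3)      # columns 13-16
    return (
        f"{record:<6}{serial:>5} {name_field} {residue:>3}  "  # columns 1-22
        f"{res_seq:>4}    "                                    # columns 23-30
        f"{x:8.3f}{y:8.3f}{z:8.3f}"                            # columns 31-54
        f"{occupancy:6.2f}{b_factor:6.2f}"                     # columns 55-66
    )


if __name__ == "__main__":
    print(format_substructure_atom(1, "SE", 25.284, 28.195, 17.180, 1.00, 33.96))
```

Passing record="ATOM" produces the alternative record type shown above.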

This section allows you to choose the experimental pipeline you would like to perform. At the moment, five predefined pipelines are available:

Display individual program options

Click on this option if you wish to adjust any of the program options. If you click this option, you can remove all programs by clicking the "Clear All" button, and the experimental pipeline will be removed. It is possible to construct your own pipeline by selecting the program that you wish to use next with the "Next possible program:" option, followed by the "Add program" button. However, this should only be used by experts who know exactly what they are doing! If you do not wish to run the program listed at the end of the crank pipeline, it can be removed with the "Edit list" and "Delete last item" option. You can also see the flow of information (i.e. mtz columns, substructures, etc.) by clicking on the "Show all pipeline input columns" option.

PROGRAM FIELDS

Below, a description of all the crank plugins as well as some of the more important modifiable options is given.

PREP

PREP is the crank plugin that uses ctruncate [18] to calculate structure factor amplitudes from intensities and scaleit [16] to scale the data sets relative to each other. Most of the options given are self-explanatory.

AFRO

AFRO is the crank plugin for the AFRO [15] program, which calculates the FA values needed by substructure detection programs. Some of the important fields that can be modified to improve results in substructure detection are the following:

High resolution cutoff
The default is to cut the data set at the data set's highest resolution plus 0.5 Angstroms. Changing this value based on the graph of DANO/SANO for anomalous differences may improve results. (This graph is shown by afro/xloggraph.)
Exclude reflections: FP < <nsigf> * SIGF, DANO < <nsano> * SANO
Exclude reflections if FP < <nsigf> * SIGF. The default setting for <nsigf> is 2.0. Exclude reflections if DANO < <nsano> * SANO. The default setting for <nsano> is 0.5. (DANO = abs(|F+| - |F-|) and SANO is the standard deviation of DANO in measurement.) Modifying these values can lead to a solution when a previous value has failed.
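The effect of these fields on a single reflection can be sketched in Python as follows. This only illustrates the criteria described above and is not AFRO's code; in particular, treating the three criteria as independent rejections is an assumption, and the function and argument names are hypothetical.

```python
def keep_reflection(d_spacing, fp, sigf, dano, sano, data_d_min,
                    cutoff_offset=0.5, nsigf=2.0, nsano=0.5):
    """Return True if a reflection passes the default AFRO-style filters.

    d_spacing  : resolution of the reflection in Angstroms
    data_d_min : highest resolution of the data set in Angstroms
    dano       : abs(|F+| - |F-|); sano is its standard deviation
    """
    # Default high resolution cutoff: highest resolution plus 0.5 Angstroms.
    if d_spacing < data_d_min + cutoff_offset:
        return False
    # Exclude weak amplitudes: FP < nsigf * SIGF.
    if fp < nsigf * sigf:
        return False
    # Exclude weak anomalous differences: DANO < nsano * SANO.
    if dano < nsano * sano:
        return False
    return True


if __name__ == "__main__":
    # A 3.0 Angstrom reflection from a 2.0 Angstrom data set passes all filters.
    print(keep_reflection(3.0, fp=100.0, sigf=10.0, dano=8.0, sano=4.0,
                          data_d_min=2.0))
```

Raising or lowering nsigf and nsano changes how many reflections survive, which is why these fields are worth varying when substructure detection fails.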

CRUNCH2

CRUNCH2 [2] is a substructure detection program using Karle-Hauptman matrices.

Number of CRUNCH2 trials
Probably the first option to modify when substructure detection has failed is to increase the number of CRUNCH2 trials. The default value shown is 20.
Number of Patterson trials to generate
Sets the number of Patterson trials. Before a CRUNCH2 run, a Patterson minimal function is calculated to generate trial solutions for CRUNCH2. The default is to generate 150 starting Patterson trials. CRUNCH2 ranks all starting Patterson solutions and runs the best ones first.
Stop CRUNCH2 if a score is greater than
This field sets the minimum crunch2 figure of merit to stop the crunch2 run. The default is 1.0 - if this value is obtained by CRUNCH2, you can be fairly confident that you have obtained the correct substructure.
or if the highest score is <ndeviation> times the lowest score
<ndeviation> specifies another stopping criterion: CRUNCH2 is stopped if the highest score of a trial is <ndeviation> times greater than the lowest score. The default value for <ndeviation> is 1.75.
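The two stopping criteria above can be summarised in a short Python sketch. It is an illustration of the rules as described here, not CRUNCH2's or crank's own code; the function name and the guard against non-positive scores are assumptions.

```python
def should_stop(scores, score_threshold=1.0, ndeviation=1.75):
    """Decide whether to stop, given the trial scores obtained so far.

    Stops if the best score exceeds score_threshold, or if the best score
    is at least ndeviation times the lowest score.
    """
    if not scores:
        return False
    best, worst = max(scores), min(scores)
    if best > score_threshold:
        return True
    # Assumption: the ratio test is only meaningful for positive scores.
    return worst > 0 and best >= ndeviation * worst


if __name__ == "__main__":
    print(should_stop([0.30, 0.40]))   # scores too similar: keep running
    print(should_stop([0.30, 0.60]))   # one trial stands out: stop
```

With a single trial the ratio test can never succeed, so at least two trials are needed before the second criterion can stop the run.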

BP3

BP3 [4], [5] is a substructure phasing program. The "Fast phasing" option can be toggled on and off if you would like to have a quicker run of BP3. Fast phasing is the default for MAD phasing.

SOLOMON

SOLOMON's [6] interface can attempt to determine the correct hand, optimize the solvent content, and perform a density modification run.

DM

DM [7] is a density modification program. If you have a metallo-protein, it is probably optimal to unclick the "Histogram matching" option. Again, the interface to DM can be used to determine the correct hand, optimize the solvent content and, of course, perform a complete density modification run.

PIRATE

PIRATE [9] is a density modification program. The weight to apply to input phases is an option to change if you believe that the phases that you input are biased.

SHELXC

SHELXC [14] is the program to generate files (including FA values) needed for SHELXD and SHELXE.

High resolution cutoff <hires>
Specify the high resolution limit. The default is to set the limit to 0.5 Angstroms above the high resolution limit - unless the high resolution limit is lower than 2.5, in which case the limit is set to the lower of 3.0 Angstroms or the high resolution limit of the data.

SHELXD

SHELXD [3] can be used to determine substructures, if the SHELX suite is installed. Important options to consider are the following.

CC/weak threshold <nthreshold>
<nthreshold> is the minimum "weak" correlation coefficient needed to stop the shelxd run. The default is 30.
Num. trials <ntrials>
Sets the number of trials. The default is 500, but for more difficult problems with a weak signal, setting <ntrials> to a greater value (like 1000) can lead to a solution, when a solution was not found with 500.
Minimum distance between atoms <mind>
Set the minimum distance between substructure atoms. The crank default value is 3.5.

SHELXE

Crank also has an interface for the density modification program SHELXE [8], which can be used to determine the correct hand, optimize the solvent content and perform a complete density modification run.

ARPWARP

ARPWARP is the crank plugin for the automated model building program ARP/wARP [10], with iterative refinement using REFMAC [11].

Autobuild cycles <nautobuild>
The number of autobuilding cycles in ARP/wARP (default is 10). Sometimes more than 10 cycles are needed to complete the model.
Include <target>
The target function to use for refinement in REFMAC. By default, the SAD likelihood function [13] is used for SAD experiments and the MLHL function [12] for non-SAD experiments. However, when Hendrickson-Lattman coefficients are not available (for example, in the SHELXC/D/E pipeline), the MLHL function cannot be run. In this case, the SAD function is used for SAD and MAD experiments. We have found that the SAD function works optimally when the free set is used for scaling and sigmaa calculation. Conversely, the MLHL function and the likelihood function without phase restraints work best if the sigmaa calculation uses the working set. These selections for the set of reflections to use in sigmaa calculations are the default settings in CRANK.
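The default choice of target function described above can be written out as a small decision function. This Python sketch only restates the rules given in this section; the function name, the return values and the handling of cases this document does not cover are assumptions, and the experiment labels are those of the "Type of experiment" field.

```python
def choose_refmac_target(experiment, have_hl_coefficients):
    """Return (target function, reflection set used for sigmaa calculation).

    experiment is a label such as "SAD", "SIRAS", "3W-MAD" or "2W-MADN".
    """
    if experiment == "SAD":
        # SAD likelihood function, with the free set for scaling and sigmaa.
        return ("SAD", "free")
    if have_hl_coefficients:
        # MLHL restrains phases via Hendrickson-Lattman coefficients.
        return ("MLHL", "working")
    if "MAD" in experiment:
        # No HL coefficients (e.g. the SHELXC/D/E pipeline): fall back to SAD.
        return ("SAD", "free")
    # This case is not described in the documentation.
    raise ValueError(f"no documented default for {experiment!r} without HL coefficients")


if __name__ == "__main__":
    print(choose_refmac_target("3W-MAD", have_hl_coefficients=False))
```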

INTERPRETING RESULTS

Crank directory structure

Crank uses a hierarchically structured directory system to store all the different runs of the various crystallographic programs. There are two types of directories under the Crank directory hierarchy: directories where crystallographic programs are run, and directories where information is collected.

The names of the directories where programs are run always start with a number followed by a dash and a program name, as in "3-crunch2" or "4-bp3". The number signifies the step in the pipeline at which the named program was run.

The various crystallographic programs are run inside these directories, which thus contain all the files produced by the run of the given program. These directories are constructed by crank, which first builds a shell script to run the program. This shell script is designed to be as close as possible to the example scripts generated by the program author. Crank copies all the requisite data files for the program into the run directory.

Then, Crank simply executes the shell script that it has built, timing the run and collecting results.
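When inspecting a finished job, it can be convenient to list the run directories in pipeline order. The Python sketch below relies only on the "number-program" naming convention described above; the function name is hypothetical, and directories that do not follow the convention (such as those where information is collected) are simply skipped.

```python
import re
from pathlib import Path

# Run directories are named "<step number>-<program name>", e.g. "3-crunch2".
RUN_DIR = re.compile(r"^(\d+)-(.+)$")


def list_run_directories(crank_dir):
    """Return (step, program, path) tuples sorted by pipeline step."""
    runs = []
    for entry in Path(crank_dir).iterdir():
        match = RUN_DIR.match(entry.name)
        if entry.is_dir() and match:
            runs.append((int(match.group(1)), match.group(2), entry))
    # Sort numerically so that step 10 comes after step 9, not after step 1.
    return sorted(runs, key=lambda run: (run[0], run[1]))


if __name__ == "__main__":
    for step, program, path in list_run_directories("."):
        print(f"step {step}: {program} ({path})")
```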

When automated structure solution was not possible, or when you want to optimize results, the first step is to identify the step (i.e. substructure detection or phasing) that either failed or produced sub-optimal results. The best place to start looking if a particular step failed is the individual program's documentation. It may also be useful to look at the Crank test system, which gives an indication of statistics from successful jobs and diagnostics (and suggestions) from jobs that failed with default settings.

PROBLEMS, SUGGESTIONS

Crank is still beta software, and we appreciate any suggestions or bug reports that can be emailed to crank@chem.leidenuniv.nl.

REFERENCES

  • [1] Ness, S.R., de Graaff, R.A.G., Abrahams, J.P. and Pannu, N.S. (2004) Structure, 12, 1753-1761.
  • [2] de Graaff, R.A.G., Hilge, M., van der Plas, J.L. and Abrahams, J.P. (2001) Acta Cryst., D57, 1857-1862.
  • [3] Schneider, T.R. and Sheldrick, G.M. (2002) Acta Cryst., D58, 1772-1779.
  • [4] Pannu, N.S. and Read, R.J. (2004) Acta Cryst., D60, 22-27.
  • [5] Pannu, N.S., McCoy, A.J. and Read, R.J. (2003) Acta Cryst., D59, 1801-1808.
  • [6] Abrahams, J.P. and Leslie, A.G.W. (1996) Acta Cryst., D52, 30-42.
  • [7] Cowtan, K.D. (1994) Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, 31, 34-38.
  • [8] Sheldrick, G.M. (2002) Z. Kristallogr., 217, 644-650.
  • [9] Cowtan, K. (2000) Acta Cryst., D56, 1612-1621.
  • [10] Perrakis, A., Morris, R.M. and Lamzin, V.S. (1999) Nat Struct Biol., 6, 458-463.
  • [11] Murshudov, G.N., Vagin, A.A. and Dodson, E.J. (1997) Acta Cryst., D53, 240-255.
  • [12] Pannu, N.S., Murshudov, G.N., Dodson, E.J. and Read, R.J. (1998) Acta Cryst., D54, 1285-1294.
  • [13] Skubak, P., Murshudov, G.N. and Pannu, N.S. (2004) Acta Cryst., D60, 2196-2201.
  • [14] Sheldrick, G.M. SHELX suite of programs website
  • [15] Pannu, N.S. (unpublished).
  • [16] CCP4 (1994) Acta Cryst., D50, 760-763.
  • [17] Hazes, B. (unpublished).
  • [18] French, G.S. and Wilson, K.S. (1978) Acta Cryst., A34, 517-534.
  • [19] Kantardjieff, K.A. and Rupp, B. (2003) Protein Science, 12, 1865-1871.
  • [20] Matthews, B.W. (1968) J. Mol. Biol., 33, 491-497.
  • [21] Potterton, E., Briggs, P.J., Turkenburg, M. and Dodson, E.J. (2003) Acta Cryst., D59, 1131-1137.
  • [22] Pannu, N.S. (unpublished).
  • [23] Terwilliger, T.C. (2000) Acta Cryst., D56, 965-972.
  • [24] Terwilliger, T.C. (2003) Acta Cryst., D59, 38-44.
  • [25] Cowtan, K. (2006) Acta Cryst., D62, 1002-1011.
  • [26] Emsley, P. and Cowtan, K. (2004) Acta Cryst., D60, 2126-2132.

    Last modified: Tue Feb 17 11:39:19 CET 2009