Crank version 1.2.0 Documentation

NAME

crank - automated structure solution pipeline for SAD/MAD or SIRAS data.

DESCRIPTION

Crank [1] is a program to automate macromolecular structure determination for single- or multiple-wavelength anomalous diffraction (SAD/MAD) or single isomorphous replacement with anomalous scattering (SIRAS) experiments. Crank interfaces with various crystallographic programs and is designed both to automate the structure determination process and to allow the user to re-run steps and optimize results, if necessary.

This version of Crank has interfaces to the programs CRUNCH2 [2] and SHELXD [3] for substructure detection, BP3 [4], [5] for substructure phasing, SOLOMON [6], DM [7], SHELXE [8], PIRATE [9] and RESOLVE [23] for density modification, and RESOLVE [24], BUCCANEER [25] and ARP/wARP [10] for automated model building. ARP/wARP uses REFMAC [11] for iterative refinement. Within REFMAC, either the likelihood function restraining phases via Hendrickson-Lattman coefficients [12] or a multivariate likelihood SAD function [13] is used. To calculate the FA values needed for substructure detection, crank interfaces with the programs SHELXC [14] or AFRO [15]. For setting up and preparing files, crank uses programs from the CCP4 [16] suite, including SFTOOLS [17] and TRUNCATE [18]. Crank also uses the Kantardjieff-Rupp algorithm [19], which performs a probabilistic Matthews coefficient [20] calculation to estimate the number of monomers in the asymmetric unit. To visualize the results produced by crank, an interface to COOT [26] is also available.

Crank can be run using its CCP4i [21] interface or via script using the program GCX [22]. Crank's only dependency to produce a density modified map is a licenced CCP4 version 5.99.x or later. If you would like to use the SHELX [3], [8], [14] programs, ARP/wARP [10], RESOLVE [23], [24] and/or BUCCANEER [25] within crank, you must have them installed on your system with the appropriate licence. If these programs are not in your path, they will not appear as options in the ccp4i interface.

INSTALLATION

The instructions given below are also in the INSTALL file in the distributed gzipped crank tar file.

Download the crank-XXXX-VV.tgz file for your required platform(s) from the following web site:

http://www.bfsc.leidenuniv.nl/software/crank

Untar the crank tar file

tar xzf crank-XXXX-VV.tgz

Add the crank bin directory to your path

for csh or tcsh: setenv PATH ${PATH}:/directory/to/crank/bin

for bash: export PATH=$PATH:/directory/to/crank/bin

Set the environment variable CRANK_120 to the main crank directory

for csh or tcsh: setenv CRANK_120 /directory/to/crank

for bash: export CRANK_120=/directory/to/crank
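
The two settings above can be checked before starting ccp4i. The following is a minimal Python sketch and is not part of the crank distribution; the function name check_crank_env is hypothetical, and it assumes, as in the steps above, that the bin directory sits directly under the directory named by CRANK_120.

```python
import os


def check_crank_env(environ=None):
    """Return a list of problems found in the crank environment settings."""
    env = os.environ if environ is None else environ
    problems = []

    crank_home = env.get("CRANK_120", "")
    if not crank_home:
        problems.append("CRANK_120 is not set")
        return problems

    # Assumption: the bin directory is <CRANK_120>/bin, as in the steps above.
    bin_dir = os.path.join(crank_home, "bin")
    path_entries = [p.rstrip("/") for p in env.get("PATH", "").split(os.pathsep)]
    if bin_dir not in path_entries:
        problems.append(bin_dir + " is not in PATH")
    return problems


if __name__ == "__main__":
    # Report any problem found in the current shell environment.
    for problem in check_crank_env():
        print("warning:", problem)
```

An empty list means both settings look consistent; the check does not verify that the directory exists or that crank itself runs.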

Install the CCP4i [21] interface using the file "crank.tar.gz" in the crank/ccp4i directory with the following steps.

  • Start ccp4i (as the CCP4 system administrator)
  • From the "System Administration" menu, choose the "Install Tasks" option.
  • From the "Task archive" panel, select the "crank.tar.gz" located in the crank/ccp4i directory.
  • Click on "Apply" and restart ccp4i - crank's version 1.2.0 interface will overwrite crank's interface distributed within ccp4!

RUNNING CRANK

    Crank can be run either through its CCP4i interface or via script using the program GCX. Currently, the CCP4i interface has more options available. To see how to run crank via GCX, please consult the program's documentation in the programs/gcx/doc subdirectory of the crank distribution. To start the Crank CCP4i interface, type the command:

    ccp4i

    Then, using the main CCP4i menu on the far left hand side of the interface, select "Experimental Phasing", then scroll down to "Crank - automated package" (or within the "Program List", scroll down to "Crank" and click it).

    Below, descriptions of the crank CCP4i fields are given.

    CRANK CCP4i FIELDS

    Title

    A short descriptive title for the experiment to appear in the CCP4i task window

    MTZ in

    The name of your input MTZ file. At the moment, this must contain merged intensities or structure factor amplitudes from your experiment.

    MTZ out

    The name of the MTZ file that will be written. The intermediate steps run by crank may also output MTZ files. See the section on INTERPRETING RESULTS for more information.

    Input (Intensities/Amplitudes)

    Choose whether you wish to input Intensities or Structure factor amplitudes.

    Setup Experiment

    Select your experiment type from SAD, SIRAS, 2WMAD (two-wavelength MAD), 3WMAD or 4WMAD.

    Input protein sequence

    Check this option to input the protein sequence in pir format. The sequence is used in automated model building and refinement and in estimating the solvent content. If you do not wish to input the protein sequence, unclick the button and input the number of protein residues per monomer.

    DNA/RNA present

    If you have DNA and/or RNA, click on the DNA/RNA button and input the number of nucleotides per monomer.

    Crystal/Data set parameters

    You will now have to input information on your substructure atom and the mtz columns for your data.

    Substructure atom

    Give your anomalous or heavy substructure atom. The name must correspond to an atom in CCP4's library file ($CLIBD/atomsf.lib).

    Number of substructure atoms per monomer

    Give the number of anomalous/heavy atoms expected per monomer. The total number of substructure atoms looked for (in the asymmetric unit) will be this number multiplied by the number given or obtained in the "number of monomers in the asu" field in the Required parameters section.

    Data collected at CuKalpha wavelength (1.54A)

    If your data was collected at CuKalpha wavelength, click this box and the f' and f" values for your atom will be obtained automatically. If your data was not collected at CuKalpha wavelength, input the f' and f" values. To get the best possible results, please give reasonable values. If you only have the wavelength and did not measure the values by a fluorescence scan, you can use the CCP4 program CROSSEC to get an estimate.

    IP+, SIGIP+, IP-, SIGIP- or IP, SIGIP

    Input the mtz columns corresponding to your merged mtz file. If you have a significant anomalous signal, input the anomalous intensities.

    Solvent/Unit cell content

    Use data set <ndata> from crystal number <nxtal> in density modification and model building.

    This flag allows you to choose the data set you wish to use in density modification and model building. It is usually optimal to use the data set with the highest resolution.

    Guess Overall B, solvent content (and number of monomers if no input is given)

    Once the number of protein residues and/or nucleotides is given, crank will attempt to guess the solvent content, overall B-factor and number of monomers in the asymmetric unit. Crank obtains a first guess for the solvent content and number of monomers in the asymmetric unit by using the functional form proposed by Kantardjieff and Rupp [19] - these values will be filled in automatically. If you would like to see the Matthews coefficient [20], Kantardjieff-Rupp probability, and solvent content corresponding to a different number of monomers, simply input the number of monomers in the box and re-click on the "Guess Overall B, solvent content..." button, and the updated parameters will be shown.
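
To illustrate how the Matthews coefficient and solvent content depend on the number of monomers, here is a rough Python sketch. It is not crank's implementation and does not reproduce the Kantardjieff-Rupp probability. The function name, the cell volume used in the example, and the average residue mass of 110 Da are illustrative assumptions; the solvent estimate uses Matthews' approximation for proteins, 1 - 1.23/V_M, and so ignores any DNA/RNA.

```python
def matthews_estimate(cell_volume, n_symops, n_residues, n_monomers,
                      residue_mass=110.0):
    """Return (V_M in A^3/Da, approximate solvent fraction) for a protein.

    cell_volume : unit cell volume in cubic Angstroms
    n_symops    : number of symmetry operators (asymmetric units per cell)
    n_residues  : protein residues per monomer
    n_monomers  : monomers per asymmetric unit
    """
    if cell_volume <= 0 or n_symops <= 0 or n_residues <= 0 or n_monomers <= 0:
        raise ValueError("all inputs must be positive")
    # Assumption: monomer mass approximated as residues * average residue mass.
    monomer_mass = n_residues * residue_mass
    vm = cell_volume / (n_symops * n_monomers * monomer_mass)
    # Matthews' approximation for proteins only.
    solvent_fraction = 1.0 - 1.23 / vm
    return vm, solvent_fraction


if __name__ == "__main__":
    # Hypothetical crystal: 250000 A^3 cell, 4 symmetry operators, 200 residues.
    for n in (1, 2):
        vm, solvent = matthews_estimate(250000.0, 4, 200, n)
        print(f"{n} monomer(s): V_M = {vm:.2f} A^3/Da, solvent = {solvent:.1%}")
```

Doubling the number of monomers halves V_M, which is why the solvent content shown in the interface changes when a different number of monomers is entered.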

    Experimental pipeline

    Start the pipeline with ... and end with ...

    This option allows you to start or end the pipeline at a certain step.

    If you choose to start at a step that requires inputting a substructure, and you would like to input a substructure in pdb format, the format of that pdb file should be the following:

    HETATM    1  SE  HAT     1      25.284  28.195  17.180  1.00 33.96
    
    OR
    
    ATOM      1  SE  HAT     1      25.284  28.195  17.180  1.00 33.96
    

    The fixed column format agrees with the pdb format, but column 3 must be the name of your substructure atom, matching an atom in $CLIBD/atomsf.lib. See file gere.pdb in the test sub-directory of the main crank directory for an example.
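
A substructure line in this layout can be produced with a fixed-width format string. The sketch below is only an illustration: the function name is hypothetical, and the column positions are inferred from the example line above together with the standard pdb fixed columns, so the result should be compared against gere.pdb before use.

```python
def format_substructure_atom(serial, element, x, y, z, occupancy, b_factor,
                             record="HETATM", res_name="HAT", res_seq=1):
    """Format one substructure atom as a fixed-width pdb-style line."""
    if record not in ("HETATM", "ATOM"):
        raise ValueError("record must be HETATM or ATOM")
    if not 1 <= len(element) <= 3:
        raise ValueError("atom name must be 1 to 3 characters")
    # Layout assumed from the example line: record (cols 1-6), serial (7-11),
    # atom name (14-16), residue name (18-20), residue number (23-26),
    # x, y, z (31-54), occupancy (55-60), B-factor (61-66).
    return (f"{record:<6}{serial:>5}  {element:<3} {res_name:>3}  "
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
            f"{occupancy:6.2f}{b_factor:6.2f}")


if __name__ == "__main__":
    print(format_substructure_atom(1, "SE", 25.284, 28.195, 17.180, 1.00, 33.96))
```

The atom name passed as element must still be one that appears in $CLIBD/atomsf.lib; the sketch does not check this.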

    Display results with coot

    If you check this option, coot will open after the crank job is finished and display your substructure, map and pdb (if applicable). If you don't see this option, then coot is not in your PATH.

    Pipeline:

    This section allows you to choose the experimental pipeline you would like to perform. At the moment, five predefined pipelines are available:

    The "CRUNCH2/BP3/SOLOMON/ARPWARP" (default) option gives the following sequence of programs:

    PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> ARP/wARP + REFMAC.

    "SHELXCDE/BP3/SOLOMON/ARPWARP" gives the following pipeline:

    PREP -> SHELXC -> SHELXD -> SHELXE -> BP3 -> SOLOMON -> ARP/wARP + REFMAC.

    "SHELXCDE/ARPWARP" gives the following pipeline (suggested by George Sheldrick):

    PREP -> SHELXC -> SHELXD -> SHELXE -> ARP/wARP + REFMAC.

    The "CRUNCH2/BP3/RESOLVEDM/MB" option gives the following pipeline:

    PREP -> AFRO -> CRUNCH2 -> BP3 -> RESOLVEDM -> RESOLVEMB.

    The "CRUNCH2/BP3/SOLOMON/PIRATE/BUCCANEER" option gives the following pipeline:

    PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> PIRATE -> ARP/wARP + BUCCANEER.

    For the first, second, and fourth pipelines, the density modification programs (i.e. SOLOMON, SHELXE, or DM) will attempt to determine the correct hand and an optimal solvent content, and then perform a final density modification run. For the second pipeline, SHELXE will attempt to determine the correct hand and optimal solvent content, while SOLOMON performs the final density modification run.
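
For scripting or note-keeping, the five predefined pipelines listed above can be written down as a small lookup table. This Python sketch only restates the sequences given in this section; the names PIPELINES and pipelines_using are hypothetical and are not part of crank or GCX.

```python
# Predefined pipelines, copied from the sequences listed above.
PIPELINES = {
    "CRUNCH2/BP3/SOLOMON/ARPWARP": [
        "PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "ARP/wARP + REFMAC"],
    "SHELXCDE/BP3/SOLOMON/ARPWARP": [
        "PREP", "SHELXC", "SHELXD", "SHELXE", "BP3", "SOLOMON",
        "ARP/wARP + REFMAC"],
    "SHELXCDE/ARPWARP": [
        "PREP", "SHELXC", "SHELXD", "SHELXE", "ARP/wARP + REFMAC"],
    "CRUNCH2/BP3/RESOLVEDM/MB": [
        "PREP", "AFRO", "CRUNCH2", "BP3", "RESOLVEDM", "RESOLVEMB"],
    "CRUNCH2/BP3/SOLOMON/PIRATE/BUCCANEER": [
        "PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "PIRATE",
        "ARP/wARP + BUCCANEER"],
}


def pipelines_using(program):
    """Return the names of the pipelines in which the given program appears."""
    wanted = program.upper()
    return [name for name, steps in PIPELINES.items()
            if any(wanted in step.upper().split(" + ") for step in steps)]


if __name__ == "__main__":
    print(pipelines_using("SHELXE"))
```

Such a table makes it easy to see, for example, which pipelines require the SHELX programs to be installed.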

    Display individual program options

    Click on this option if you wish to adjust any of the program options. If you click this option, you can remove all programs by clicking the "Clear All" button, and the experimental palette pipeline will be removed. It is possible to construct your own pipeline by selecting the program that you wish to use next with the "Next possible program:" option, followed by the "Add program" button. However, this should only be used by experts who know exactly what they are doing! If you do not wish to run the program listed at the end of the crank pipeline, it can be removed with the "Edit list" and "Delete last item" option. You can also see the flow of information (e.g. mtz columns, substructures, etc.) by clicking on the "Show all pipeline input columns" option.

    PROGRAM FIELDS

    Below, a description of all the crank plugins as well as some of the more important modifiable options is given.

    PREP

    PREP is the crank plugin for using truncate [18] to calculate structure factor amplitudes from intensities and using scaleit [16] to scale the data sets relative to each other. Most of the options given are self-explanatory.

    AFRO

    AFRO is the crank plugin for the AFRO [15] program, which calculates the FA values needed by substructure detection programs. Some of the important fields that can be modified to improve results in substructure detection are the following:

    High resolution cutoff
    The default is to cut the data set at 0.5 Angstroms above the data set's highest resolution. Changing this value based on the graph of DANO/SANO for anomalous differences may improve results. (This graph is shown by afro/xloggraph.)
    Exclude Reflections: FP < <nsigf> *SIGF DANO < <nsano> *SANO
    Exclude reflections if FP < <nsigf> * SIGF. The default setting for <nsigf> is 2.0. Exclude reflections if DANO < <nsano> * SANO. The default setting for <nsano> is 0.5. (DANO = abs(|F+| - |F-|) and SANO is the standard deviation of DANO in measurement.) Modifying these values can lead to a solution when previous values have failed.
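
The default cutoff and the two exclusion tests described above can be expressed in a few lines of Python. This is a sketch of the stated criteria only, not AFRO's code; the function names are hypothetical, and the standard deviation of DANO is passed in as sano.

```python
def default_high_res_cutoff(d_min):
    """Default cutoff: 0.5 Angstroms above the data set's highest resolution."""
    return d_min + 0.5


def passes_exclusion_filters(fp, sigf, f_plus, f_minus, sano,
                             nsigf=2.0, nsano=0.5):
    """Return True if a reflection survives both exclusion tests."""
    # DANO is the absolute anomalous difference between |F+| and |F-|.
    dano = abs(abs(f_plus) - abs(f_minus))
    if fp < nsigf * sigf:
        return False  # amplitude too weak relative to its sigma
    if dano < nsano * sano:
        return False  # anomalous difference too small relative to its sigma
    return True


if __name__ == "__main__":
    print(default_high_res_cutoff(2.0))
    print(passes_exclusion_filters(100.0, 10.0, 105.0, 95.0, 8.0))
```

Raising nsigf or nsano removes more reflections, trading completeness for a cleaner anomalous signal.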

    CRUNCH2

    CRUNCH2 [2] is a substructure detection program using Karle-Hauptman matrices.

    Number of CRUNCH2 trials
    Probably the first option to modify when substructure detection has failed is to increase the number of CRUNCH2 trials. The default value shown is 20.
    Number of Patterson trials to generate
    Sets the number of Patterson trials. Before a crunch2 run, a Patterson minimal function is calculated to generate trial solutions for crunch2. The default is to generate 150 starting Patterson trials. CRUNCH2 ranks all starting Patterson solutions and runs the best ones first.
    Stop CRUNCH2 if a score is greater than
    This field sets the minimum crunch2 figure of merit to stop the crunch2 run. The default is 1.0 - if this value is obtained by CRUNCH2, you can be fairly confident that you have obtained the correct substructure.
    or if the highest score is <ndeviation> times the lowest score
    <ndeviation> specifies another stopping criterion: CRUNCH2 is stopped if the highest score of a trial is at least <ndeviation> times the lowest score. The default value for <ndeviation> is 1.75.
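
The two stopping criteria can be summarised in a short Python sketch. It restates the rules described above and is not CRUNCH2's own code; the function name and the assumption that scores are positive are illustrative.

```python
def crunch2_should_stop(scores, stop_score=1.0, ndeviation=1.75):
    """Return True if either stopping criterion described above is met."""
    if not scores:
        return False
    best, worst = max(scores), min(scores)
    # Criterion 1: a trial score exceeds the absolute threshold.
    if best > stop_score:
        return True
    # Criterion 2: the best score is at least ndeviation times the worst.
    # Assumption: scores are positive, so the ratio is meaningful.
    return worst > 0 and best >= ndeviation * worst


if __name__ == "__main__":
    print(crunch2_should_stop([0.2, 0.3]))
    print(crunch2_should_stop([0.2, 0.4]))
```

With the defaults, trial scores of 0.2 and 0.4 would stop the run through the second criterion even though neither exceeds 1.0.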

    BP3

    BP3 [4], [5] is a substructure phasing program. The "Fast phasing" option can be toggled on and off if you would like to have a quicker run of BP3. Fast phasing is the default for MAD phasing.

    SOLOMON

    SOLOMON's [6] interface can attempt to determine the correct hand, optimize the solvent content, and perform a density modification run.

    DM

    DM [7] is a density modification program. If you have a metallo-protein, it is probably optimal to unclick the "Histogram matching" option. Again, the interface to DM can be used to determine the correct hand, optimize the solvent content and, of course, perform a complete density modification run.

    PIRATE

    PIRATE [9] is a density modification program. The weight to apply to input phases is an option to change if you believe that the phases that you input are biased.

    SHELXC

    SHELXC [14] is the program to generate files (including FA values) needed for SHELXD and SHELXE.

    High resolution cutoff <hires>
    Specify the high resolution limit. The default is to set the limit to 0.5 Angstroms above the high resolution limit - unless the high resolution limit is lower than 2.5, in which case the limit is set to the lower of 3.0 Angstroms or the high resolution limit of the data.

    SHELXD

    SHELXD [3] can be used to determine substructures, if the SHELX suite is installed. Important options to consider are the following.

    CC/weak threshold <nthreshold>
    <nthreshold> is the minimum "weak" correlation coefficient needed to stop the shelxd run. The default is 30.
    Num. trials <ntrials>
    Sets the number of trials. The default is 500, but for more difficult problems with a weak signal, setting <ntrials> to a greater value (like 1000) can lead to a solution, when a solution was not found with 500.
    Minimum distance between atoms <mind>
    Set the minimum distance between substructure atoms. The crank default value is 3.5.

    SHELXE

    Crank also has an interface for the density modification program SHELXE [8], which can be used to determine the correct hand, optimize the solvent content and perform a complete density modification run.

    ARPWARP

    ARPWARP is the crank plugin for the automated model building program ARP/wARP [10] with iterative refinement using REFMAC [11].

    Autobuild cycles <nautobuild>
    The number of autobuilding cycles in ARP/wARP (default is 10). Sometimes more than 10 cycles are needed to complete the model.
    Include <target>
    The target function to use for refinement in REFMAC. By default, the SAD likelihood function [13] is used for SAD experiments and the MLHL function [12] for non-SAD experiments. However, when Hendrickson-Lattman coefficients are not available (for example, in the SHELXC/D/E pipeline), the MLHL function cannot be run. In this case, the SAD function is used for SAD and MAD experiments. We have found that the SAD function works optimally when the free set is used for scaling and sigmaa calculation. Conversely, the MLHL function and the likelihood function without phase restraints work best if the sigmaa calculation uses the working set. The above selections for the set of reflections to use in sigmaa calculations are the default settings in CRANK.

    INTERPRETING RESULTS

    Crank directory structure

    Crank uses a hierarchically structured directory system to store all the different runs of the various crystallographic programs. There are two types of directories under the Crank directory hierarchy: directories where crystallographic programs are run, and directories where information is collected.

    The directories in which programs are run always start with a number followed by a dash and a program name, as in "3-crunch2" or "4-bp3". The leading number indicates the step in the pipeline at which the named program was run.

    The crystallographic programs are run inside these directories, which therefore contain all the files produced by the run of the given program. These directories are constructed by crank, which first builds a shell script to run the program. This shell script is designed to be as close as possible to the example scripts written by the program author. Crank copies all the requisite data files for the program into the run directory.

    Then, Crank simply executes the shell script that it has built, timing the run and collecting results.
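
When inspecting a finished job, the naming convention of the run directories makes it simple to list them in pipeline order. The Python sketch below is only an illustration; the function names and the directory names "10-arpwarp" and "summary" used in the example are hypothetical.

```python
import re

# Run directories look like "<step number>-<program name>", e.g. "3-crunch2".
_RUN_DIR = re.compile(r"^(\d+)-(.+)$")


def parse_run_dir(name):
    """Return (step, program) for a run directory name, or None otherwise."""
    match = _RUN_DIR.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def ordered_run_dirs(names):
    """Return the run directory names sorted by pipeline step number."""
    parsed = [(parse_run_dir(name), name) for name in names]
    return [name for key, name in sorted(p for p in parsed if p[0] is not None)]


if __name__ == "__main__":
    print(ordered_run_dirs(["10-arpwarp", "4-bp3", "summary", "3-crunch2"]))
```

Sorting on the numeric step rather than on the raw name keeps step 10 after step 4, which a plain alphabetical listing would not.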

    When automated structure solution was not possible, or when you want to optimize results, the first step is to identify the step (e.g. substructure detection or phasing) that either failed or produced sub-optimal results. The best place to start when a particular step has failed is the documentation of the individual program. It may also be useful to look at the Crank test system, which gives an indication of statistics from successful jobs as well as diagnostics (and suggestions) from jobs that failed with default settings.

    PROBLEMS, SUGGESTIONS

    Crank is still beta software, and we appreciate any suggestions or bug reports that can be emailed to crank@chem.leidenuniv.nl.

    REFERENCES

  • [1] Ness, S.R., de Graaff, R.A.G., Abrahams, J.P. and Pannu, N.S. (2004) Structure, 12, 1753-1761.
  • [2] de Graaff, R.A.G., Hilge, M., van der Plas, J.L. and Abrahams, J.P. (2001) Acta Cryst., D57, 1857-1862.
  • [3] Schneider, T.R. and Sheldrick, G.M. (2002) Acta Cryst., D58, 1772-1779.
  • [4] Pannu, N.S. and Read, R.J. (2004) Acta Cryst., D60, 22-27.
  • [5] Pannu, N.S., McCoy, A.J. and Read, R.J. (2003) Acta Cryst., D59, 1801-1808.
  • [6] Abrahams, J.P. and Leslie, A.G.W. (1996) Acta Cryst., D52, 30-42.
  • [7] Cowtan, K.D. (1994) Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, 31, 34-38.
  • [8] Sheldrick, G.M. (2002) Z. Kristallogr., 217, 644-650.
  • [9] Cowtan, K. (2000) Acta Cryst., D56, 1612-1621.
  • [10] Perrakis, A., Morris, R.M. and Lamzin, V.S. (1999) Nat Struct Biol., 6, 458-463.
  • [11] Murshudov, G.N., Vagin, A.A. and Dodson, E.J. (1997) Acta Cryst., D53, 240-255.
  • [12] Pannu, N.S., Murshudov, G.N., Dodson, E.J. and Read, R.J. (1998) Acta Cryst., D54, 1285-1294.
  • [13] Skubak, P., Murshudov, G.N. and Pannu, N.S. (2004) Acta Cryst., D60, 2196-2201.
  • [14] Sheldrick, G.M. SHELX suite of programs website
  • [15] Pannu, N.S. (unpublished).
  • [16] CCP4 (1994) Acta Cryst., D50, 760-763.
  • [17] Hazes, B. (unpublished).
  • [18] French, G.S. and Wilson, K.S. (1978) Acta Cryst., A34, 517-534.
  • [19] Kantardjieff, K.A. and Rupp, B. (2003) Protein Science, 12, 1865-1871.
  • [20] Matthews, B.W. (1968) J. Mol. Biol., 33, 491-497.
  • [21] Potterton, E., Briggs, P.J., Turkenburg, M. and Dodson, E.J. (2003) Acta Cryst., D59, 1131-1137.
  • [22] Pannu, N.S. (unpublished).
  • [23] Terwilliger, T.C. (2000) Acta Cryst., D56, 965-972.
  • [24] Terwilliger, T.C. (2003) Acta Cryst., D59, 38-44.
  • [25] Cowtan, K. (2006) Acta Cryst., D62, 1002-1011.
  • [26] Emsley, P. and Cowtan, K. (2004) Acta Cryst., D60, 2126-2132.

    Last modified: Tue Jan 16 12:49:57 CET 2007