PIRATE (CCP4: Supported Program)

NAME

pirate - Statistical Phase Improvement

SYNOPSIS

cpirate -mtzin-ref filename -mtzin-wrk filename -mtzout filename -colin-ref-fo colpath -colin-ref-hl colpath -colin-wrk-fo colpath -colin-wrk-hl colpath -colin-wrk-free colpath -colout colpath -colout-hl colpath -colout-fc colpath -ncycles cycles -weight-expllk weight -weight-mapllk weight -weight-ramp weight -resolution resolution/A -stats-radius radius/A,radius/A -skew-content factor,factor -auto-content -unbias -evaluate -strict-free -seed seed -verbose verbosity -stdin
[Keyworded input]

DESCRIPTION

'pirate' performs statistical phase improvement by classifying the electron density map by sparseness/denseness and order/disorder, with the aim of obtaining superior results to conventional solvent mask based methods without requiring knowledge of the solvent content.

The target distributions are generated by a simulation calculation using a known 'reference' structure for which calculated phases are available. The success of the method depends on the features of the reference structure matching those of the unsolved, 'work' structure. For the common case of a protein of mostly equal atoms (i.e. not a metalloprotein), a single reference structure can be used, with modifications automatically applied to the reference structure to match its features to the work structure.

HOW TO RUN PIRATE

To get the best results from 'pirate', you should allow it to modify the cell contents of the reference structure to match the contents of the work structure. To do this, simply leave the 'auto-content' option selected, and use a generic reference structure of medium solvent content, such as 1AJR.
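As an illustration of the run described above, the following is a minimal sketch of a command line with '-auto-content' selected. The file names and column labels are hypothetical placeholders, not names used by any particular installation, and the choice of 3 cycles is only an example. The script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: file names and column labels are hypothetical placeholders.
ref=reference-1ajr.mtz     # generic reference structure (assumed file name)
wrk=work.mtz               # phased data for the work structure (assumed file name)
out=work-pirate.mtz        # output file (assumed file name)

# Column paths are single-quoted so the shell does not expand '*' or '[...]'.
set -- cpirate \
  -mtzin-ref "$ref" -mtzin-wrk "$wrk" -mtzout "$out" \
  -colin-ref-fo '/*/*/[FP,SIGFP]' \
  -colin-ref-hl '/*/*/[HLA,HLB,HLC,HLD]' \
  -colin-wrk-fo '/*/*/[FP,SIGFP]' \
  -colin-wrk-hl '/*/*/[HLA,HLB,HLC,HLD]' \
  -auto-content -ncycles 3

# Dry run: print the arguments (shown without their quotes).
# To execute the command instead, replace the two lines below with:  "$@"
pirate_cmd=$(printf '%s ' "$@")
echo "$pirate_cmd"
```

Holding the arguments in the positional parameters keeps each column path as a single word, so the same list can be printed for checking or executed unchanged.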

Alternatively, you may run 'pirate' several times in 'evaluation' mode to select the best reference structure. These initial runs are for a single cycle only, and the output is ignored. Simply check the Free-E correlation in the log file to determine which reference structure gives the best results. Once you have identified the best reference structure, make a longer run (3-9 cycles) to produce an improved set of phases.
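The evaluation runs described above could be scripted along the following lines. This is a sketch under stated assumptions: the reference file names, work file name, column labels and log file names are all hypothetical, and the script only prints the commands it would run. The format of the Free-E correlation line in the log is not assumed here, so the logs still need to be inspected by hand.

```shell
#!/bin/sh
# Sketch only: all file names and column labels are hypothetical placeholders.
wrk=work.mtz
plan=""
for ref in ref-a.mtz ref-b.mtz ref-c.mtz; do
  name=${ref%.mtz}
  # -evaluate requests the quick evaluation calculation; the output MTZ is a throwaway.
  line="cpirate -mtzin-ref $ref -mtzin-wrk $wrk -mtzout eval-$name.mtz \
-colin-ref-fo '/*/*/[FP,SIGFP]' -colin-ref-hl '/*/*/[HLA,HLB,HLC,HLD]' \
-colin-wrk-fo '/*/*/[FP,SIGFP]' -colin-wrk-hl '/*/*/[HLA,HLB,HLC,HLD]' \
-evaluate > eval-$name.log"
  plan="$plan$line
"
done

# Dry run: print one command per reference structure.
# Piping this output to 'sh' would execute the commands.
printf '%s' "$plan"
```

Each run writes its own log file, so the Free-E correlation for every candidate reference structure can be compared side by side afterwards.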

A set of reference structures may have been provided by whoever installed 'pirate' on your system. Otherwise, you can generate one using the 'Make ref structure' task. The structure 1AJR is good for typical protein problems at resolutions up to 1.85A. For exotic cases and high resolution cases you will have to provide your own reference structures. The 'Make ref structure' task requires ftp access to the EBI - if your local firewall prevents this, you will have to fetch the coordinate and structure factor files (pdbXXXX.ent.Z, rXXXXsf.ent.Z) using your preferred tool, and then run the task using the local files.

You may have to adjust the log-likelihood weights in some cases - particularly SAD, where the error estimates may be very poor. The Free-E correlation is less helpful for this purpose - look at the maps to decide if this is necessary.

INPUT/OUTPUT FILES

-mtzin-ref
Input 'reference' MTZ file. This contains the data for a known, reference structure. The required columns are F, sigF, and a set of Hendrickson-Lattman (HL) coefficients describing the calculated phases from the final model. Suitable reference structures can be constructed from the PDB using the makereference.csh script.
-mtzin-wrk
Input 'work' MTZ file. This contains the data for the unknown, work structure. The required columns are F, sigF, and a set of HL coefficients from a phasing program (experimental or molecular replacement). A Free-R flag may be provided. The HL distributions should be accurate - in my experience no current program fulfils this criterion, although Solve comes close. I have strong hopes for Phaser. Currently, the input HL coefficients are corrected using a user parameter.
-mtzout
Output MTZ file. This will contain the updated HL coefficients, and map coefficients for a best electron density map, including restored values for missing reflections.

KEYWORDED INPUT

See Note on keyword input.

-colin-ref-fo colpath

Observed F and sigma for reference structure. See Note on column paths.

-colin-ref-hl colpath

Hendrickson-Lattman coefficients for reference structure. If you do not have these, they can be generated using the accompanying chltofom program. See Note on column paths.

-colin-wrk-fo colpath

Observed F and sigma for work structure. See Note on column paths.

-colin-wrk-hl colpath

Hendrickson-Lattman coefficients for work structure. See Note on column paths.

-colin-wrk-free colpath

[Optional] Free-R flag for work structure. See Note on column paths.

-colout colpath

[Optional] Column group name for all output columns.

-colout-hl colpath

[Optional] Column group name for HL column group only.

-colout-fc colpath

[Optional] Column group name for map coefficient group only.

-ncycles cycles

[Optional] Number of cycles to perform. Default is 1 cycle, to produce a rigorous set of phase probability distributions and statistics. For a best map, run 3-5 cycles.

-weight-expllk weight

[Optional] Weight to apply to input HL coefficients. This is to correct for deficiencies in phasing programs. With next-generation phasing software, this parameter will be fixed at 1.0. Until then, try 0.5, and then 1.0 and 2.0 if results are poor.

-weight-mapllk weight

[Optional] Weight to apply to map likelihood. Default is 0.1. You may be able to optimise this using the Free-E correlation. Find the weight which gives the best Free-E correlation, and then divide it by 2 for your final run.
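The last step of this protocol is simple arithmetic, sketched below. The value 0.3 is a hypothetical example of a weight that gave the best Free-E correlation in a set of trial runs; it is not a recommended setting.

```shell
#!/bin/sh
# Hypothetical example value: the -weight-mapllk setting that gave the
# best Free-E correlation in your own trial runs.
best_weight=0.3

# Halve the best weight for the final run, as the protocol suggests.
final_weight=$(awk -v w="$best_weight" 'BEGIN { printf "%.3g", w / 2 }')

# Print the option to add to the final cpirate command line.
echo "-weight-mapllk $final_weight"
```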

-weight-ramp weight

[Optional] Weight for the ramping up of the map likelihood. Defaults are 2.0 in normal mode and 1.0 in unbias mode.

-resolution resolution/A

[Optional] Resolution limit for the calculation. All data are truncated to this limit.

-stats-radius radius/A,radius/A

[Optional] Radius for local mean and standard deviation, used in classifying the map. Two values are given: the first is the initial radius and the second is the final radius for multi-cycle calculations. Defaults are 9.0A and 3.0A.

-skew-content factor,factor

[Optional] Factors for skewing the content of dense and ordered density of the reference structure. A positive value increases the specified region, a negative value decreases it. The useful range is -1 to +1.

-auto-content

[Optional] Automatically refine values for content skewing to fit the reference structure to the work structure. This is slow but gives a better result.

-unbias

[Optional] Select '-unbias' to remove bias from molecular replacement models. This is a more sophisticated version of the 'dm' COMBINE WEIGHT option.

-evaluate

[Optional] Select '-evaluate' to run a quick evaluation calculation to test a reference structure.

-strict-free

[Optional] Select '-strict-free' to keep the free-set for every cycle of a multi-cycle calculation. This may lead to a worse map, but may be a good idea if you want to try and refine against the final output phases. A better option is to run only one cycle to produce phases for refinement, or more to make a map.

-seed seed

-verbose verbosity

Note on column paths:

When using the command line, MTZ columns are described as groups using a slash-separated format including the crystal and dataset name. If your data were generated by another program that uses column groups, you can just specify the name of the group, for example '/native/peak/Fobs'. You can wildcard the crystal and dataset if the file does not contain any duplicate labels, e.g. '/*/*/Fobs'. You can also access individual non-grouped columns from existing files by giving a comma-separated list of names in square brackets, e.g. '/*/*/[FP,SIGFP]'.
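The three forms of column path above can be written in a shell script as follows. This is a small sketch using the example paths from this note; the point it illustrates is that '*' and '[...]' are also shell pattern characters, so column paths given on the command line should be quoted.

```shell
#!/bin/sh
# The crystal, dataset and column names are the illustrative ones from the note above.
colpath_group='/native/peak/Fobs'    # a named column group
colpath_wild='/*/*/Fobs'             # wildcard crystal and dataset
colpath_list='/*/*/[FP,SIGFP]'       # individual non-grouped columns

# Single quotes (and double quotes around the variables) stop the shell
# from treating '*' and '[...]' as file name patterns.
printf '%s\n' "$colpath_group" "$colpath_wild" "$colpath_list"
```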

Note on keyword input:

Keywords may be given on the command line or, if the '-stdin' flag is specified, on standard input. On standard input, one keyword is given per line and the leading '-' is optional. The rest of the line is taken as the argument of that keyword, if one is required, so quoting is not used.
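A keyword list for standard input might look like the sketch below. The file names and column labels are hypothetical placeholders, and it is an assumption here that the file-name keywords are accepted on standard input in the same way as the others. The script only builds and prints the keyword list.

```shell
#!/bin/sh
# Sketch only: file names and column labels are hypothetical placeholders.
# One keyword per line, leading '-' omitted, no quoting of column paths.
keywords=$(cat <<'EOF'
mtzin-ref reference.mtz
mtzin-wrk work.mtz
mtzout work-pirate.mtz
colin-ref-fo /*/*/[FP,SIGFP]
colin-ref-hl /*/*/[HLA,HLB,HLC,HLD]
colin-wrk-fo /*/*/[FP,SIGFP]
colin-wrk-hl /*/*/[HLA,HLB,HLC,HLD]
ncycles 3
auto-content
EOF
)

# Dry run: print the keyword list.
# To run the program with it:  printf '%s\n' "$keywords" | cpirate -stdin
printf '%s\n' "$keywords"
```

The quoted here-document delimiter ('EOF') prevents the shell from altering the column paths before they reach the program.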

Reading the Output:

The program outputs a short list of statistics each cycle. The Free-E correlation is probably the most useful (larger is better). After the first cycle the statistics may be biased in various ways. They are fairly useful for selecting a reference structure from a list of candidates or for selecting a radius. They can also be used to control the likelihood weighting, but see the notes under the -weight-mapllk keyword for the appropriate protocol.

Problems:

AUTHOR

Kevin Cowtan, York.

SEE ALSO