DMMULTI (CCP4: Supported Program)

NAME

dmmulti - multi-xtal density modification package, release 0.9, 4/5/98

SYNOPSIS

dmmulti HKLIN1 foo1.mtz [HKLIN2 ...] HKLOUT1 bar1.mtz [HKLOUT2 ...]
[SOLIN1 foosol1.msk [SOLIN2 ...]] [SOLOUT1 barsol1.msk [SOLOUT2 ...]]
[MSKIN1 foomsk1.msk [ MSKIN2 ...]]
[Keyworded input]

REFERENCE

K. Cowtan (1994), Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, 31, p34-38.

DESCRIPTION

'dmmulti' is a package which applies real space constraints based on known features of a protein electron density map in order to improve the approximate phasing obtained from experimental sources. The program may be applied to data from one or more crystal forms simultaneously. Various information can be applied, including such diverse elements as the following (see keyword MODE):

SOLV
Solvent flattening (8)
HIST
Histogram mapping (9)
SKEL
Skeletonisation (1,7) (alpha release)
AVER
NCS averaging, including multi-crystal averaging (2,6)

A discriminator analogous to the crystallographic free-R factor (3) plays an important part in the procedure, providing a good indication of the effectiveness of a particular density modification calculation, and also an accurate method for determining weights for new phases calculated by the procedure. This avoids the problems of over-consistency and overestimated weights which could arise in earlier density modification procedures. Note that the dm-free-R is not truly a free-R factor since it is impossible to completely isolate a set of reflections: all structure factor magnitudes are fundamentally interrelated through the density constraints in real space.

The program can either use a free-R set from the mtz file, or generate its own set internally. It is also possible to recycle the calculation, performing the density modification one or more times with different free-R sets, and then once with no free-R set but using the information obtained from the free-R cycles. This has been found to give a slight improvement in the overall results.

Calculation of the scale and B-factor for the data is automatic. This is performed by comparison with an empirically derived database of map variance at different resolutions, and is more reliable than the conventional Wilson plot.

Non-crystallographic symmetry averaging can be performed for both proper and improper symmetries, and different NCS averaging operations can be applied to different parts of the protein (Thanks to Dave Schuller for his help with this). Spectral B-spline interpolation is used for fast calculation on a low resolution grid; this has been developed by Dr. Eric H. Grosse.

Multi-crystal averaging can be performed between phased and/or unphased forms. The calculation is performed efficiently and entirely in-core.

Skeletonisation is by the core-tracing algorithm of Swanson (7). This is faster than Greer's algorithm and allows adjustment of the skeletonisation parameters without recalculating the skeleton. As a result the skeletonisation calculation is rendered largely automatic.

Operation is by standard keyworded card input. Input masks may be on any grid and in any axis order; however, if the mask grid is too fine the program may run out of space to store it.

INPUT FILES

HKLINi

Input mtz file for i'th crystal form - This should contain the conventional (CCP4) asymmetric unit of data (see CAD).

[SOLINi]

Input solvent mask for i'th crystal form - This overrides the automatic Wang mask determination. The input mask can have any grid and axis ordering, and may have any extent from the protein region of a single asymmetric unit to the whole cell.

[MSKINj]

Input averaging masks for j'th domain - These are used with the AVER option. The input masks can have any grid or axis ordering, and should cover a single monomer or domain; however, correct results will still be obtained (more slowly) if the mask covers a proper symmetry related multimer. 'dmmulti' does not perform overlap removal.

OUTPUT FILES

HKLOUTi

Output mtz file for i'th crystal form.

[SOLOUTi]

Output solvent mask for i'th crystal form - This will be on the program grid with default axis order, and will cover the whole unit cell.

MAJOR KEYWORDS

Input is keyworded. Available keywords are:

AVERAGE, GRID, LABIN, LABOUT, MODE, NCYCLE, RESOLUTION, SCALE, SCHEME, SKEL, SOLC, WANG, XTAL.

(MODE and SOLC are compulsory)

XTAL <xtal>

Select the <xtal>'th crystal form. Keywords following this keyword will apply to this crystal form. Any keywords before the first XTAL card apply to form 1.
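
The scoping rule above can be illustrated with a small Python sketch. This is not dmmulti's own parser: the function name group_by_xtal is hypothetical, and the sketch assumes one card per line, keyword first, with END closing the input as in the examples at the end of this document.

```python
def group_by_xtal(cards):
    """Assign keyword cards to crystal forms (illustrative sketch only).

    Cards seen before the first XTAL card are assigned to form 1,
    as described in the text. Returns {form_number: [cards]}.
    """
    forms = {}
    current = 1  # cards before the first XTAL card apply to form 1
    for card in cards:
        words = card.split()
        if not words:
            continue  # skip blank lines
        keyword = words[0].upper()
        if keyword == "END":
            break
        if keyword.startswith("XTAL"):
            current = int(words[1])
            forms.setdefault(current, [])
            continue
        forms.setdefault(current, []).append(card.strip())
    return forms
```

For instance, given the cards "NCYC 10", "XTAL 1", "SOLC 0.35", "XTAL 2", "SOLC 0.41", the sketch places the first SOLC under form 1 and the second under form 2, with the leading NCYC card also filed under form 1.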

MODE [SOLV] [HIST] [AVER] [SKEL]

Select the calculation to be performed:

SOLV
= Solvent flattening
HIST
= Histogram mapping
AVER
= Non-crystallographic symmetry averaging
SKEL
= Skeletonisation

SOLC <solc> [ MASK <solvfrac> <protfrac> ] [ MEAN <solvval> <protval> ]

<solc>
= solvent content. ALWAYS INPUT THE CORRECT SOLVENT CONTENT HERE TO ENSURE CORRECT SCALING. 0.0=all protein, 1.0=all solvent.
MASK
- used to set different mask volumes to the above for histogram matching and solvent flattening.
<solvfrac> = fraction of cell to be masked as solvent.
<protfrac> = fraction of cell to be masked as protein.
If <solvfrac>+<protfrac> < 1.0 then there will be a buffer region between solvent and protein which is neither histogram matched nor solvent flattened. This feature is provided by popular demand, but makes things worse in most of my test cases.
MEAN
- used to set mean density for solvent and protein regions. This affects scaling and density modification.
<solvval> = mean density in solvent region.
<protval> = mean density in protein region.
(defaults 0.32, 0.43 electrons per cubic angstrom)
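
As a rough illustration of the MASK arithmetic above, the following Python sketch computes the fraction of the cell left in the buffer region. It is not part of dmmulti, and the function name buffer_fraction is hypothetical.

```python
def buffer_fraction(solvfrac, protfrac):
    """Fraction of the cell that is neither solvent nor protein mask.

    solvfrac and protfrac are the two values given after MASK on the
    SOLC card. A result of 0.0 means there is no buffer region.
    """
    for value in (solvfrac, protfrac):
        if not 0.0 <= value <= 1.0:
            raise ValueError("mask fractions must lie between 0.0 and 1.0")
    # Clip at zero so rounding noise never yields a negative buffer.
    return max(0.0, 1.0 - (solvfrac + protfrac))
```

With <solvfrac>=0.30 and <protfrac>=0.60, one tenth of the cell falls in the buffer and is left untouched by both solvent flattening and histogram matching.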

RESOLUTION <rmin> <rmax>

Resolution range of reflections to be included in the calculation. By the end of the calculation all the reflections in this range will be included, however at the start only a subset are used, chosen on the basis of the SCHEME card (default is the whole range of the input mtz file).

NCYCLE <ncycle> [ FREE <ncross> ]

Number of cycles of phase extension to perform (defaults <ncycle>=10 <ncross>=1).

<ncycle>
= Number of cycles over which to perform phase extension. Use 10 cycles for a quick result, try more (20-100) but check the free-R factor.
<ncross>
= Number of times each step is performed to provide statistics for the free-R and phase weighting.
For <ncross>=1 a changing random set of reflections is omitted each cycle for the free-R factor.
For <ncross>=2 a fixed set is chosen (using the free-R flag if available) and omitted for the free-R factor, then the cycle is run a second time using all the reflections.
For <ncross> > 2, (<ncross>-1) different free-R sets are generated, then on the <ncross>-th pass all reflections are included.

The total time taken is proportional to the product of these two values. Use <ncross> = 1 for large structures where the time becomes a significant factor, otherwise use <ncross> = 2. Only use <ncross> > 2 for small structures where the statistics are particularly poor (< 5000 reflections).
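
The guidance above can be summarised in a short Python sketch. Both function names are hypothetical, and the choice of 3 as the value used when <ncross> > 2 is called for is an assumption made for illustration; the text only says "> 2".

```python
def relative_cost(ncycle=10, ncross=1):
    """Run time in arbitrary units: proportional to ncycle * ncross."""
    return ncycle * ncross


def suggest_ncross(n_reflections, time_critical=False):
    """Rule-of-thumb choice of <ncross> following the text above.

    time_critical: True for large structures where run time matters.
    """
    if time_critical:
        return 1
    if n_reflections < 5000:
        return 3  # assumption: smallest value satisfying ncross > 2
    return 2
```

For example, NCYCLE 10 FREE 2 costs twice as much as NCYCLE 10 alone, which matches the "takes 2x as long" note in the second example at the end of this document.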

In the case of a multi-crystal calculation only one NCYCLE card is allowed, which applies to all forms.

SCHEME AUTO | RES | MAG | FOM [ [ FROM <res> ] | [ FRAC <frac> ] ]

RES
- perform phase extension in resolution steps, starting with the low resolution data.
MAG
- perform phase extension in magnitude steps, starting with the largest reflections.
FOM
- perform phase extension in FOM steps, starting with the best phased data.
AUTO
- perform phase extension using a combination of the above chosen on the basis of what the data set looks like. This option will also pick a reasonable value for <frac>.
FRAC <frac>
- fraction of the input data to use as a starting set.
FROM <res>
- sets <frac> to the fraction of the data within a resolution sphere radius <res>.

(default: AUTO)
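
To make the relationship between FROM <res> and FRAC <frac> concrete, here is a minimal Python sketch. It assumes that "within a resolution sphere" means reflections whose d-spacing is at least <res> Angstrom, and it works from a plain list of d-spacings; how dmmulti itself counts reflections is not specified here. The function name is hypothetical.

```python
def frac_from_resolution(d_spacings, res):
    """Fraction of reflections with d-spacing >= res (in Angstrom).

    d_spacings: d-spacing of every reflection in the data set.
    res: radius of the starting resolution sphere, as given to FROM.
    """
    if not d_spacings:
        raise ValueError("no reflections supplied")
    inside = sum(1 for d in d_spacings if d >= res)
    return inside / len(d_spacings)
```

Because the number of reflections grows steeply with resolution, a starting sphere that looks generous in Angstrom may still contain only a small fraction of the data.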

LABIN FP=.. SIGFP=.. [PHIO=.. FOMO=..] [HLA=.. HLB=.. HLC=.. HLD=..] [PHIDM=.. FOMDM=..] [FREE=..]

Normally just the first four columns (FP,SIGFP,PHIO,FOMO) are input. However if you have Hendrickson-Lattman coefficients you may want to input these to the program as well (the difference is marginal except for SIR data). If you want to start from the end of a previous density modification calculation then the PHIDM, FOMDM columns are used.

For multi-crystal averaging, if a crystal form is unphased the PHIO and FOMO columns may be omitted. There should be some sort of phases for at least one form.

FP
= F magnitude
SIGFP
= standard deviation, 0 for unmeasured
PHIO
= best initial phase estimate
FOMO
= weight attached to PHIO
HLA-HLD
= Hendrickson Lattman coefficients
PHIDM
= phase from previous density modification calculation to use as starting value
FOMDM
= weight from previous density modification calculation to use as starting value
FREE
= free-R flag (only used if ncross>1)

LABOUT PHIDM=.. FOMDM=.. [FCDM=.. PHICDM=..]

Normally just the first two columns are output. Don't use the other two unless you are a very clever person.

PHIDM
= modified phase
FOMDM
= weight attached to PHIDM
FCDM
= F from final modified map before phase recombination
PHICDM
= Phase from final modified map before recombination

OTHER KEYWORDS

SKEL [ LENGTH <joinlen> <endlen> ] [ BFAC <bfac> ] [ EVERY <nskl> ]

Perform iterative skeletonisation on the map. Cycles of skeletonisation are interspersed with cycles of conventional density modification (defaults <joinlen>=6.0 <endlen>=6.0 <bfac>=45 <nskl>=3).

<joinlen>
= length of skeleton in Angstrom/residue to generate between density peaks.
<endlen>
= length of skeleton in Angstrom/residue to generate in 'trailing ends'.
<bfac>
= temperature factor to apply to the sharpened map before skeletonisation.
<nskl>
= apply skeletonisation instead of every <nskl>-th density modification cycle.
See also dm_skeletonisation.

AVERAGE [ DOMAIN <domn> ] [ REFINE [ STEP <dr> <dphi> ] [ EVERY <nref> ]]

Set an NCS averaging operator. This card is followed by one rotation/translation matrix on subsequent lines in either CCP4 or O/RAVE format (defaults <dr>=0.5 A, <dphi>=2.5 degrees, <nref>=3).

CCP4 Formats (see also lsqkab)
ROTA EULER <alpha> <beta> <gamma> (Euler angles)
TRAN <t1> <t2> <t3>
or
ROTA POLAR <omega> <phi> <kappa> (Polar angles)
TRAN <t1> <t2> <t3>
or
ROTA MATRIX <r11> <r12> <r13> <r21> <r22> <r23> <r31> <r32> <r33>
TRAN <t1> <t2> <t3>
O/RAVE Format
OMAT
r11   r21   r31    (note that the rotation matrix is
r12   r22   r32    transposed with respect to CCP4
r13   r23   r33    or conventional matrix format)
t1     t2      t3     
where
x' = r11 x + r12 y + r13 z + t1
y' = r21 x + r22 y + r23 z + t2
z' = r31 x + r32 y + r33 z + t3
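
Since the OMAT layout is transposed relative to the ROTA MATRIX card, converting between the two is an easy place to make mistakes. The Python sketch below shows one way to do the conversion and to apply the resulting operator as defined by the equations above. The function names are hypothetical and the output number format is an arbitrary choice.

```python
def omat_to_ccp4(omat_rows):
    """Convert the four rows following an OMAT card to (rot, trans).

    omat_rows: three rows holding the transposed rotation matrix,
    then one row holding t1 t2 t3. rot is returned in conventional
    (row-major, untransposed) form, so rot[0] is [r11, r12, r13].
    """
    if len(omat_rows) != 4 or any(len(row) != 3 for row in omat_rows):
        raise ValueError("expected four rows of three numbers")
    transposed = omat_rows[:3]
    rot = [[transposed[j][i] for j in range(3)] for i in range(3)]
    trans = list(omat_rows[3])
    return rot, trans


def apply_operator(rot, trans, xyz):
    """Return x' = R x + t for orthogonal coordinates xyz."""
    return [sum(rot[i][k] * xyz[k] for k in range(3)) + trans[i]
            for i in range(3)]


def ccp4_cards(rot, trans):
    """Format the operator as ROTA MATRIX and TRAN cards."""
    flat = " ".join("%.5f" % v for row in rot for v in row)
    return ["ROTA MATRIX " + flat,
            "TRAN " + " ".join("%.5f" % v for v in trans)]
```

A quick check is to pass the identity operator through the conversion: it should come out unchanged, since the identity matrix is its own transpose.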

These are the operations which map the density in the region covered by the input mask onto the appropriate regions in the current crystal form. The first operator must be the identity matrix. The mask is input in CCP4 mask (map mode 0) format on the input file label MSKIN1, and should cover just one monomer or averaging domain, NOT the whole unit cell. The mask grid need not agree with the program grid.

If you want to apply different ncs operations to different domains of the protein, then each AVER card should contain a DOMAIN subkeyword to indicate which domain this operator applies to.

The REF, STEP and EVERY cards will enable refinement of the ncs rotation matrices between averaging cycles. The REF card enables the refinement of a particular set of NCS parameters. Note that the STEP card allows different refinement step sizes to be used for different domains; however, all but one EVERY card will be ignored. The refined matrices will be written out at the end of the log file.

<dr>
= step size for refinement of positional parameters in Angstrom.
<dphi>
= step size for refinement of rotational parameters in degrees.
<nref>
= the number of phase extension cycles between each parameter refinement.

See also dm_ncs_averaging

GRID <nx> <ny> <nz>

Set the grid for the calculation. You may want to do this if you want to include your own mask or dump a map or mask (defaults: minimum efficient factors above Nyquist spacing).

WANG <radius> <mode> [ LIMITS <rhomin> <rhomax> ]

Set the averaging radius and mode for calculating the solvent mask (defaults: <radius>=8.0 <mode>=1 <rhomin>=0.32 <rhomax>=2.0 e/A^3).

<radius>
= radius of averaging sphere (Angstroms)
<mode>
= 1: Use weighting scheme w=1-(r/R) (Wang's method)
= 2: Use weighting scheme w=1-(r/R)**2

Heavy atoms can bias the mask calculation procedure, resulting in a mask of spheres around the heavy atom sites. The LIMITS card can be used to set the values at which the electron density is truncated before smoothing. To truncate heavy atoms set <rhomax> to the maximum electron density due to non-heavy atoms at the appropriate resolution.
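
The two weighting schemes and the LIMITS truncation can be written out as a short Python sketch. It assumes that the weight is zero outside the averaging sphere and that truncation means clamping the density to the range [<rhomin>, <rhomax>]; neither detail is spelled out above, and the function names are hypothetical.

```python
def wang_weight(r, radius=8.0, mode=1):
    """Smoothing weight at distance r (Angstrom) from the sphere centre.

    mode 1: w = 1 - (r/R)      (Wang's method)
    mode 2: w = 1 - (r/R)**2
    Assumption: points outside the sphere (r >= R) get zero weight.
    """
    if r < 0.0 or radius <= 0.0:
        raise ValueError("r must be >= 0 and radius must be > 0")
    if r >= radius:
        return 0.0
    x = r / radius
    if mode == 1:
        return 1.0 - x
    if mode == 2:
        return 1.0 - x * x
    raise ValueError("mode must be 1 or 2")


def truncate_density(rho, rhomin=0.32, rhomax=2.0):
    """Clamp a density value (e/A^3) before smoothing (assumed behaviour)."""
    return min(max(rho, rhomin), rhomax)
```

Halfway out from the centre, mode 1 gives a weight of 0.5 and mode 2 a weight of 0.75, so mode 2 gives more influence to density further from the centre of the sphere.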

SCALE <scale> <bfac>

Override internal scaling and scale input data by F^2 = <scale> * exp (<bfac> * s / 2.0) * F^2. Scaling is critical to histogram mapping and Sayre's equation. In some cases you may want to override the B-factor, but run without this card first, and think long and hard before changing the scale.
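
The scaling expression can be evaluated directly, as in the following Python sketch. The resolution-dependent quantity s is taken as an input exactly as it appears in the formula; its definition is not given here, so the caller must supply it in whatever convention the program uses. The function names are hypothetical.

```python
import math


def scale_f_squared(f_sq, s, scale, bfac):
    """Scaled intensity: scale * exp(bfac * s / 2.0) * F^2."""
    return scale * math.exp(bfac * s / 2.0) * f_sq


def scale_f(f, s, scale, bfac):
    """Scaled amplitude, i.e. the square root of the scaled F^2."""
    if scale < 0.0:
        raise ValueError("scale must be non-negative")
    return math.sqrt(scale_f_squared(f * f, s, scale, bfac))
```

Note that the formula acts on F^2, so a <scale> of 4.0 with <bfac>=0 doubles the amplitudes rather than quadrupling them.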

LOOKING AT YOUR OUTPUT

Look at the free-R factor, but note that you will have to disentangle the output for the different crystal forms.

The script 'multilog' can be used to roughly separate those portions of the output dealing with different crystal forms. Type:

> multilog name-of-your-dmmulti-logfile

CHANGES FROM 'dm'

The XTAL keyword for separating keywords for different forms is new.

The format of the AVER keyword is consistent with dm version 1.8 and later.

There are now multiple input and output reflection and solvent masks for the various forms.

Only the last NCYC or EVERY keyword in the command file will have any effect, since cycles must be synchronised across the different forms. Only the last REF or STEP keyword in any crystal form will have an effect and will apply for all matrices in that form.

COMMON PROBLEMS

Refinement of averaging operators only works when the first operator given for each domain is the identity. This restriction does not apply when averaging without refining the operators.

Check the averaging correlation on the first cycle: this is a strong indication of whether the mask and matrices have been correctly determined.

Averaging operators must be FROM the masked region TO the copy in the unit cell. All averaging operators are defined in orthogonal coordinates using the conventional CCP4/Uppsala axis conventions.

AUTHOR

Kevin D. Cowtan, Department of Chemistry, University of York
email: cowtan@ysbl.york.ac.uk

REFERENCES

  1. Baker, D., Bystroff, C., Fletterick, R., Agard, D. (1993) Acta Cryst D49, 429-439
  2. Bricogne, G. (1974) Acta Cryst A30, 395-405
  3. Brunger, A. T. (1992) Nature 355, 472-474
  4. Cowtan, K. D., Main, P. (1993) Acta Cryst D49, 148-157
  5. Sayre, D. (1974) Acta Cryst A30, 180-184
  6. Schuller, D. (1996) Acta Cryst D52, 425-434
  7. Swanson, S. (1994) Acta Cryst D50, 695-708
  8. Wang, B. C. (1985) Methods in Enzymology 115, 90-112
  9. Zhang, K. Y. J., Main, P. (1990) Acta Cryst A46, 377-381

EXAMPLES

[ a simple solvent/histogram calculation ]

dmmulti                                 \
        hklin gmto.mtz                  \
        hklout gmtodm.mtz               \
        histlib dm/hist.lib             \
        << 'my-data'
SOLC 0.35
MODE SOLV HIST
NCYCLE 10
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
LABOUT PHIDM=PHI1 FOMDM=W1
'my-data'

[ a better solvent/histogram calculation, ]
[ takes 2x as long, uses fixed free-R set ]
[ starts at 3.0A and extends from there ]

dmmulti                                 \
        hklin gmto.mtz                  \
        hklout gmtodm.mtz               \
        histlib dm/hist.lib             \
        << 'my-data'
SOLC 0.35
MODE SOLV HIST
NCYCLE 10 FREE 2
SCHEME RES FROM 3.0
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM FREE=FreeR_flag
LABOUT PHIDM=PHI1 FOMDM=W1
'my-data'

[ a two fold averaging calculation with ]
[ two domains and refinement of the 2nd ]
[ set of averaging matrices. ]

dmmulti \
 hklin hpattj.mtz \
 hklout dm1.mtz \
 mskin1 cwnads.mask \
 mskin2 cwglobs.mask \
 histlib /usr/people/schuller/dm/hist.lib \
<< 'EOF-dm'
SOLC 0.57
MODE SOLV HIST AVER
NCYCLE 40
AVERAGE DOMAIN 1
OMAT
 1.0 0.0 0.0
 0.0 1.0 0.0
 0.0 0.0 1.0
 0.0 0.0 0.0
AVERAGE DOMAIN 1
OMAT
    -0.71389002    -0.69492584     0.08611962
    -0.69635397     0.69129372    -0.19136506
     0.07357326    -0.19652288    -0.97735721
   115.37364197    54.98566055    67.00005341
AVERAGE DOMAIN 2 REFINE
OMAT
 1.0 0.0 0.0
 0.0 1.0 0.0
 0.0 0.0 1.0
 0.0 0.0 0.0
AVERAGE DOMAIN 2 REFINE
OMAT
     0.75830859     0.65183645     0.00883542
     0.65189570    -0.75824565    -0.00975925
     0.00033828     0.01316060    -0.99991322
    17.30371666   -47.10081482    68.99727631
LABIN FP=FP SIGFP=SIGFP PHIO=PHIml FOMO=FOMml -
HLA=HLA HLB=HLB HLC=HLC HLD=HLD
LABOUT PHIDM=PHIDM FOMDM=FOMDM
'EOF-dm'

[ a two crystal averaging calculation ]
[ where a single domain is being averaged. ]
[ There is no ncs within either form. ]

dmmulti \
  hklin1  hkl/gmtomir.mtz  hklin2 hkl/gmtmmir.mtz  \
  hklout1 dmgmto.mtz       hklout2 dmgmtm.mtz      \
  mskin1  gmto.msk <<+

NCYC 10

XTAL 1
SOLC 0.35
MODE SOLV HIST AVER
AVER REFI
ROTA POLAR 0 0 0
TRAN 0 0 0
LABIN FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM

XTAL 2
SOLC 0.41
MODE SOLV HIST AVER
AVER REFI
ROTA MATR  0.74198  0.34530  0.57466  0.52980  0.22324 -0.81821 -0.41082  0.91155 -0.01730
TRAN -27.92476 -10.49614 -11.78758
LABIN FP=FP SIGFP=SIGFP

END
+

[ a three crystal averaging calculation ]
[ where a single domain is being averaged. ]
[ The first form has MIR phases and a ncs ]
[ dimer. The second form is unphased and ]
[ contains a monomer. The third form is ]
[ unphased and contains a trimer. ]

dmmulti \
 hklin1 ins6a.mtz                         hklout1 dmins1.mtz \
 hklin2 ins_hagfish_tetr_T_dim.mtz        hklout2 dmins2.mtz \
 hklin3 ins_mi3_crosslinked_fred_p321.mtz hklout3 dmins3.mtz \
 mskin1 insab.msk \
 << +
NCYC 500

XTAL 1
RESO 1000 2.0
SCHEME RES FROM 6.0
MODE SOLV HIST AVER
SOLC 0.30
AVER
ROTATION MATRIX: 1 0 0 0 1 0 0 0 1
TRANSLATION 0 0 0
AVER
ROTATION MATRIX    -0.87108 -0.49050  0.02492 -
                   -0.49025  0.87144  0.01588 -
                   -0.02951  0.00162 -0.99956
TRANSLATION   -0.18740   0.11924  -0.66475
LABI FP=FP SIGFP=SDFP  PHIO=AISOB FOMO=FOM

XTAL 2
RESO 1000 2.0
SCHEME RES FROM 6.0
MODE SOLV HIST AVER
SOLC 0.50
AVER
ROTATION MATRIX     0.46802  0.82899  0.30616 -
                   -0.81508  0.53880 -0.21293 -
                   -0.34148 -0.14989  0.92786
TRANSLATION  3.90866   3.11148   1.14348
LABI FP=FP SIGFP=SIGFP

XTAL 3
RESO 1000 2.0
SCHEME RES FROM 6.0
MODE SOLV HIST AVER
SOLC 0.40
AVER
ROTATION MATRIX     0.71822 -0.69491 -0.03563 -
                   -0.69556 -0.71840 -0.00954 -
                   -0.01897  0.03164 -0.99932
TRANSLATION  0.24079  45.93060   9.55959
AVER
ROTATION MATRIX    -0.26303 -0.96446  0.02510 -
                    0.96442 -0.26356 -0.02065 -
                    0.02653  0.01877  0.99947
TRANSLATION  0.60388  45.35286  10.53205
AVER
ROTATION MATRIX      0.68837  0.72534  0.00538 -
                   -0.72535  0.68836  0.00420 -
                   -0.00066 -0.00680  0.99998
TRANSLATION -0.45315   0.17668   0.38123
LABI FP=FMI3 SIGFP=SMI3
+