fffear HKLIN foo.mtz [XYZIN foo.pdb] [MAPIN foo.map] [MAXIN foo.max] XYZOUT bar.pdb [ SOLIN foo.msk ]
`fffear' is a package which searches for molecular fragments in poor quality electron density maps. It was inspired by the Uppsala `ESSENS' software (Kleywegt+Jones, 1997), but achieves greater speed and sensitivity through the use of Fast Fourier transforms, maximum likelihood, and a mixed bag of mathematical and computational approaches (Cowtan, 1998). Currently, the main application is the detection of helices in poor electron density maps (5.0A or better), and the detection of beta strands in intermediate electron density maps (4.0A or better). It is also possible to use electron density as a search model, allowing the location of NCS elements. Approximate matches may be refined, and translation searches may be performed using a single orientation. The results are scored using an agreement function based on the mean squared difference between model and map over a masked region.
The program takes as input an mtz file containing the Fourier coefficients of the map to be searched, and a search model in the form of a pdb file, map, or maximum likelihood target. A `fragment mask' is generated to cover the fragment density, and orientations and translations are searched to find those transformations which give a good fit between the fragment density and map density within the fragment mask.
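The agreement function described above can be illustrated with a minimal sketch. This is not the fffear implementation: it assumes a periodic one-dimensional grid, a single orientation, and a direct loop over translations where fffear uses FFTs. The names fit_score and translation_search are hypothetical.

```python
def fit_score(map_density, frag_density, frag_mask, offset):
    """Mean squared difference between fragment and map density,
    taken only over grid points inside the fragment mask."""
    n = len(map_density)
    total = 0.0
    count = 0
    for i, (rho_frag, inside) in enumerate(zip(frag_density, frag_mask)):
        if inside:
            # Place the fragment at `offset`, wrapping around the cell.
            diff = rho_frag - map_density[(i + offset) % n]
            total += diff * diff
            count += 1
    return total / count


def translation_search(map_density, frag_density, frag_mask):
    """Score every translation; the lowest score is the best fit."""
    scores = [
        fit_score(map_density, frag_density, frag_mask, t)
        for t in range(len(map_density))
    ]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

A score of zero means the fragment density matches the map exactly within the mask. This matches the convention that low values represent a good fit.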
The program has been highly optimised using reciprocal-space rotations, grid-doubling FFTs and crystallographic symmetry (Rossmann+Arnold, 1993), giving a 4-50 times speed improvement over the results published in 1998. The speed of the calculation is almost independent of the size of the model, so the program may also be used for molecular replacement calculations where weak phases are available.
A maximum likelihood search function is under consideration for future versions.
fffear hklin ~/hkl/gmto-unique.mtz xyzin alpha-helix-10.pdb xyzout alpha10-rot.pdb << eof
SOLC 0.35
SEARCH STEP 10
RESO 1000.0 3.5
CENTRE ORTH 7.464 16.169 16.893
LABI FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
END
eof
At 5.0A some of the fragment density is no longer localised. This can cause a mismatch between the fragment and protein density. One solution is to use the 'FILTER MAP' keyword to match the map and fragment densities. A better option is to use the 5.0A maximum likelihood search target. The search target is provided on MAXIN; a model is also provided on XYZIN, for visualisation purposes only:
fffear hklin ~/hkl/gmto-unique.mtz maxin ml-helix-9-5.0A.max xyzin ml-helix-9.pdb xyzout alpha10-rot.pdb << eof
SOLC 0.35
SEARCH STEP 15
RESO 1000.0 5.0
FILTER MAP RADIUS 6.0
CENTRE ORTH 7.464 16.169 16.893
LABI FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
END
eof
When the search model is large (i.e. molecular replacement calculations or density fragment searches to find NCS or cross-crystal operators), the search can be foiled by long range variations in either the map or fragment density. In this case filtering should be applied to both the map and the search model:
fffear hklin ~/hkl/rnase-unique.mtz mapin rnase-mol.map << eof
SOLC 0.35
SEARCH STEP 15
MASK RADI 2.5
RESO 1000.0 5.0
FILTER MAP MODEL RADIUS 6.0
CENTRE ORTH 7.464 16.169 16.893
LABI FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
END
eof
In the case of molecular replacement calculations and NCS searches it is important that the search model and map are scaled correctly.
Input mtz file - This should contain the conventional (CCP4) asymmetric unit of data (see CAD).
The mtz file should contain all reflections to the limit of the measured diffraction pattern, since all the reflections are used to accurately scale the data. However, only those reflections with phases, to the resolution limit specified by the compulsory RESOLUTION keyword, will actually be used in the search procedure.
Input pdb file. This may contain an arbitrary crystal header, or none at all. The only restriction is that the atomic coordinates are given in Angstroms on arbitrary orthogonal axes. The B-factors of the input atoms should be set to an average value of (around) zero. Normally, all B-factors should be equal, unless some prior information about the B-factors of atoms in the desired fragment is available. (It is legitimate for example to make the B-factors of the C-beta or Oxygen atoms higher).
Input map file. This may be specified as an alternative to XYZIN to
perform a search for NCS or cross-crystal operators. The map should
contain the search density, placed in a cubic cell. Regions of the map
outside the search mask should be set to zero. The search map will
usually be generated using maprot in map
cutting mode, e.g.
maprot wrkin rnase-mir.map mskin rnase-mol.msk cutout rnase-mol.map << eof
CELL UNIT 100 100 100 90 90 90
GRID UNIT 150 150 150
ROTA POLAR 0 0 0
TRAN 0 0 0
eof
If the input map is calculated from the same structure factors which
are given to fffear, the scaling can be overridden using
SCALE 1.0 0.0.
An input coordinate file may also be provided on XYZIN. This will not be used for the search, but will be rotated and output for visualisation purposes.
Input maximum likelihood search target. This is used in the same way as an input map, however it also contains density variance information. (Special software is used for the construction of ML targets for fffear). ML targets are resolution dependent, so the appropriate target should be used in conjunction with the RESOLUTION keyword. XYZIN may again be used for visualisation purposes.
Output pdb file. This contains multiple copies of the input fragment, rotated and translated to the positions of the best matches between the fragment density and map density. The fragments are sorted in order of quality, with the best first. The B-factor is set to the value of the search function, with low values representing a good fit.
Good matches to major secondary structure features are usually obvious because several fragments link up or overlap in a sensible manner. At better than 4.0A resolution, the direction of the chain is commonly correct as well.
The output pdb file may be further analysed with `ffjoin'.
Output map file (MAPOUT). A map of the best fragment fit at each position in the map. Values closest to zero represent the best fit.
Input mask - this is used as a filter for the results. Any rotation/translation solutions whose centre-of-mass falls in the solvent (zero) region of the mask will be excluded from the output. If no mask is given, the whole cell is allowed.
There is generally no point in providing a solvent mask, since the solvent density does not usually match atomic features. However, it may be useful when fitting a molecular replacement map from a very incomplete model, to exclude hits to the MR model.
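The effect of the input mask on the solution list can be sketched as follows. This is an illustration only, under several assumptions not stated in the text: the mask is a nested list indexed by grid point along each cell axis, atom positions are in fractional coordinates, the centre of mass is weighted by a per-atom mass, and the nearest grid point is tested. All function names are hypothetical.

```python
import math


def centre_of_mass(atoms):
    """atoms: list of (mass, (x, y, z)) in fractional coordinates."""
    total = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[axis] for mass, xyz in atoms) / total
        for axis in range(3)
    )


def in_allowed_region(mask, frac):
    """True if the grid point nearest to `frac` has a non-zero mask value."""
    dims = (len(mask), len(mask[0]), len(mask[0][0]))
    # Nearest grid point on each axis, wrapped back into the unit cell.
    u, v, w = (math.floor(frac[axis] * n + 0.5) % n for axis, n in enumerate(dims))
    return mask[u][v][w] != 0


def filter_solutions(solutions, mask=None):
    """Drop solutions whose centre of mass lies in the zero (solvent) region.
    With no mask, the whole cell is allowed."""
    if mask is None:
        return list(solutions)
    return [s for s in solutions if in_allowed_region(mask, centre_of_mass(s))]
```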
Input is keyworded. Available keywords are: SOLC, LABIN, RESOLUTION, MODEL, MASK, FILTER, SEARCH, SCALE, CENTRE, GRID, FORM, TRUNCATE, STRUCFAC.
(SOLC and LABIN are compulsory. RESOLUTION is strongly recommended.)
Enough columns must be provided to allow calculation of a map. Common combinations include (calculated_magnitude + phase), (observed_magnitude + phase + weight), (weighted_magnitude + phase).
Resolution range of reflections to include in the translation
search stage of the calculation. This should be set to cover the
resolution range for which significant phase information is
available. Good results are obtained with phases to 4.0A or better;
for larger fragments (10 residues or more) information may be obtained
at still lower resolutions.
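As an illustration of what a resolution range such as "RESO 1000.0 3.5" selects, the sketch below computes the d-spacing of each reflection and keeps those inside the range. It assumes a cell with all angles equal to 90 degrees, which is a simplification; the function names and the example cell are hypothetical.

```python
import math


def d_spacing(hkl, cell):
    """Resolution in Angstroms of reflection (h, k, l) for an
    orthogonal cell (a, b, c). Returns None for the (0, 0, 0) term."""
    inv_d_sq = sum((index / edge) ** 2 for index, edge in zip(hkl, cell))
    if inv_d_sq == 0.0:
        return None
    return 1.0 / math.sqrt(inv_d_sq)


def select_reflections(reflections, cell, low=1000.0, high=3.5):
    """Keep reflections whose d-spacing lies between `high` and `low`."""
    kept = []
    for hkl in reflections:
        d = d_spacing(hkl, cell)
        if d is not None and high <= d <= low:
            kept.append(hkl)
    return kept
```

For a 50 x 60 x 70 A cell, the reflection (10, 0, 0) has a d-spacing of 5.0A and is kept, while (20, 0, 0) at 2.5A is excluded.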
Set the parameters for the model atoms.
Set the radius of the fragment mask about the model atoms. This determines the volume over which the agreement between the map and the model is evaluated. Default: <mskrad>=2.5A.
Apply a filter to the map and/or model before starting the search. The filter may match either the local mean (default) within the filter radius, or it may match both the local mean and variance. This is useful at low resolutions or when performing MR or NCS searches.
Set the parameters for the search function.
Override internal scaling and scale input data by:
Centre the output fragment positions in an asymmetric unit around <x> <y> <z>, given in fractional or orthogonal coordinates in accordance with the preceding keyword. Useful to put your matches in the same region as any model you are working on.
Set the grid for the calculation. Ideally the grid spacing should
be 1/5 of the resolution of the phases, thus for 4.0A phases the grid
spacing should be 0.8A. Spacings greater than 1/4 of the resolution will
cause an error. Grid sampling must be a multiple of 4 and obey any
other requirements imposed by the spacegroup.
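The grid rules above can be worked through with a short sketch. It picks, for one cell edge, the smallest sampling that is a multiple of 4 and gives a spacing no coarser than 1/5 of the resolution. Spacegroup-specific requirements are not handled, and the function names are hypothetical.

```python
import math


def grid_sampling(cell_edge, resolution, divisor=5.0):
    """Smallest multiple of 4 giving a spacing <= resolution / divisor."""
    points = math.ceil(cell_edge * divisor / resolution)
    # Round up to the next multiple of 4.
    return ((points + 3) // 4) * 4


def spacing_is_allowed(cell_edge, sampling, resolution):
    """The spacing must not exceed 1/4 of the resolution."""
    return cell_edge / sampling <= resolution / 4.0
```

For a 100A cell edge and 4.0A phases this gives a sampling of 128, i.e. a spacing of about 0.78A.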
Alternative 2-gaussian form factor coefficients for atomic number <z>. f=<a1>exp(<b1>s)+<a2>exp(<b2>s). Form factors are supplied for H, N, C, O and S; other atom types are scaled from these. Given that the model B-factors will generally be wrong, a crude approximation is sufficient for all common cases.
Resolution range of reflections to include in the data scaling
stage. This keyword can be used to exclude part of the input data by
resolution cutoffs. This is generally highly inadvisable.
Use a (slow) direct Fourier transform to calculate the model and mask structure factors instead of the default FFT. The REAL and RECIP keywords may then be used to set the spacing of the real space grid used to calculate the fragment density and mask, and the reciprocal space sampling of the fragment and mask transforms.
The output PDB file (XYZOUT) contains up to 1000 copies of the
input molecule in decreasing order of fit to the density. For the
purposes of visualisation I find it useful to get the header and the
first 250 C-alpha atoms from this file, as follows:
grep 'C[AR]' XYZOUT | head -250 > ca.pdb
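A stricter alternative to the grep command is sketched below, assuming the output file follows the standard fixed-column PDB layout (atom name in columns 13-16, B-factor in columns 61-66). Unlike the grep pattern, it will not pick up other lines that happen to contain "CA" or "CR", and it counts only C-alpha atoms towards the limit. The function names are hypothetical, and atom_line exists only to build made-up test records.

```python
def select_ca_trace(lines, max_atoms=250):
    """Keep the CRYST1 header and the first `max_atoms` C-alpha records."""
    selected = []
    n_atoms = 0
    for line in lines:
        if line.startswith("CRYST1"):
            selected.append(line)
        elif line.startswith("ATOM") and line[12:16].strip() == "CA":
            if n_atoms < max_atoms:
                selected.append(line)
                n_atoms += 1
    return selected


def fit_value(atom_record):
    """B-factor column, which holds the search function value."""
    return float(atom_record[60:66])


def atom_line(serial, name, resseq, x, y, z, score):
    """Build a made-up ATOM record in fixed PDB columns (for testing only)."""
    return (f"ATOM  {serial:5d} {name:<4s} ALA A{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{score:6.2f}")
```

Because the fragments are written best first, the first C-alpha records returned belong to the best-fitting copies.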
The translation function map (which omits the orientation information) is also output on MAPOUT. This has peaks where the origins of the good orientations are found. If the input model has an alpha carbon at the origin, a rough backbone trace of map regions matching the fragment may be obtained.
Kevin D. Cowtan, Department of Chemistry, University of York
fffear fragment library, ffjoin, maprot, xloggraph