MTZ2VARIOUS (CCP4: Supported Program)

NAME

mtz2various - produces an ASCII reflection file for MULTAN, SHELX, TNT, X-PLOR/CNS, MAIN, mmCIF, pseudo-SCALEPACK, XtalView (foo.phs) or user-defined format. This may contain amplitudes, intensities or differences.

SYNOPSIS

mtz2various hklin foo_in.mtz hklout foo_out
[Keyworded input]

DESCRIPTION

This reads an MTZ file (assigned to HKLIN) and produces an ASCII file (assigned to HKLOUT) in a suitable form for MULTAN, SHELX, TNT, X-PLOR/CNS, pseudo-SCALEPACK, MAIN, XtalView (foo.phs) or in a user-defined format. For SHELX it is possible to output all quantities as intensities, i.e. F or delF terms may be squared. An mmCIF file can also be produced with all the relevant information taken from the MTZ header.

There are many options controlled by the assignments on the LABIN line. The most common requirements are:

There is no guarantee that the reflection count is completely robust. Files are sometimes slightly corrupted, e.g. DP is not present although F(+) and F(-) are.

When using OUTPUT USER you can define the output columns as you wish; this option can be used to construct a foo.phs file by assigning F PHI and FOM (see examples).

Many of the tasks can also be performed with SFTOOLS.

KEYWORDED INPUT

The allowed keywords are:

END, EXCLUDE, FREEVAL, FSQUARED, INCLUDE, LABIN, MISS, MONITOR, OUTPUT, RESOLUTION, SCALE

Compulsory input keywords are OUTPUT and LABIN.

OUTPUT [ MULTAN | SHELX | SHELXDiff | TNT | CIF | XPLOR | CNS | MAIN | SCAL | USER ]

The output types are as follows:

MULTAN

The output file has h,k,l,f,imt in FORMAT(3I4,7X,F7.0,I6), where imt=0 for a good reflection.

SHELX (for complete structure solution, for structure refinement, or for finding heavy atoms from isomorphous difference) or
SHELXDiff (for finding anomalous scatterers from anomalous differences)

To use the SHELX suite of programs (SHELXD, SHELXE, SHELXL or SHELXS) it is necessary to prepare two input files: foo.ins containing information about the cell, symmetry and some parameters to control the SHELX run, and foo.hkl containing a reflection list. The foo.hkl file may contain intensities (HKLF 4) or amplitudes (HKLF 3). Intensities may be generated from input amplitudes using the FSQUARED keyword, but it is better to use the original intensities. The foo.ins file finishes with a record HKLF 3 if foo.hkl contains amplitudes, or HKLF 4 for intensities.

The foo.hkl file created by this program contains:

if F terms are to be output (HKLF 3 format), lines of the form
h, k, l, "F", "sigmaF", "freeRflag" in FORMAT(3I4,2F8.2,I4).
if I terms are to be output (HKLF 4 format), lines of the form
h, k, l, "I", "sigmaI", "freeRflag" in FORMAT(3I4,2F8.2,I4).

To use the program SHELXD to solve a complete molecule by direct methods, to use SHELXL for refinement, or to prepare a reflection list for SHELXE, assign only I/SIGI/FREE or FP/SIGFP/FREE on the LABIN line. Reflections previously flagged for FreeR analysis are marked with -1 in the last column. These can be extracted with "grep -e -1$ foo.hkl".
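
As an alternative to grep, the fixed-width records can be read directly. The following Python sketch assumes the FORMAT(3I4,2F8.2,I4) layout described above. The function name parse_hklf_line and the sample records (including the flag value 0 for working-set reflections) are hypothetical and serve only as an illustration.

```python
def parse_hklf_line(line):
    """Split one foo.hkl record laid out as FORMAT(3I4,2F8.2,I4)."""
    h = int(line[0:4])
    k = int(line[4:8])
    l = int(line[8:12])
    value = float(line[12:20])   # F (HKLF 3) or I (HKLF 4)
    sigma = float(line[20:28])
    flag = int(line[28:32])      # -1 marks a FreeR reflection
    return h, k, l, value, sigma, flag


# Hypothetical sample records, built with the same field widths.
sample = [
    f"{1:4d}{0:4d}{0:4d}{123.45:8.2f}{2.50:8.2f}{0:4d}",
    f"{-2:4d}{1:4d}{3:4d}{67.80:8.2f}{1.20:8.2f}{-1:4d}",
]
records = [parse_hklf_line(s) for s in sample]

# Keep only the reflections flagged as FreeR (-1 in the last column).
free_set = [r for r in records if r[5] == -1]
print(free_set)
```

Slicing by column position rather than splitting on whitespace matters here, because adjacent fixed-width fields can run together when a value fills its field.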

To use the program SHELXD to find heavy atom or anomalous scattering sites, followed by SHELXE to calculate protein phases, you need to prepare two *.hkl files: one containing the FP to be phased, and the other the differences between two observations which are related to the substructure signal. To use isomorphous differences, scale FP and FPH together, and assign FP and FPH on the LABIN line. Request OUTPUT SHELX. MTZ2VARIOUS outputs to foo.hkl the difference |FP - FPH|, or its squared value (i.e. |FP - FPH|^2) if FSQUARED is specified, and an appropriate SIGMA, followed by a phase estimate. The output file will contain lines of the form (HKLF 3 format if FSQUARED is not specified):

h, k, l, Del= ABS("FPH-Fp"), sigma"DEL", PHIdel  in FORMAT(3I4,2F8.1,I4)
where PHIdel is 0 or 180, depending on whether Del is positive or negative. Similarly for HKLF 4 format, if FSQUARED is specified.

If you wish to use anomalous differences, you can EITHER assign FP as FPH(-) and FPH as FPH(+), OR assign DP as DPH in which case the program will output DPH or its square. You must use keyword SHELXDiff to use this option; this flags that phases must be 90 or 270, not 0 or 180. The output file contains the anomalous differences and has lines of the form (HKLF 3 format if FSQUARED not specified):

    
h, k, l, Del=ABS("FPH(+)-FPH(-)"), sigma"DEL", PHIdel  in FORMAT(3I4,2F8.1,I4)
where PHIdel is 90 or 270, depending on whether Del is positive or negative. Similarly for HKLF 4 format, if FSQUARED is specified.
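
The difference terms described above can be sketched as follows. This is a minimal Python illustration, not the program's own code. The function name difference_term is hypothetical, and the mapping of a non-negative difference to the lower phase (0 or 90) and a negative one to the higher phase (180 or 270) is an assumption, since the text only says that the phase depends on the sign. The sigma is left out because the text does not give its formula.

```python
def difference_term(f_first, f_second, anomalous=False, fsquared=False):
    """Return (Del, PHIdel) for the difference f_second - f_first.

    Isomorphous: f_first = FP,     f_second = FPH    -> phase 0 or 180.
    Anomalous:   f_first = FPH(-), f_second = FPH(+) -> phase 90 or 270.
    ASSUMPTION: a non-negative difference gets the lower of the two phases.
    """
    diff = f_second - f_first
    base = 90 if anomalous else 0
    phase = base if diff >= 0 else base + 180
    magnitude = abs(diff)
    if fsquared:
        magnitude = magnitude ** 2   # squared difference, as with FSQUARED
    return magnitude, phase


print(difference_term(100.0, 112.5))
print(difference_term(50.0, 45.0, anomalous=True))
```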

The phase information is needed for SHELXE. If the program SHELXD is to be used to find heavy atom or anomalous scattering sites from substructure differences, and you wish to run the program SHELXE to calculate protein phases using the SHELXD file, it must also list a phase estimate for the difference.

TNT

The output file has 'HKL ', h, k, l, F, sig(F), phase, fom in format(A4,3I4,3F8.1,F8.4), with phase = 1000, fom = 0 i.e. dummies. Note that files for TNT must be sorted on h, k, l and certain reflection zones are required. You may need to run CAD to resort your data. Use keywords INCLUDE FREER <num> and EXCLUDE FREER <num> to generate files for R-free calculation.
There is a maximum likelihood version of TNT from Pannu and Read which requires a free-R flag (in XPLOR convention). This column will be output if you assign the FREE column in LABIN and do not use the INCLUDE | EXCLUDE FREER options.

CIF <data block header>

CIF output is invoked, where <data block header> is a maximum of 80 characters long, and must begin with the characters "data_" (any mixture of upper and lowercase thereafter). OUTPUT CIF can be used to prepare data (from crystallography or EM) for deposition to the PDB.
Unlike the other output formats, all the reflections from HKLIN are written to HKLOUT. Not all column labels are appropriate for CIF output (see Notes on CIF). Also, only RESO, EXCLUDE SIGP and FREEVAL can be used with OUTPUT CIF. They are used to flag certain reflections but not to reject them. The others are ignored.

XPLOR

The output file has FORMAT(A,3I5,A,F10.1,F10.1,A,F10.2,A,I6...). The exact contents will depend on which labels have been specified by the LABIN keyword. See the documentation for FREERFLAG for a table explaining the differences in free R flag conventions.

CNS

Similar to XPLOR output. However, free R flags are left unchanged. To select the correct free R flag in CNS, you will need something like:

{===>} test_flag=0;

Outputting Anomalous pairs

For SHELX and XPLOR/CNS ONLY. If FP and the anomalous difference are assigned (see LABIN), then the amplitudes for reflections h,k,l and -h,-k,-l are generated and output as separate reflections. In this case, the column ISYM may also be assigned if it is present: this is a flag from TRUNCATE which
= 0 if F comes from both positive (hkl) and negative (-h-k-l) Bijvoet reflections,
= 1 if only from F+ and
= 2 if only F-

MAIN

This gives output suitable for the MAIN program. The output file contains H K L FP SIGFP and optionally FREE, PHIB and FOM if they are specified on the LABIN line. Alternatively, if FC is specified on the LABIN line, then FP and FC are interpreted as the real and imaginary parts respectively of a calculated F, and output as a "COMPLEX" field.

SCAL

This gives pseudo-SCALEPACK output, which is needed as input to the SOLVE package. The output file assigned to HKLOUT is ASCII and contains H K L I(+) SIGI(+) I(-) SIGI(-), with the format (3I4,4F8.1). The output may need to be rescaled to fit this format. If the input is F(+) and F(-), the rescaling is done within the program.

USER <format>

The output file is of the form H K L ? ? ... where the user can specify which columns are to be output, how many and in what format. It can be used to generate a foo.phs file suitable for XtalView. See examples. Ten dummy labels (DUM??) are available to assign to any column and are output as real. Also, there are ten dummy columns (IDUM??) which are output as integer. The order of the data in the ASCII file is taken from the order of the program labels specified on the LABIN card, e.g. LABIN FP=FP1 DP=DP1 SIGFP=SIG1 SIGDP=SIGDP1 would give the order H K L FP1 DP1 SIG1 SIGDP1 in the output file. The format must either be a FORTRAN format that begins with three integer items and whose remaining items match the LABIN card, e.g.

  LABIN FP=FP DUM1=X IDUM1=Y
  OUTPUT USER '(3I4,2F7.1,I4)'

or

  OUTPUT USER *

to use free formatted output. However, all columns after H, K and L will be treated as real numbers.
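
For readers less familiar with FORTRAN edit descriptors, the layout produced by '(3I4,2F7.1,I4)' can be sketched in Python. The function name format_user_record and the sample values are hypothetical. The sketch only mimics the field widths and is not how the program itself writes the file.

```python
def format_user_record(h, k, l, fp, dum1, idum1):
    """Mimic FORTRAN '(3I4,2F7.1,I4)'.

    Three I4 indices (H, K, L), two F7.1 reals (FP and DUM1),
    and one I4 integer (IDUM1), in the order given on the LABIN line.
    """
    return f"{h:4d}{k:4d}{l:4d}{fp:7.1f}{dum1:7.1f}{idum1:4d}"


line = format_user_record(1, -2, 3, 456.78, 9.0, 1)
print(repr(line))
```

Unlike FORTRAN, which fills a field with asterisks when a value does not fit, Python simply widens the field, so the sketch is only faithful for values that fit their widths.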

LABIN <program label>=<file label>

The output is controlled by the labels specified here:

Beware: if you want to take any sort of difference, Fph - Fp or F(+) - F(-), you MUST specify FP=... and FPH=...

Input labels accepted are:

H, K, L                       Indices
FP, SIGFP                     F and Sigma for native
FPH, SIGFPH                   F and Sigma for derivative
FC, PHIC                      F and Phase from model
FPART, PHIPART                F and Phase from partial structure
DP, SIGDP                     Anomalous difference and Sigma
I, SIGI                       I and Sigma
F(+), SIGF(+)                 F+ and Sigma(F+)
F(-), SIGF(-)                 F- and Sigma(F-) used for anomalous output
I(+), SIGI(+)                 I+ and Sigma(I+)
I(-), SIGI(-)                 I- and Sigma(I-)
FPART_BULK_S, PHIPART_BULK_S  Partial F and Phase for bulk solvent correction
W, FOM                        Weights
PHIB                          Best phase (experimental)
HLA, HLB, HLC, HLD            Hendrickson-Lattman coefficients
FREE                          FreeR flag
ISYM                          (see TRUNCATE)
DUM??                         Dummy labels (output as real)
IDUM??                        Dummy labels (output as integer)

Not all columns are used in the various output formats, see Notes on INPUT and OUTPUT. Also, the contents of the columns which are output may depend on which input columns are assigned by LABIN, see DESCRIPTION above.

Note: when using the DUM?? and IDUM?? labels, the program may generate warnings about column type mismatches. This may happen for instance if an anomalous difference (column type D) is assigned to one of the DUM labels (which is nominally of type R, i.e. 'any other real'). These warnings should be ignored, and the output is not affected.

END

End input.

FSQUARED

If this flag is set, the program expects F and SIGF and will output I and SIGI: I = F*F, SIGI = 2*SIGF*F + SIGF*SIGF. These intensities are not necessarily the same as the measured intensities (pre-TRUNCATE); it is better to use the measured values if you have them.
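
The conversion quoted above can be written out as a short Python sketch. The function name fsquared and the sample values are hypothetical; the two formulae are the ones given in the text.

```python
def fsquared(f, sigf):
    """Convert (F, SIGF) to (I, SIGI) using the formulae quoted above."""
    intensity = f * f
    sig_i = 2.0 * sigf * f + sigf * sigf
    return intensity, sig_i


# Hypothetical amplitude of 10.0 with a sigma of 0.5.
i_value, sig_i_value = fsquared(10.0, 0.5)
print(i_value, sig_i_value)
```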

MONITOR <Nmon>

followed by an integer <Nmon>. Every <Nmon>-th reflection within the resolution range is monitored (printed out).

RESOLUTION <resmin> <resmax>

followed by 2 real numbers, <resmin>, <resmax>. This can be used to restrict the output data to the given resolution range.

SCALE <scale>

The F/SIGF or I/SIGI are multiplied by <scale> before output. For SHELX output, if the SCALE keyword is not given then a scale factor is computed so that the maximum intensity is 99999.0 (so as to fit into the output format).
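
The default scaling for SHELX output can be sketched as below. This is a minimal Python illustration under the assumption that the scale is simply the target value divided by the largest intensity. The function name default_shelx_scale and the sample intensities are hypothetical.

```python
def default_shelx_scale(intensities, target=99999.0):
    """Scale factor that maps the largest intensity onto `target`."""
    largest = max(intensities)
    if largest <= 0:
        raise ValueError("need at least one positive intensity")
    return target / largest


values = [250000.0, 1200.5, 37.0]      # hypothetical intensities
scale = default_shelx_scale(values)
scaled = [v * scale for v in values]   # the sigmas would be multiplied by the same factor
print(scale, max(scaled))
```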

INCLUDE <keyword> <value> ...

Each secondary keyword is followed by a number selecting the data to be included. The only possible keyword is FREER.

FREER <num>
Include only reflections with FreeRflag = <num>. This is different from the FREEVAL keyword which specifies the freeR set. This will only be applicable if you have assigned the FREE column.

EXCLUDE <keyword> <value> ...

Each secondary keyword is followed by a number setting the appropriate limit for excluding data. Possible keywords are SIGP, SIGH, DIFF, FPMAX, FPHMAX, FREER. If DP is assigned without FP then the exclusion criterion for DIFF is applied to |DP|.

SIGP <Nsig1>, SIGH <Nsig2>
Reflections are excluded if FP < (<Nsig1>*SIGFP) or FPH < (<Nsig2>*SIGFPH). Formerly, for MULTAN such reflections were flagged and the other formats were unaffected; now they are not output in any format.
DIFF <difference_limit>
Reflections are excluded if |FP-FPH| (or |DP|) > <difference_limit>
FPMAX <maximum>
Maximum value allowed for FP; reflections with FP > <maximum> are excluded.
FPHMAX <maximum>
Maximum value allowed for FPH; reflections with FPH > <maximum> are excluded.
FREER <num>
Omit reflections with FreeRflag = <num>. This is different from the FREEVAL keyword which specifies the freeR set. This will only be applicable if you have assigned the FREE column.
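
The amplitude-based EXCLUDE tests can be summarised in a short Python sketch. The function name is_excluded and its parameter names are hypothetical. Two readings are assumptions of the sketch: the SIGH test is taken to be FPH < Nsig2*SIGFPH by analogy with SIGP, and FPMAX/FPHMAX are taken to reject values above the given maximum. FREER is not covered.

```python
def is_excluded(fp, sigfp, fph=None, sigfph=None, *,
                nsig1=None, nsig2=None, diff_limit=None,
                fpmax=None, fphmax=None):
    """Return True if any active EXCLUDE criterion rejects the reflection."""
    if nsig1 is not None and fp < nsig1 * sigfp:
        return True                      # SIGP
    if fpmax is not None and fp > fpmax:
        return True                      # FPMAX (assumed: reject above maximum)
    if fph is not None:
        if nsig2 is not None and fph < nsig2 * sigfph:
            return True                  # SIGH (assumed analogous to SIGP)
        if diff_limit is not None and abs(fp - fph) > diff_limit:
            return True                  # DIFF
        if fphmax is not None and fph > fphmax:
            return True                  # FPHMAX (assumed: reject above maximum)
    return False


# A zero amplitude fails "EXCLUDE SIGP 0.01", a measured one passes.
print(is_excluded(0.0, 1.0, nsig1=0.01), is_excluded(25.0, 1.0, nsig1=0.01))
```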

FREEVAL <num>

The reflections with FreeRflag = <num> are treated as the freeR set: the default is 0 if FREE is assigned. This is important if you want to include a free-R test in your XPLOR/CNS or SHELX refinement, or you are using the Pannu-Read version of TNT. The FREE column must be assigned with LABIN.

MISS <valm>

By default, if any data associated with a reflection are missing, i.e. are represented in HKLIN by a Missing Number Flag (MNF), then that reflection will not appear in the output. However, if the keyword MISS is given then these reflections will be output, but with the MNFs converted to <valm>. The latter need not be given, and defaults to 0.0. The other exclusions are still effective. Note that mmCIF output is a special case, and the mmCIF character '?' is used to denote missing values. This keyword is therefore ignored for mmCIF output.

Also, if MISS is present then when producing isomorphous differences, i.e. |FPH-FP|, if either FPH or FP is an MNF then the difference is set to zero and the sigma is set to twice the measured sigma. For example: if FP=MNF, SIGFP=MNF, FPH=100 and SIGFPH=10, then FPH-FP = 0 and SIG = 20.
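
The rule for a difference with one missing amplitude can be sketched in Python as follows. The function name miss_difference is hypothetical, and representing an MNF by NaN is an assumption of the sketch. Only the case described in the text (exactly one of FP and FPH missing) is handled; for all other cases the function returns None.

```python
import math


def miss_difference(fp, sigfp, fph, sigfph):
    """Apply the MISS rule for |FPH-FP| when exactly one amplitude is missing.

    Missing values (MNF) are represented here by NaN (an assumption).
    Returns (difference, sigma) for the one-missing case, else None.
    """
    fp_missing = math.isnan(fp)
    fph_missing = math.isnan(fph)
    if fp_missing and not fph_missing:
        return 0.0, 2.0 * sigfph     # twice the measured (derivative) sigma
    if fph_missing and not fp_missing:
        return 0.0, 2.0 * sigfp      # twice the measured (native) sigma
    return None


MNF = float("nan")
# The worked example from the text: FP=MNF, SIGFP=MNF, FPH=100, SIGFPH=10.
print(miss_difference(MNF, MNF, 100.0, 10.0))
```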

Notes on INPUT and OUTPUT

Not all INPUT columns are accepted with a particular OUTPUT format. If one has OUTPUT <subkw> then the allowed input columns are given below (see LABIN and OUTPUT):

subkw = USER
accepts all input columns. Remember the format must match up with the column assignments i.e. assignments to IDUM must be output as integers, all others are treated as real. Warnings about mismatched column types when using DUM or IDUM labels can be ignored; see LABIN keyword.
subkw = XPLOR [or CNS]
accepts all input columns except DUM1 to DUM10 and IDUM1 to IDUM10 and I+, SIGI+, I- and SIGI-.
subkw = SHELX or SHELXD
accepts columns H to SIGFPH, DP/SIGDP (with or without FP), I/SIGI and FREE
subkw = MULTAN
is like SHELX but will only use FREE to include or exclude reflections.
subkw = TNT
is like SHELX except for the use of FREE: if the INCLUDE FREER or EXCLUDE FREER keywords are specified then FREE is used to include or exclude reflections, otherwise the FREE column (if assigned) is output.
subkw = MAIN
accepts H, K, L, FP, SIGFP, FREE, PHIB, FOM, FC
subkw = CIF
accepts H, K, L, FP, SIGFP, I, SIGI, DP, SIGDP, FC, PHIC, PHIB, FOM, I(+), SIGI(+), I(-), SIGI(-), F(+), SIGF(+), F(-), SIGF(-), FPART_BULK_S, PHIPART_BULK_S, FREE, HLA, HLB, HLC, HLD

You may still have trouble getting exactly the output you want. You can use the UNIX utilities cut(1) or sed(1) to manipulate the mtz2various output.

Notes on CIF

All reflections in the MTZ input file will be output to the CIF file. However, there are ways to flag certain reflections with the data type _refln.status. Observed reflections will be flagged with 'o'. Unobserved reflections, i.e. those flagged as missing in the relevant amplitude or intensity column, will be flagged as 'x'; these reflections will not be added to _reflns.number_obs. The 'free' reflections will be flagged as 'f'. The keyword FREEVAL can be used to indicate this set. Systematically absent reflections are flagged with '-'.

If the RESO keyword is specified then reflections at higher or lower resolution than the limits given will be written with _refln.status 'h' or 'l' respectively. The limits will be written to the CIF as the values of _refine.ls_d_res_high and _refine.ls_d_res_low.

If EXCLUDE SIGP is given then reflections for which F < <value>*sigma(F), and which satisfy the resolution limits (if given), will be written with _refln.status '<'. The value of _reflns.number_obs excludes all reflections which do not satisfy the condition on sigma(F). All other sub-keywords of EXCLUDE are ignored for CIF output.
NB: The translation of the RESOLUTION and EXCLUDE SIGP conditions to _refln.status values does not imply that the use of these conditions is good crystallographic practice. Be prepared to justify why you have excluded any data from your final refinement!
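
The _refln.status values described above can be collected into one Python sketch. The function name refln_status and its parameters are hypothetical. The order in which the conditions are tested is an assumption, because the text does not say which flag takes priority when several conditions hold, apart from the '<' test applying only to reflections inside the resolution limits.

```python
def refln_status(*, missing, free, absent, d_spacing,
                 d_high=None, d_low=None, f=None, sigf=None, nsig=None):
    """Pick a _refln.status character for one reflection.

    ASSUMPTION: the order of the tests below is illustrative only.
    d_spacing, d_high and d_low are resolutions in Angstrom.
    """
    if absent:
        return "-"      # systematically absent
    if missing:
        return "x"      # unobserved (MNF in the relevant column)
    if d_high is not None and d_spacing < d_high:
        return "h"      # higher resolution than the high limit
    if d_low is not None and d_spacing > d_low:
        return "l"      # lower resolution than the low limit
    if nsig is not None and f is not None and f < nsig * sigf:
        return "<"      # fails the EXCLUDE SIGP condition
    if free:
        return "f"      # member of the FREEVAL set
    return "o"          # observed


print(refln_status(missing=False, free=True, absent=False, d_spacing=2.5))
```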

Below is a list of the items output to the CIF file:

 _entry.id

 _audit.revision_id
 _audit.creation_date
 _audit.creation_method
 _audit.update_record

 _cell.entry_id
 _cell.length_a
 _cell.length_b
 _cell.length_c
 _cell.angle_alpha
 _cell.angle_beta
 _cell.angle_gamma

 _symmetry.entry_id
 _symmetry.Int_Tables_number
 _symmetry.space_group_name_H-M
 _symmetry_equiv.id
 _symmetry_equiv.pos_as_xyz

 _reflns.entry_id
 _reflns.d_resolution_high
 _reflns.d_resolution_low
 _reflns.limit_h_max
 _reflns.limit_h_min
 _reflns.limit_k_max
 _reflns.limit_k_min
 _reflns.limit_l_max
 _reflns.limit_l_min
 _reflns.number_all
 _reflns.number_obs

 _diffrn_radiation_wavelength.id
 _exptl_crystal.id
 _reflns_scale.group_code

The following items are written for each reflection, together with the LABIN label that triggers their output:

 _refln.wavelength_id     Always written
 _refln.crystal_id        Always written
 _refln.scale_group_code  Always written
 _refln.index_h           Always written
 _refln.index_k           Always written
 _refln.index_l           Always written
 _refln.status            Always written
 _refln.F_meas_au         FP
 _refln.F_meas_sigma_au   SIGFP
 _refln.F_calc            FC
 _refln.phase_calc        PHIC
 _refln.phase_meas        PHIB
 _refln.fom               FOM
 _refln.intensity_meas    I
 _refln.intensity_sigma   SIGI
 _refln.ebi_F_xplor_bulk_solvent_calc        FPART_BULK_S
 _refln.ebi_phase_xplor_bulk_solvent_calc    PHIPART_BULK_S
 _refln.pdbx_HL_A_iso                   HLA
 _refln.pdbx_HL_B_iso                   HLB
 _refln.pdbx_HL_C_iso                   HLC
 _refln.pdbx_HL_D_iso                   HLD
 _refln.pdbx_F_meas_plus                F(+)
 _refln.pdbx_F_plus_sigma               SIGF(+)
 _refln.pdbx_F_minus                    F(-)
 _refln.pdbx_F_minus_sigma              SIGF(-)
 _refln.pdbx_anom_difference            DP
 _refln.pdbx_anom_difference_sigma      SIGDP
 _refln.pdbx_I_plus                     I(+)
 _refln.pdbx_I_plus_sigma               SIGI(+)
 _refln.pdbx_I_minus                    I(-)
 _refln.pdbx_I_minus_sigma              SIGI(-) 

Important note: In the 6.0 version of MTZ2VARIOUS, the tokens associated with anomalous data (such as _refln.pdbx_F_meas_plus) and with Hendrickson-Lattman coefficients have been updated to use the PDB exchange dictionary, replacing those from the CCP4 harvest dictionary. This is a change in nomenclature only and the new tokens are accepted by the deposition sites.

mmCIF (at least at version 0.8) makes no provision for the output of derivative data in the same data block as native data. For more information about what these mmCIF categories are, check out the mmCIF dictionary.

EXAMPLES

#   Output a file suitable for input to CNS or XPLOR
#
    mtz2various HKLIN nicona HKLOUT dell.hkl << EOF
    RESOLUTION 10000 2
    OUTPUT XPLOR
    EXCLUDE SIGP 0.01   # to exclude unmeasured refl.
    LABIN FP=F SIGFP=SIGF FREE=FreeR_flag
    END
    EOF

# Output a file suitable for shelx solution or refinement
    mtz2various HKLIN aucn_trn-unique.mtz HKLOUT aucn_I.hkl <<eof
    LABIN I=IMEAN SIGI=SIGIMEAN FREE=FreeR_flag
    OUTPUT SHELX
    END
    eof

# Output a file suitable for shelxd to find heavy atom sites
    mtz2various HKLOUT $CCP4_SCR/toxd.hkl hklin $CEXAM/toxd/toxd <<EOF
    LABIN  FP=FTOXD3 SIGFP=SIGFTOXD3  FPH=FAU20 SIGFPH=SIGFAU20 
    OUTPUT SHELX   # program will recognise this is an isomorphous difference
    RESOLUTION 10 3.5
    END
    EOF

# Output a file suitable for shelxd to find heavy atom sites from anomalous differences
    mtz2various HKLOUT $CCP4_SCR/toxd.hkl hklin $CEXAM/toxd/toxd <<EOF
    # Either assign the anomalous difference directly:
    LABIN DP=DANOAU20 SIGDP=SIGDANOAU20
    # or assign F(-) as FP and F(+) as FPH instead:
    # LABIN FP=FAU20(-) SIGFP=SIGFAU20(-) FPH=FAU20(+) SIGFPH=SIGFAU20(+)
    OUTPUT SHELXD 
    RESOLUTION 10 3.5
    END
    EOF

# Output a foo.phs file suitable for XtalView map calculation after a REFMAC5 refinement
    mtz2various HKLOUT $CCP4_SCR/toxd.phs hklin $CEXAM/toxd/toxd-refmac5 <<EOF
    LABIN   DUM1=FWT DUM2=SIGFP DUM3=PHWT
    OUTPUT USER '(3I5,3F12.1)'
    RESOLUTION 10 3.5
    END
    EOF

A runnable unix example script is in $CEXAM/unix/runnable/

A non-runnable unix example script which demonstrates mtz2various used to output anomalous data is in $CEXAM/unix/non-runnable/

SEE ALSO

mtzdump, f2mtz, SFTOOLS, cut(1), sed(1)

AUTHOR

Eleanor Dodson, York University