MTZ2CIF (CCP4: Supported Program)

NAME

mtz2cif - produce an mmCIF reflection file suitable for deposition. This may contain amplitudes, intensities and/or differences.

SYNOPSIS

mtz2cif hklin foo_in.mtz hklout foo_out.cif
[Keyworded input]

DESCRIPTION

MTZ2CIF reads an MTZ file (assigned to HKLIN) and produces an mmCIF file (assigned to HKLOUT) in a form suitable for deposition with the PDB. The user must specify which quantities are to be exported via the LABIN keyword; cell and symmetry information is taken directly from the MTZ file.

It is also possible to export multiple MTZ datasets to a single mmCIF file by specifying multiple LABIN lines.

KEYWORDED INPUT

The allowed keywords are:

DATABLOCK, END, EXCLUDE, FREEVAL, LABIN, MODE, RESOLUTION

Compulsory input keywords are DATABLOCK and LABIN.

DATABLOCK <data block header>

(Compulsory)

<data block header> is a maximum of 80 characters long, and must begin with the characters "data_" (any mixture of upper and lowercase thereafter).
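The two constraints on the header can be checked before the program is run. The following is a minimal sketch in POSIX shell. The function name check_datablock is hypothetical and is not part of CCP4, and the sketch assumes that the "data_" prefix must match exactly as written above.

```shell
# Hypothetical helper (not part of CCP4): check a DATABLOCK header
# against the two rules stated above.
check_datablock() {
    hdr=$1
    # Rule 1: at most 80 characters.
    if [ "${#hdr}" -gt 80 ]; then
        echo "too long"
        return 1
    fi
    # Rule 2: must begin with the characters "data_"
    # (assumption: the prefix is matched exactly as written).
    case $hdr in
        data_*) echo "ok" ;;
        *)      echo "must begin with data_"; return 1 ;;
    esac
}

check_datablock data_gere_TEST
check_datablock gere_TEST || true
```

The first call accepts the header used in the EXAMPLES section; the second shows a header rejected for lacking the prefix.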

END

End input.

EXCLUDE <keyword> <value> ...

Only one keyword is allowed for EXCLUDE:

SIGP <value>
Reflections are excluded if F < <value>*sigma(F)

Reflections for which F < <value>*sigma(F), and which satisfy the resolution limits (if given), will be written with _refln.status '<'. The value of _reflns.number_obs excludes all reflections that do not satisfy the condition on sigma(F).

FREEVAL <num>

The reflections with FreeRflag = <num> are treated as the freeR set: the default is 0 if FREE is assigned. The FREE column must be assigned with LABIN.
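As an illustration, EXCLUDE and FREEVAL can be combined in one keyworded input. The sketch below writes the keywords to a file and runs the program only if it is available. The column labels and input path are taken from the EXAMPLES section; the file names mtz2cif_sigp.inp and gere_sigp.cif and the SIGP value of 2.0 are assumptions chosen for illustration, not recommendations.

```shell
# Keyworded input kept in a file so it can be inspected before the run.
# The SIGP cut-off of 2.0 is an arbitrary illustrative value.
cat > mtz2cif_sigp.inp <<'EOF'
labin FP=F_nat SIGFP=SIGF_nat FREE=FreeR_flag
datablock data_gere_TEST
exclude SIGP 2.0
freeval 0
end
EOF

# Run only when mtz2cif and the tutorial data are actually present.
in_mtz=${CEXAM:-.}/tutorial/data/gere_MAD_nat.mtz
if command -v mtz2cif >/dev/null 2>&1 && [ -f "$in_mtz" ]; then
    mtz2cif hklin "$in_mtz" hklout gere_sigp.cif < mtz2cif_sigp.inp
fi
```

Here "freeval 0" only restates the documented default; it is written out to make the choice of free set explicit.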

LABIN <program label>=<file label>

The output is controlled by the labels specified here:

Input labels accepted are:

H, K, L            Indices
FP, SIGFP          F and Sigma for native
FC, PHIC           F and Phase from model
DP, SIGDP          Anomalous difference and Sigma
I, SIGI            I and Sigma
F(+), SIGF(+)      F+ and Sigma(F+)
F(-), SIGF(-)      F- and Sigma(F-) used for anomalous output
I(+), SIGI(+)      I+ and Sigma(I+)
I(-), SIGI(-)      I- and Sigma(I-)
W, FOM             Weights
PHIB               Best phase (experimental)
HLA,HLB,HLC,HLD    Hendrickson-Lattman coefficients
FREE               FreeR flag

To output multiple datasets from a single MTZ file to a single CIF, use multiple LABIN lines (one per dataset). In CIF, a dataset corresponds to a unique crystal/wavelength pair. The program assumes that the crystal and dataset information is correctly set up in the MTZ file - see the MTZ documentation for more details about crystals and datasets in MTZ files.

There are restrictions on the use of multiple datasets:

  1. Each LABIN line must have the same set of program labels (above) for each dataset, with the exception of FREE (which must be specified no more than once, and can only appear on the first LABIN line).
  2. All columns selected on a single LABIN line must correspond to the same crystal and dataset in the MTZ file.
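The first restriction can be checked mechanically on a keyword file before running the program. The following is a minimal sketch using POSIX shell and awk. The function name check_labin_sets and the file name multi.inp are hypothetical, and the sketch assumes that continuation lines end in a backslash, as in the EXAMPLES section. It does not check the second restriction, which needs the MTZ header.

```shell
# Illustrative keyword file with three LABIN lines.
cat > multi.inp <<'EOF'
labin FP=F_nat SIGFP=SIGF_nat \
      FREE=FreeR_flag
labin FP=F_peak SIGFP=SIGF_peak
labin FP=F_infl SIGFP=SIGF_infl
datablock data_gere_TEST
end
EOF

# Hypothetical helper: verify that every LABIN line assigns the same
# program labels, ignoring FREE, which may appear on the first line only.
check_labin_sets() {
    awk '
        # Join continuation lines (trailing backslash) into one logical line.
        /\\$/ { sub(/[ \t]*\\$/, ""); buf = buf $0 " "; next }
        {
            $0 = buf $0; buf = ""
            if (tolower($1) !~ /^labin/) next
            nlab++
            m = 0
            for (i = 2; i <= NF; i++) {
                if ($i !~ /=/) continue          # skip anything that is not label=column
                split($i, kv, "=")
                if (kv[1] == "FREE") {
                    if (nlab > 1) { print "FREE on LABIN line " nlab; bad = 1 }
                    continue
                }
                m++
                if (nlab == 1) first[kv[1]] = 1
                else if (!(kv[1] in first)) {
                    print "extra label " kv[1] " on LABIN line " nlab; bad = 1
                }
            }
            if (nlab == 1) size = m
            else if (m != size) { print "label count differs on LABIN line " nlab; bad = 1 }
        }
        END { if (!bad) print "consistent"; exit (bad ? 1 : 0) }
    ' "$1"
}

check_labin_sets multi.inp
```

The labels of the first LABIN line serve as the reference set; later lines are compared against it by membership and by count, which avoids sorting.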

Note that writing multiple datasets involves non-standard CIF tokens, which need to be agreed with the RCSB and EBI. If only a single dataset is written then the resulting CIF should conform to the existing standards.

MODE PDBX | CCP4

Default: PDBX

Specify the _refln.* token set used to write out the reflections in the output CIF, for anomalous data.

PDBX
Use the PDBX exchange dictionary tokens.
CCP4
Use the CCP4 exchange dictionary tokens.

The CCP4 exchange dictionary corresponds to the token set for the old MTZ2VARIOUS CIF output.

RESOLUTION <resmin> <resmax>

Specify the minimum (<resmin>) and maximum (<resmax>) resolution limits in Angstroms. Note that reflections outside these limits are still output but are flagged as 'l' (below the low resolution limit) or 'h' (above the high resolution limit).

The limits will be written to the CIF as the values of _reflns.d_resolution_high and _reflns.d_resolution_low.

Notes on generating mmCIF for deposition

1. Reflection Status

All reflections in the MTZ input file will be output to the CIF file. However, certain reflections can be flagged using the data item _refln.status. Observed reflections will be flagged with 'o'. Unobserved reflections, i.e. those flagged as missing in all the relevant amplitude and/or intensity columns, will be flagged as 'x'; these reflections will not be added to _reflns.number_obs.

The 'free' reflections will be flagged as 'f'. The keyword FREEVAL can be used to indicate this set. Systematically absent reflections are flagged with '-'. Note that 'free' reflections are counted as 'observed' when outputting the total number of observed reflections to _reflns.number_obs.
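One way to inspect how reflections were flagged is to tally the _refln.status column of the output file. The following is a minimal sketch using POSIX shell and awk. The function name count_status is hypothetical, the sample file is a synthetic stand-in rather than real program output, and the sketch assumes the reflection list is an ordinary mmCIF loop_ with one whitespace-separated row per line.

```shell
# Synthetic stand-in for a reflection loop (illustration only, not real data).
cat > sample_refln.cif <<'EOF'
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.status
_refln.F_meas_au
_refln.F_meas_sigma_au
0 0 2 o 120.5 3.1
0 0 3 - ? ?
0 0 4 f 88.0 2.7
0 1 1 x ? ?
0 1 2 o 45.2 1.9
EOF

# Hypothetical helper: tally _refln.status values. The column position is
# read from the loop header rather than hard-coded, since it depends on
# which tokens were written.
count_status() {
    awk '
        /^loop_/        { n = 0; col = 0; next }   # new loop: reset the column counter
        /^_/            { n++; if ($1 == "_refln.status") col = n; next }
        /^#/ || NF == 0 { next }                   # skip comments and blank lines
        col && NF >= col { count[$col]++ }
        END { for (s in count) print s, count[s] }
    ' "$1" | LC_ALL=C sort
}

count_status sample_refln.cif
```

Each output line gives a status character followed by the number of reflections carrying it, so the counts can be compared with the totals reported in the REFLNS block.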

2. Use of resolution cut-offs and sigma exclusion

Note that the translation of the RESOLUTION and EXCLUDE SIGP conditions to _refln.status values does not imply that the use of these conditions is good crystallographic practice. Be prepared to justify why you have excluded any data from your final refinement.

3. Missing values

The mmCIF character '?' is used to denote missing values.

4. Treatment of anomalous data in MTZ2CIF

The output of anomalous data from MTZ to CIF is still not completely resolved. The OUTPUT CIF option in older versions of the MTZ2VARIOUS program did not have the CIF tokens corresponding to F(+)/F(-) or anomalous difference, and so anomalous data was converted to explicit hkl/-h-k-l pairs with the corresponding F(+) or F(-) value written to _refln.F_meas_au as appropriate.

With the use of explicit tokens for anomalous data this approach is not necessary - only hkl needs to be written. However note that there is some ambiguity if only mean FP is supplied (i.e. without anomalous differences or supporting F(+) and F(-) pairs). In this case MTZ2CIF will only write one reflection to the CIF per reflection in the MTZ file.

Note also that while the CIF2MTZ program can recognise the anomalous tokens (as of CCP4 v6.0), other programs such as SFCHECK may not deal correctly with the anomalous data in the CIF.

5. Multiple crystals and wavelengths

It is possible with MTZ2CIF to write multiple MTZ crystals and datasets from a single MTZ file into a single CIF. This is done by specifying multiple LABIN lines (one for each crystal/dataset pair).

Each LABIN line will correspond to a unique _refln.crystal_id and _refln.wavelength_id pair in the output reflection list. Additional non-standard CIF tokens are written in the following CIF blocks in order to correctly relate the contents of the block to the crystals and wavelengths that have been output:

  1. CELL block: _cell.CCP4_wavelength_id and _cell.CCP4_crystal_id relate the cell parameters to a particular crystal_id in the _REFLN block (nb the wavelength_id is probably redundant).
  2. REFLNS block: _reflns.CCP4_wavelength_id and _reflns.CCP4_crystal_id relate the statistics to a particular crystal_id and wavelength_id pair in the REFLN block.
  3. DIFFRN_RADIATION_WAVELENGTH block: _diffrn_radiation_wavelength.CCP4_crystal_id is needed to uniquely identify the wavelength to which this refers.

Note that at present neither CIF2MTZ nor SFCHECK can deal with multiple crystals and datasets.
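Since each LABIN line should yield one crystal_id/wavelength_id pair, listing the distinct pairs in the reflection list gives a quick consistency check on a multi-dataset file. The following is a minimal sketch using POSIX shell and awk. The function name list_pairs is hypothetical, the sample file is synthetic rather than real program output, and the reflection list is assumed to be an ordinary mmCIF loop_ with one row per line.

```shell
# Synthetic stand-in for a two-wavelength reflection loop (not real data).
cat > sample_multi.cif <<'EOF'
loop_
_refln.wavelength_id
_refln.crystal_id
_refln.scale_group_code
_refln.index_h
_refln.index_k
_refln.index_l
_refln.status
_refln.F_meas_au
_refln.F_meas_sigma_au
1 1 1 0 0 2 o 120.5 3.1
1 1 1 0 0 4 o 88.0 2.7
2 1 1 0 0 2 o 118.9 3.0
2 1 1 0 0 4 o 86.4 2.6
EOF

# Hypothetical helper: print each distinct "crystal_id wavelength_id" pair.
list_pairs() {
    awk '
        /^loop_/ { n = 0; c = 0; w = 0; next }    # new loop: reset column positions
        /^_/ {
            n++
            if ($1 == "_refln.crystal_id")    c = n
            if ($1 == "_refln.wavelength_id") w = n
            next
        }
        c && w && NF >= c && NF >= w { seen[$c " " $w] = 1 }
        END { for (p in seen) print p }
    ' "$1" | LC_ALL=C sort
}

list_pairs sample_multi.cif
```

The number of lines printed can then be compared with the number of LABIN lines in the keyworded input.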

6. CIF Data Items

Below is a list of the items output to the CIF file:

 _entry.id

 _audit.revision_id
 _audit.creation_date
 _audit.creation_method
 _audit.update_record

 _cell.entry_id
 _cell.CCP4_wavelength_id (only for multiple datasets)
 _cell.CCP4_crystal_id (only for multiple datasets)
 _cell.length_a
 _cell.length_b
 _cell.length_c
 _cell.angle_alpha
 _cell.angle_beta
 _cell.angle_gamma

 _symmetry.entry_id
 _symmetry.Int_Tables_number
 _symmetry.space_group_name_H-M
 _symmetry_equiv.id
 _symmetry_equiv.pos_as_xyz

 _reflns.entry_id
 _reflns.CCP4_wavelength_id (only for multiple datasets)
 _reflns.CCP4_crystal_id (only for multiple datasets)
 _reflns.d_resolution_high
 _reflns.d_resolution_low
 _reflns.limit_h_max
 _reflns.limit_h_min
 _reflns.limit_k_max
 _reflns.limit_k_min
 _reflns.limit_l_max
 _reflns.limit_l_min
 _reflns.number_all
 _reflns.number_obs

 _diffrn_radiation_wavelength.CCP4_crystal_id (only for multiple datasets)
 _diffrn_radiation_wavelength.id
 _exptl_crystal.id
 _reflns_scale.group_code

The following items are one per reflection:

 _refln.wavelength_id     Always written
 _refln.crystal_id        Always written
 _refln.scale_group_code  Always written
 _refln.index_h           Always written
 _refln.index_k           Always written
 _refln.index_l           Always written
 _refln.status            Always written
 _refln.F_meas_au         FP
 _refln.F_meas_sigma_au   SIGFP
 _refln.F_calc            FC
 _refln.phase_calc        PHIC
 _refln.phase_meas        PHIB
 _refln.fom               FOM
 _refln.intensity_meas    I
 _refln.intensity_sigma   SIGI
 _refln.ebi_F_xplor_bulk_solvent_calc        FPART_BULK_S
 _refln.ebi_phase_xplor_bulk_solvent_calc    PHIPART_BULK_S

The following items are also one per reflection; the exact token depends on which set of tokens (specified by the MODE keyword) is being written:

PDBX                              CCP4                                  Label
-------------------------------------------------------------------------------
_refln.pdbx_HL_A_iso              _refln.ccp4_SAD_HL_A_iso              HLA
_refln.pdbx_HL_B_iso              _refln.ccp4_SAD_HL_B_iso              HLB
_refln.pdbx_HL_C_iso              _refln.ccp4_SAD_HL_C_iso              HLC
_refln.pdbx_HL_D_iso              _refln.ccp4_SAD_HL_D_iso              HLD
_refln.pdbx_F_meas_plus           _refln.ccp4_SAD_F_meas_plus_au        F(+)
_refln.pdbx_F_meas_plus_sigma     _refln.ccp4_SAD_F_meas_plus_sigma_au  SIGF(+)
_refln.pdbx_F_meas_minus          _refln.ccp4_SAD_F_meas_minus_au       F(-)
_refln.pdbx_F_meas_minus_sigma    _refln.ccp4_SAD_F_meas_minus_sigma_au SIGF(-)
_refln.pdbx_anom_difference       _refln.ccp4_SAD_phase_anom            DP
_refln.pdbx_anom_difference_sigma _refln.ccp4_SAD_phase_anom_sigma      SIGDP
_refln.pdbx_I_plus                _refln.ccp4_I_plus                    I(+)
_refln.pdbx_I_plus_sigma          _refln.ccp4_I_plus_sigma              SIGI(+)
_refln.pdbx_I_minus               _refln.ccp4_I_minus                   I(-)
_refln.pdbx_I_minus_sigma         _refln.ccp4_I_minus_sigma             SIGI(-)

KNOWN BUGS

2/5/2006 The CCP4 tokens are not recognised by CIF2MTZ; neither the CCP4 nor the PDBX tokens are recognised by SFCHECK.

EXAMPLES

Example with a single wavelength:

mtz2cif hklin $CEXAM/tutorial/data/gere_MAD_nat.mtz \
    hklout $CCP4_SCR/gere_MAD_nat.cif <<EOF
labin FP=F_nat SIGFP=SIGF_nat \
      F(+)=F_nat(+) SIGF(+)=SIGF_nat(+) \
      F(-)=F_nat(-) SIGF(-)=SIGF_nat(-) \
      FREE=FreeR_flag
datablock data_gere_TEST
mode PDBX # Default
end
EOF

Example with multiple crystals and wavelengths:

mtz2cif hklin $CEXAM/tutorial/data/gere_MAD_nat.mtz \
    hklout $CCP4_SCR/gere_MAD_nat.cif <<EOF
# Dataset 1
labin FP=F_nat SIGFP=SIGF_nat \
      F(+)=F_nat(+) SIGF(+)=SIGF_nat(+) \
      F(-)=F_nat(-) SIGF(-)=SIGF_nat(-) \
      FREE=FreeR_flag
# Dataset 2
labin FP=F_peak SIGFP=SIGF_peak \
      F(+)=F_peak(+) SIGF(+)=SIGF_peak(+) \
      F(-)=F_peak(-) SIGF(-)=SIGF_peak(-)
# Dataset 3
labin FP=F_infl SIGFP=SIGF_infl \
      F(+)=F_infl(+) SIGF(+)=SIGF_infl(+) \
      F(-)=F_infl(-) SIGF(-)=SIGF_infl(-)
datablock data_gere_TEST
mode PDBX # Default
end
EOF

A runnable unix example script is in $CEXAM/unix/runnable/

SEE ALSO

mtz2various, cif2mtz

AUTHOR

Peter Briggs, CCLRC Daresbury