cif2mtz hklin foo.cif hklout foo.mtz
CIF2MTZ is a program to convert an mmCIF reflection file to MTZ format. mmCIF reflection files are typically obtained from the Protein Data Bank. There are examples below for some representative PDB entries.
In practice, mmCIF reflection files from the PDB can have a wide variety of data item names and contents. The program will attempt to identify quantities correctly, but you should always check the resulting MTZ file. Keywords are provided to supply missing information or to help the program make choices.
A large number of mmCIF reflection files in the PDB contain the PDB coordinate file header as a comment block. In particular, cell and symmetry information is held in the CRYST1 line, rather than in the correct mmCIF categories. The program will therefore look for a CRYST1 line as well as the mmCIF categories. If neither is present, then cell and symmetry information must be provided via keywords.
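Before running the conversion, it can be useful to see which of these sources a file actually has. The following is a minimal sketch, not part of CIF2MTZ: `cell_sources` is a hypothetical helper, and it assumes that the string CRYST1 only occurs in the header comment block and that mmCIF items start a line.

```shell
# Hypothetical helper: report which sources of cell/symmetry information
# appear in a reflection file. Prints one token per source found and
# nothing at all if the CELL and SYMMETRY keywords will be needed.
cell_sources() {
    # PDB header copied into a comment block
    grep -q 'CRYST1' "$1" && echo "CRYST1"
    # mmCIF cell category, e.g. _cell.length_a
    grep -qi '^[[:space:]]*_cell\.' "$1" && echo "_cell"
    # mmCIF symmetry category, e.g. _symmetry.space_group_name_H-M
    grep -qi '^[[:space:]]*_symmetry\.' "$1" && echo "_symmetry"
    return 0
}
```

For example, `cell_sources r1d9ysf.ent` printing nothing would suggest that cell and symmetry have to be supplied by keyword.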
Note: CIF2MTZ works with the macromolecular CIF format "mmCIF", which is substantially different from the original "CIF" format. The latter is usually used within small molecule crystallography, but you may come across it with SHELX. Small molecule "CIF" format has a different syntax (it is based on DDL1 rather than DDL2) and so cannot easily be read by CIF2MTZ. If you don't know what format a file is in, look for _cell.length_a (with period: mmCIF) or _cell_length_a (with underscore: CIF).
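The check described in the note can be scripted. This is a small sketch under the assumption that the cell length item is present at the start of a line; `cif_flavour` is a hypothetical name, not a CCP4 tool.

```shell
# Hypothetical helper: guess the CIF dialect from the cell-length item name.
cif_flavour() {
    if grep -qi '^[[:space:]]*_cell\.length_a' "$1"; then
        echo "mmCIF"      # period after the category name: DDL2 style
    elif grep -qi '^[[:space:]]*_cell_length_a' "$1"; then
        echo "CIF"        # underscore throughout: DDL1 style
    else
        echo "unknown"    # no cell item found, so this test cannot decide
    fi
}
```

A result of "unknown" is common for PDB reflection files that hold only the reflection list, so it does not by itself mean the file is unreadable.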
Possible keywords are:
TITLE, LABOUT, CELL, SYMMETRY, NAME, BLOCK, ANOMALOUS, STATUS, END
All keywords are optional. The program will read in data, such as symmetry, from the mmCIF file if it is there. The keywords can be used to provide missing information, or to override existing information.
The TITLE keyword puts a suitable title in the output MTZ file.
The CELL keyword is followed by the cell lengths and angles.
The SYMMETRY keyword is followed by the standard space group name or number, or by explicit symmetry operators.
The program currently recognises the following mmCIF item names for reflection data:
mmCIF item(s)                                   Program label  Description
_refln.index_h                                  H              h index
_refln.index_k                                  K              k index
_refln.index_l                                  L              l index
_refln.status                                   FREE           free R flag
_refln.F_meas_au, _refln.F_meas                 FP             structure factor
_refln.F_meas_sigma_au, _refln.F_meas_sigma     SIGFP          sigma(F)
_refln.F_calc_au, _refln.F_calc                 FC             calculated SF
_refln.phase_calc                               PHIC           calculated phase
_refln.phase_meas                               PHIB           experimental phase
_refln.fom, _refln.weight                       FOM            figure of merit
_refln.intensity_meas, _refln.F_squared_meas    I              intensity
_refln.intensity_sigma, _refln.F_squared_sigma  SIGI           sigma(I)
_refln.F_part_au                                FPART          partial structure factor
_refln.phase_part                               PHIP           partial phase
_refln.pdbx_F_plus                              F(+)
_refln.pdbx_F_plus_sigma                        SIGF(+)
_refln.pdbx_F_minus                             F(-)
_refln.pdbx_F_minus_sigma                       SIGF(-)
_refln.pdbx_anom_difference                     DP
_refln.pdbx_anom_difference_sigma               SIGDP
_refln.pdbx_I_plus                              I(+)
_refln.pdbx_I_plus_sigma                        SIGI(+)
_refln.pdbx_I_minus                             I(-)
_refln.pdbx_I_minus_sigma                       SIGI(-)
_refln.pdbx_HL_A_iso                            HLA            HL coefficient A
_refln.pdbx_HL_B_iso                            HLB            HL coefficient B
_refln.pdbx_HL_C_iso                            HLC            HL coefficient C
_refln.pdbx_HL_D_iso                            HLD            HL coefficient D
An MTZ column is output for each mmCIF item found. The default column name <program label> is given in the middle column above, but the LABOUT keyword can be used to rename these columns.
With the ANOMALOUS option, there are additional columns F(+) SIGF(+) F(-) SIGF(-) which can be renamed by LABOUT.
Note: the mmCIF file may contain alternative labels, e.g. _refln.F_meas rather than _refln.F_meas_au. Some alternative labels will be recognised, see table above. Otherwise it is sufficient to edit the label name directly in the mmCIF file to one of the above labels.
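To see which labels a file uses, and to rename one that is not recognised, something like the following may help. It is a sketch only: both function names are hypothetical, `_refln.my_F` in the usage line stands for whatever unrecognised label the file contains, and the code assumes each item name sits alone at the start of its line, as in a typical loop_ header.

```shell
# Hypothetical helper: list the distinct _refln items in a file, in file order.
list_refln_items() {
    awk '$1 ~ /^_refln\./ && !seen[$1]++ { print $1 }' "$1"
}

# Hypothetical helper: write a copy of the file to standard output with one
# item name replaced. Arguments: old label, new label, file.
rename_refln_item() {
    awk -v old="$1" -v new="$2" '
        $1 == old { sub(/[^ \t]+/, new) }   # replace only the item name itself
                  { print }
    ' "$3"
}
```

Usage might look like `rename_refln_item _refln.my_F _refln.F_meas_au in.cif > fixed.cif`, leaving the original file untouched.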
The NAME keyword specifies the project, crystal and dataset names (<pname>, <xname> and <dname>) for the output MTZ file. <pname> and <xname> are taken from _entry.id if present in the mmCIF file, and <dname> is taken from _diffrn.id if present in the mmCIF file. If the mmCIF file does not contain _entry.id and _diffrn.id, then it is strongly recommended that this information is given. Otherwise, the default project, crystal and dataset names are "unknown", "unknown" and "unknownddmmyy" respectively.
The project-name specifies a particular structure solution project, the crystal name specifies a physical crystal contributing to that project, and the dataset-name specifies a particular dataset obtained from that crystal. All three should be given.
mmCIF-format reflection files contain reflection data in blocks started by a "data_<blockname>" tag. Generally, a file will only contain a single block. In some cases, however, there may be several related blocks (e.g. native data, and additional datasets for phase determination). A specific block may be converted to MTZ format by giving its name (including the "data_" prefix) with the BLOCK keyword. Without the BLOCK keyword, cif2mtz will simply convert the first data block.
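To find out which blocks a file holds before choosing one, the block tags can be listed. A minimal sketch, assuming each tag starts its own line; `list_blocks` is a hypothetical name.

```shell
# Hypothetical helper: print every data block tag in a file, in file order.
# The names are printed with their "data_" prefix.
list_blocks() {
    awk '$1 ~ /^data_/ { print $1 }' "$1"
}
```

If more than one name is printed, the first is the block that would be converted by default.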
If the ANOMALOUS keyword is given, the program will attempt to recover F(+) and F(-) columns from hkl / -h-k-l pairs in the input mmCIF file. Any hkl singletons will be treated as F(+) data. Columns FP/SIGFP will also be output as the mean of the input F(+) and F(-).
Warning: this option works with mmCIF files such as those output by MTZ2VARIOUS, where the -h-k-l reflection immediately follows the corresponding hkl, and where hkl reflections are in the CCP4 asymmetric unit. It will fail in other cases. Without the ANOMALOUS option, all reflections will be passed unchanged to HKLOUT.
Note also: this option deals with the case where anomalous pairs exist as different reflection rows. This is different from (and incompatible with) the case where anomalous pairs exist as different columns, e.g. _refln.pdbx_F_plus and _refln.pdbx_F_minus.
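Whether a file has the row layout described in the warning can be estimated by counting rows whose indices are the exact negatives of the row before. This is only a rough sketch, not the program's own logic: `friedel_pairs` is a hypothetical helper, it assumes one reflection per line with whitespace-separated values, and it does not check that hkl lies in the CCP4 asymmetric unit.

```shell
# Hypothetical helper: count reflection rows and adjacent hkl / -h-k-l pairs.
friedel_pairs() {
    awk '
        NF == 0        { next }                                  # skip blank lines
        $1 == "loop_"  { n = 0; ch = ck = cl = 0; header = 1; next }
        header && $1 ~ /^_/ {                                    # loop header: note column positions
            n++
            if ($1 == "_refln.index_h") ch = n
            if ($1 == "_refln.index_k") ck = n
            if ($1 == "_refln.index_l") cl = n
            next
        }
                       { header = 0 }                            # first non-item line ends the header
        $1 ~ /^(_|#|data_)/ { ch = 0; have_prev = 0; next }      # end of the loop
        ch && ck && cl && NF >= n {
            h = $ch + 0; k = $ck + 0; l = $cl + 0
            if (have_prev && h == -ph && k == -pk && l == -pl) {
                pairs++; have_prev = 0                           # this row closes a pair
            } else {
                ph = h; pk = k; pl = l; have_prev = 1            # this row may open a pair
            }
            rows++
        }
        END { printf "%d rows, %d adjacent Friedel pairs\n", rows, pairs }
    ' "$1"
}
```

A pair count of zero suggests that the anomalous data, if any, are not stored as adjacent rows, in which case the ANOMALOUS option is unlikely to be appropriate.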
The _refln.status column (if present) should flag reflections used for the free-R calculation by 'f', and others by 'o'. Some mmCIF files use '0' and '1' instead. The program will pick this up automatically, but needs to know which convention is being used. The STATUS keyword can be used to set the XPLOR convention ('1' if free, '0' otherwise) or the CCP4 convention ('0' if free, '1' otherwise). The default is the XPLOR convention.
The convention given is the one used in the input mmCIF file. The output MTZ file always adheres to the CCP4 convention.
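Tallying the values in the _refln.status column shows which flags a file uses and roughly how many reflections carry each one; since the free set is normally the small minority, this can hint at the convention. The sketch below is an illustration under stated assumptions (one reflection per line, whitespace-separated values, no quoted strings containing spaces); `status_counts` is a hypothetical helper.

```shell
# Hypothetical helper: count how often each _refln.status value occurs.
# Output is one "value count" pair per line, sorted by value.
status_counts() {
    awk '
        NF == 0        { next }                           # skip blank lines
        $1 == "loop_"  { ncol = 0; col = 0; header = 1; next }
        header && $1 ~ /^_/ {                             # loop header: find the status column
            ncol++
            if ($1 == "_refln.status") col = ncol
            next
        }
                       { header = 0 }                     # first non-item line ends the header
        $1 ~ /^(_|#|data_)/ { col = 0; next }             # end of the loop
        col && NF >= col    { count[$col]++ }
        END { for (v in count) print v, count[v] }
    ' "$1" | sort
}
```

If the output shows '0' and '1' rather than 'o' and 'f', the relative counts indicate which value marks the free set, and hence which convention to declare.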
cif2mtz hklin r1gmesf.ent hklout 1gme.mtz <<eof
END
eof

Another example of diffraction data, this time containing squared structure factors (assumed to be intensities) and calculated structure factors. The file only contains the list of reflections, so additional information must be supplied:

cif2mtz hklin r1d9ysf.ent hklout 1d9y.mtz <<eof
CELL 40.583 111.009 140.423 90.00 90.00 90.00
SYMM C2221
NAME PROJ MBP CRYS apoprotein DATA native
END
eof

Finally, 1gr5 contains structure factors and phases from electron microscopy data:

cif2mtz hklin r1gr5sf.ent hklout 1gr5.mtz <<eof
END
eof

Note that this file contains dummy cell dimensions and symmetry, and it may be convenient to set these with the CELL and SYMM keywords.
f2mtz - convert other ASCII formats to MTZ
mtz2various - convert MTZ to mmCIF and others