FreeR Information (CCP4: General)

NAME

freerunique - Convert FreeRflags Between CCP4 and Other Formats (XPLOR/CNS/TNT/SHELX)

Contents

Creating a full unique set of reflections with the correct FreeRflags

For successful cross validation:

  1. It is important to select the same FreeR reflections for all related data sets (e.g. mutants, higher resolution data collected half-way through refinement, etc.).
  2. It is important to preserve the same FreeR set as you move from program to program.
  3. The FreeR set should itself be unbiased by prior refinement.
  4. The FreeR set should be representative of the full data set with respect to the distribution of structure factor amplitudes and the distribution of reflection resolution.

Different programs have different philosophies for dealing with FreeR reflections:

CCP4 first expands the data set to include all possible HKLs to the resolution given, marking those which are unmeasured. It then randomly divides the data set into n partitions, assigning a FreeRflag with values (0, 1, 2, ..., (n-1)) to each set. These cross-validation sets are used during density modification and for refinement. The default FreeR set used within refinement is the one flagged 0, but this can be changed with the keyword FREE x.
XPLOR assigns the flag TEST=x. The only acceptable values are:
  x=1 for the free set
  x=0 for the working set
CNS assigns the flag TEST=x. The acceptable values are x=0,1,...,n-1. The defaults are:
  x=1 for the free set
  x=0,2,...,n-1 for the working set
SHELX has a flag, following the format (3I4,2F8.2,I2). The values are:
  -1 for the free set
  1 for the working set
TNT separates the data into different files; one for the free set, and one for the working set. Old versions of SHELX also separated the data into different files.
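
The flag conventions listed above can be summarised as a small mapping. The following is only a sketch, assuming a plain-text list of reflections whose last column holds a CCP4-style flag; convert_free_flag is a hypothetical helper written for illustration and is not how MTZ2VARIOUS is implemented. The CNS branch assumes that the free set and set 1 simply swap labels.

```shell
# convert_free_flag STYLE [FREE] < in > out
#   STYLE: xplor | cns | shelx
#   FREE:  CCP4 flag value that marks the free set (default 0)
# The last column of each input line is taken to be the CCP4 flag.
convert_free_flag() {
    awk -v style="$1" -v free="${2:-0}" '
        {
            flag = $NF + 0
            if (style == "xplor")      out = (flag == free) ? 1 : 0
            else if (style == "shelx") out = (flag == free) ? -1 : 1
            # CNS keeps n sets: the free set becomes 1 and the old set 1
            # takes over the vacated value (an assumption of this sketch)
            else if (style == "cns")   out = (flag == free) ? 1 : ((flag == 1) ? free : flag)
            else                       out = flag
            $NF = out
            print
        }'
}
```

For example, convert_free_flag shelx < flags.txt (flags.txt being a hypothetical "h k l flag" listing) rewrites flag 0 as -1 and every other flag as 1.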

Choosing a FreeR fraction

It is important to choose a fraction that is large enough so that the statistics are sensible (at least 500 reflections seems to be the consensus at the moment), but small enough so that as many reflections as possible are still used for the refinement. This is of course always true, whichever philosophy is chosen for the selection of the FreeR reflections!
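
The trade-off can be checked with a little arithmetic before choosing a fraction. This is a rough sketch only; free_count and min_free_percent are hypothetical helpers, and the figure of 500 is the rule of thumb quoted above rather than a hard limit.

```shell
# free_count NREF PERCENT
# Number of reflections that end up in the free set (rounded down).
free_count() {
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%d\n", n * p / 100 }'
}

# min_free_percent NREF [MINFREE]
# Smallest percentage that gives at least MINFREE free reflections
# (MINFREE defaults to 500).
min_free_percent() {
    awk -v n="$1" -v m="${2:-500}" 'BEGIN { printf "%.1f\n", 100 * m / n }'
}
```

With 10208 reflections, free_count 10208 5 gives 510, so the default 5% is just enough; with only 4000 reflections, min_free_percent 4000 gives 12.5, meaning a larger fraction would be needed to reach 500.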

How to Convert Files?

Starting from CCP4
Convert to Other Formats from CCP4
  Examples
    MTZ to CNS/XPLOR
    MTZ to SHELX Intensities
    MTZ to TNT - working set
    MTZ to TNT - free set
Convert to CCP4 from Other Formats
  Examples
    Starting from CNS/XPLOR
    Starting from SHELX Intensities
    Starting from TNT or old SHELX
    Starting from SHELX I and FC

Starting from CCP4

When you are ready to start the first refinement, or preferably as soon as you collect the native data:

If this is a new data set

Run uniqueify mydata.mtz.

This script generates an output file mydata-unique.mtz which contains
(H K L F SIGF ( I SIGI ) .. FreeR_flag) for all observed reflections to the resolution limit available, plus entries for any unobserved reflection, all with FreeR_flags assigned.

The percentage flagged defaults to 5%, but this can be reset using
uniqueify {-p fraction} mydata.mtz.

The default label is FreeR_flag but this can be reset using
uniqueify {-f FreeLABel} mydata.mtz.

If this is an isomorphous data set which should preserve the same FreeR_flags

A complete set of FreeR_flags (similar to that produced for a new data set, see above) can be added to any other related data set using CAD:

cad hklin1 new.mtz hklin2 olddata-unique.mtz hklout new-unique.mtz <<eof
LABI FILE 1 ALLIN
LABI FILE 2 E1=FreeR_flag
END
eof

If the new data is to higher resolution, you will now need to run uniqueify again to pad out the FreeR_flags:
uniqueify {-f FreeLABel} new-unique.mtz new-uniquer.mtz
(the default label for the free set is FreeR_flag, but you can use whatever you like).

The script will estimate the percentage of data you have used as a test set.

This assigns FreeR_flags to any reflections in the higher resolution shell where the previous set of FreeR_flags are missing.
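
The idea of padding, keeping every existing flag and only filling the gaps, can be sketched as follows. This is an illustration and not the uniqueify script itself: pad_free_flags is a hypothetical helper, the "h k l flag" text layout is assumed, and "?" is an assumed marker for a reflection that has no flag yet.

```shell
# pad_free_flags NSETS [SEED] < in > out
# Input columns: h k l flag, with "?" where no flag has been assigned.
# Existing flags pass through untouched; each "?" is replaced by a
# random set number in the range 0 .. NSETS-1.
pad_free_flags() {
    awk -v nsets="$1" -v seed="${2:-1}" '
        BEGIN { srand(seed) }
        $4 == "?" { $4 = int(rand() * nsets) }
        { print }'
}
```

With 20 sets, each set holds about 5% of the newly flagged reflections, which corresponds to the default percentage.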

Convert to Other Formats from CCP4

You can use the jiffy MTZ2VARIOUS to convert from MTZ to XPLOR/CNS, TNT or SHELX formats quite simply. They all have different conventions, but MTZ2VARIOUS attempts to reproduce them (see program documentation: MTZ2VARIOUS).

XPLOR output will have TEST=0 for working set; TEST=1 for free set
CNS output will have TEST=1 for free set; TEST=0,2,...,(n-1) for working set
SHELX output will have 1 as the flag for the working set, and -1 for free set
TNT output may be split into two files

Examples

MTZ to CNS/XPLOR

#  test set flagged with TEST=1, working set with TEST=0
#
mtz2various     \
hklin pc553_19f-unique.mtz \
HKLOUT xplor.hkl \
<<eof
#  All these labels can be set and will be handled appropriately:
#
LABIN  FP=F SIGFP=SIGF [FPART PHIPART  PA PB PC PD  PHIB WEIGHT ] FREE=FreeR_flag
OUTPUT CNS/XPLOR
#
END
eof
exit

MTZ to SHELX Intensities

mtz2various     \
 hklin lmw.mtz \
HKLOUT shelxout.hkl \
<<eof
OUTPUT SHELX
LABIN  FP=FRBP SIGFP=SIGFRBP [IP SIGIP FP(+) FP(-) IP(+) IP(-) ] FREE=FreeR_flag
#  This will always output Is; and will rescale the data to fit the format.
#  You can override the default by setting SCAL yourself.
SCALE 0.01
#
END
eof

MTZ to TNT - working set

# TNT uses a different asymmetric unit of reciprocal space to CCP4. Dale has
# programs to convert the data if necessary.
# The data is separated into a free set and a working set.
#
mtz2various     \
 hklin lisa.wright/lmw.mtz \
HKLOUT lisa.wright/tnt_work.hkl \
<<eof
LABIN  FP=FP SIGFP=SIGFP FREE=FreeR_flag
OUTPUT TNT
EXCLUDE FREER  0
#
END
eof
#

MTZ to TNT - free set

mtz2various     \
 hklin lisa.wright/lmw.mtz \
HKLOUT lisa.wright/tnt_free.hkl \
<<eof
LABIN  FP=FP SIGFP=SIGFP FREE=FreeR_flag
OUTPUT TNT
INCLUDE FREER  0
#
END
eof
exit

Convert to CCP4 from Other Formats

These are all ASCII formats, so F2MTZ can be used in a straightforward way. After all these conversions you need to uniqueify the MTZ file.

Run uniqueify {-f FreeLABel} mydata.mtz
This will

- fill out the missing data slots
- sort out the variety of FreeR_flags
- re-sort the data into CCP4 standard order

The script guesses what style of file is being imported, by looking at the distribution of FreeR_flags:

XPLOR or TNT
a few 1s, many 0s
CNS
either (0,1,..,(n-1)) or a few 1s, many 0s
SHELX
a few -1s, many 1s

It estimates the percentage of reflections flagged as the FreeR set, and then pads out the missing reflections and converts the flags to the CCP4 style of (0, 1,...,(n-1)).
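
A guess of this kind can be sketched by counting how often each flag value occurs. The function below is a hypothetical illustration of the heuristic described above, not the logic actually used by uniqueify; it reads one flag value per line and prints a style name followed by the estimated free percentage.

```shell
# guess_flag_style < flags      (one flag value per line)
# Prints one of:
#   shelx  P   some -1s present (free set = -1)
#   multi  P   values above 1 present (n sets, CCP4 or CNS style)
#   binary P   only 0s and 1s (XPLOR, TNT or CNS; free set = 1)
# where P is the estimated percentage of free reflections.
guess_flag_style() {
    awk '
        NF == 0 { next }
        { count[$1]++; total++; if ($1 + 0 > max) max = $1 + 0 }
        END {
            if (total == 0) { print "unknown"; exit }
            if (count["-1"] > 0)
                printf "shelx %.1f\n", 100 * count["-1"] / total
            else if (max > 1)
                # n equally sized sets, one of which is the free set
                printf "multi %.1f\n", 100 / (max + 1)
            else
                printf "binary %.1f\n", 100 * count["1"] / total
        }'
}
```

Nineteen 0s and a single 1, for instance, are reported as "binary 5.0".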

SHELX "input"
Use F2MTZ and TRUNCATE to convert (H K L I SIGI FreeR_flag) to an MTZ file. See example.

SHELX "output"
Use F2MTZ (and TRUNCATE) to convert (H K L I SIGI FC PHIC FreeR_flag) to an MTZ file. See example.

TNT
The easiest way is to insert a final column of 1 into the working and 0 into the free set, 'cat' the two files together and use F2MTZ. See example.

CNS/XPLOR
See example.

Examples

Starting from CNS/XPLOR (complicated CNS/XPLOR to MTZ)

# Excerpt from the CNS reflection file to be converted:
# NREFlection=     10208
# ANOMalous=FALSe { equiv. to HERMitian=TRUE}
# DECLare name=FOBS         DOMAin=RECIprocal   type=COMP END
# DECLare name=SIGMA        DOMAin=RECIprocal   type=REAL END
# DECLare name=FPART        DOMAin=RECIprocal   type=COMP END
# DECLare name=WEIGHT       DOMAin=RECIprocal   type=REAL END
# DECLare name=TEST         DOMAin=RECIprocal   type=INTE END
# INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
#                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
# INDE     8    0    0 FOBS=   827.600     0.000 SIGMA=    30.983
#                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
#!/bin/csh -f 
#
f2mtz \
hklin suying/b-over.hkl \
hklout  suying/b-over.mtz \
<<eof
# skip the NREF and DECLARE lines
SKIP 7
#  For XPLOR you would probably need: SKIP 0
CELL     55.19   79.73   66.68   90.00   90.00   90.00
SYMM C2221
#
# f2mtz assumes a free format without any character data
#  So you must either remove these from the file, or design
# a format statement to skip the labels.
#
# You have to get this format right! nX ignores n characters.
# Count characters
FORMT '(6x,3F5.0,6X,2f10.0,7X,f10.0,/,25X,2f10.0,8X,F10.0,6x,F10.0)'
#
#1234561234512345123451234561234567890123456789012345671234567890
# INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
#1234567890123456789012345123456789012345678901234567812345678901234561234567890
#                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
#
#
LABO H K L FRBP PHIB SIGFRBP FPART PHIPART WEIGHT FreeR_flag
#
CTYPO H H H F P Q F P W I
END
eof
#
uniqueify suying/b-over.mtz
exit
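
Since the character counting in the format statement is easy to get wrong, it can help to slice a sample line at the same widths before running F2MTZ. The following is a sketch under assumptions: split_fixed is a hypothetical helper, and its width list imitates the Fortran format by hand (a negative number stands for nX, a positive number for a field of that width).

```shell
# split_fixed "WIDTHS" < file
# WIDTHS is a space-separated list. A positive width is a field to
# print; a negative width is a run of characters to skip, like nX.
split_fixed() {
    awk -v widths="$1" '
        BEGIN { n = split(widths, w, " ") }
        {
            pos = 1; out = ""
            for (i = 1; i <= n; i++) {
                if (w[i] > 0) {
                    field = substr($0, pos, w[i])
                    gsub(/ /, "", field)          # drop the padding blanks
                    out = out (out == "" ? "" : " ") field
                    pos += w[i]
                } else {
                    pos -= w[i]                   # negative width: skip
                }
            }
            print out
        }'
}
```

For the first line of each CNS record above, the widths "-6 5 5 5 -6 10 10 -7 10" correspond to (6x,3F5.0,6X,2f10.0,7X,f10.0). If a label fragment such as "FOBS=" shows up in the output, the counts are off.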

Starting from SHELX Intensities

f2mtz \
hklin pc553_19.hkl \
hklout  pc553_19i.mtz \
<<eof
CELL    37.144   39.422   44.021  90.00  90.00  90.00
SYMM P212121
LABO H K L I SIGI [ FreeR_flag ]
CTYPO H H H J Q   [    I       ]
END
eof
#
#      To reduce Is to Fs - use truncate
#
truncate \
hklin pc553_19i.mtz \
hklout pc553_19f.mtz \
<<eof
LABI IMEAN=I SIGIMEAN=SIGI
END
eof
#
#  If you read a FreeR_flag, you will now have to rescue it -
#  TRUNCATE ignores it.
#
cad hklin1 pc553_19f.mtz \
    hklin2 pc553_19i.mtz \
    hklout pc553_19f-free.mtz \
<<eof
LABI FILE 1 ALLIN
LABI FILE 2 E1=FreeR_flag
END
eof
#
# Modify FreeR_flags (run on the CAD output, which carries the rescued flag)
uniqueify pc553_19f-free.mtz
#

Starting from TNT or old SHELX (FreeR assigned to 10%)

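#   First edit the TNT files to assign flag 1 to the working set and 0 to the
#   free set; then cat both TNT files together.
#   (tnt-free.hkl and the *-flagged.hkl names are illustrative.)
#
#    sed 's/$/   1/' $SCRATCH/tnt-work.hkl > $SCRATCH/tnt-work-flagged.hkl
#    sed 's/$/   0/' $SCRATCH/tnt-free.hkl > $SCRATCH/tnt-free-flagged.hkl
#    cat $SCRATCH/tnt-work-flagged.hkl $SCRATCH/tnt-free-flagged.hkl > $SCRATCH/tnt-all.hkl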
#
#  Example piece:
HKL  -22   0   4  2010.9   134.7  1000.0  0.0000   1
HKL  -22   0   5  4005.2    83.1  1000.0  0.0000   1
HKL  -22   0   6  3661.5    91.1  1000.0  0.0000   1
HKL  -22   0   7  2321.9    59.7  1000.0  0.0000   1
....
HKL  -21   1   9   488.4   143.9  1000.0  0.0000   0
HKL  -20   0   6   329.5   202.9  1000.0  0.0000   0
HKL  -20   0  11  1009.2   146.7  1000.0  0.0000   0
HKL  -20   4  10  1989.1    46.5  1000.0  0.0000   0
....
#
f2mtz \
hklin tnt_all.hkl \
hklout  tnt_all.mtz \
<<eof
CELL    37.144   39.422   44.021  90.00  90.00  90.00
SYMM P212121
LABO  H K L F SIGF  FreeRflag
CTYPO H H H F Q    I
#
#  See above comments about formats.. You need to skip the HKL label.

#
FORMT '(4x,3F4.0,2F8.0,16X,F4.0)'
#
# or, if PHI and FOM are given, use these three lines instead:
#
# LABO  H K L F SIGF PHIB FOM  FreeRflag
# CTYPO H H H F Q    P    W    I
# FORMT '(4x,3F4.0,4F8.0,F4.0)'
END
eof
#
#    uniqueify will now complete hkl list and add FreeRflags
#
uniqueify -f FreeRflag  tnt_all.mtz

Starting from SHELX I and FC

#!/bin/csh -f
#
f2mtz HKLIN ./1bxo*-sf.hkl \
hklout  $CCP4_SCR/junk.mtz \
<<eof
TITLE SHELX to MTZ
CELL  96.980   46.650   65.710  90.00 115.57  90.00
LABOUT H   K  L   I   SIGI   FC PHIC
CTYPE  H   H  H   J     Q    F P
SKIP 2
SYMM C2
END
eof
if($status) exit
truncate \
hklin   $CCP4_SCR/junk.mtz \
hklout  $CCP4_SCR/junk1.mtz \
<<eof
LABI IMEAN=I SIGIMEAN=SIGI
TRUNCATE YES
END
eof
#
if($status) exit
cad \
hklin1  $CCP4_SCR/junk1.mtz \
hklin2  $CCP4_SCR/junk.mtz \
hklout ./1bxo-sf.mtz \
<<eof
LABI FILE 1 ALLIN
LABI FILE 2 E1=FC E2=PHIC 
END
eof

AUTHORS

Eleanor Dodson, University of York, England
Maria Turkenburg, University of York, England