RSTATS (CCP4: Supported Program)

NAME

rstats - scale together two sets of F's

SYNOPSIS

rstats hklin foo_in.mtz hklout foo_out.mtz rstatsbkr rstatsbkr.dat
[Keyworded input]

DESCRIPTION

The program scales together two sets of F's, calculates statistics and outputs a reflection file. Data can be split into a working set, and a set reserved for calculation of a freeR factor.

Rejection criteria can be specified as an FC/FO ratio, a sigma multiple, or |FO-FC|.

KEYWORDED INPUT

The various data control lines are identified by keywords, those available being:
CYCLES, END, FREE, LABIN, LABOUT, LIST, NOABS, OUTPUT, PRINT, PROCESS, REJECT, RESOLUTION, RSCB, SCALE, TEMPERATURE_FACTOR, TITLE, WEIGHTING_SCHEME, WIDTH_OF_BINS

TITLE <title>

The title string is written to the output reflection file, replacing the title from the input file.

If TITLE is not specified then:
OUTPUT FOFC will use the title "Output from RSTATS".
LABOUT ALLIN will use the title from the input file.

FREE <num>

The FreeR sub-set is defined, in the program, as those reflections which have a value of <num> in the FreeR_flag column. The default is for FreeR_flag = 0.
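
As an illustration of how the FREE value partitions the data, the following Python sketch splits a list of reflections into a working set and a FreeR sub-set. The tuple layout and the function name split_free are assumptions made for this example; they are not part of RSTATS.

```python
def split_free(reflections, free_value=0):
    """Split reflections into (working, free) lists.

    Each reflection is a hypothetical (h, k, l, fp, fc, flag) tuple,
    where 'flag' stands for the FreeR_flag column.
    """
    working, free = [], []
    for refl in reflections:
        flag = refl[5]
        # Reflections whose flag equals free_value form the FreeR sub-set.
        (free if flag == free_value else working).append(refl)
    return working, free


reflections = [
    (1, 0, 0, 120.0, 115.0, 0),
    (1, 1, 0, 80.0, 82.5, 3),
    (2, 0, 1, 45.0, 40.0, 0),
    (2, 1, 1, 60.0, 63.0, 7),
]
working, free = split_free(reflections)
print(len(working), "working,", len(free), "free")
```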

RESOLUTION <x1> <x2>

If given then only reflections in the resolution range <x1>-<x2> will be used during the final (output) cycle in order to calculate statistics.

Note that this is a change to the functionality of the RESOLUTION keyword, which can no longer be used to exclude reflections from the output mtz file.

If RESOLUTION is not specified then the limits <x1> and <x2> are taken from the input MTZ file, so no data is excluded from the statistics. The maximum and minimum resolution (in Angstroms) can be given in either order, and if only one number is given this is taken as the maximum resolution limit.

RSCB <x1> <x2>

If given then reflections in the resolution range <x1>-<x2> will be used during the scaling cycles, in order to generate the scale and temperature factors. The maximum and minimum resolution (in Angstroms) can be given in either order, and if only one number is given this is taken as the maximum resolution limit.

If RSCB is not given, the limits are taken from the RESOLUTION keyword; if RESOLUTION has not been specified the default is to use all the data, i.e. the resolution limits are read from the input MTZ file header.
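
The order of precedence described above can be sketched in Python as follows. The function names and the use of (x1, x2) pairs are assumptions made for illustration, and the single-number form of the keywords is not handled here.

```python
def ordered_limits(x1, x2):
    """Return (low_res, high_res) in Angstroms, whichever order was given."""
    return (max(x1, x2), min(x1, x2))


def scaling_limits(mtz_limits, resolution=None, rscb=None):
    """Pick the resolution range used during the scaling cycles.

    Precedence follows the text: RSCB, then RESOLUTION, then the
    limits read from the input MTZ file header.
    """
    if rscb is not None:
        return ordered_limits(*rscb)
    if resolution is not None:
        return ordered_limits(*resolution)
    return ordered_limits(*mtz_limits)


# Hypothetical header limits of 25.0 to 1.9 Angstroms.
print(scaling_limits((25.0, 1.9)))
print(scaling_limits((25.0, 1.9), resolution=(2.7, 8.0)))
print(scaling_limits((25.0, 1.9), resolution=(2.7, 8.0), rscb=(6.0, 3.0)))
```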

NOABS

If the NOABS keyword is present, the program will take the differences between the signed values of Fo and Fc, rather than using the moduli (i.e. use Fo and Fc rather than |Fo| and |Fc|). The default is to use the moduli.

SCALE <scale>

Sets initial scale factor for Fc. If zero cycles are selected on the CYCLES card, this scale factor is used for the calculation of R-factors and scaling output data. Default is 1.0.

TEMPERATURE_FACTOR <factor>

Sets initial value for the temperature factor. If zero refinement cycles are selected using the CYCLES card, this temperature factor is used for calculation of R-factors and scaling output data. Default: 0.0.

WIDTH_OF_BINS [ RTHETA <x1> ] | [ FBINR <x2> ]

[Optional]

Controls the width of the bins used in the analysis.

RTHETA = <x1> sets the width of ranges of 4(sintheta/lambda)**2; default: 0.01.

FBINR = <x2> sets the width of ranges on Fobs. If x2 is not specified or the card absent then Fobs range will be set by the program. The width is altered accordingly if the scale is applied to Fobs.
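
To give a feel for the RTHETA width, note that Bragg's law gives 4(sintheta/lambda)**2 = 1/d**2, where d is the resolution in Angstroms. The Python sketch below assigns a reflection to a bin on that basis. It assumes that the bins start at zero and are numbered from 0; the actual binning used inside the program may differ.

```python
import math


def rtheta_bin(d_spacing, width=0.01):
    """Return a bin index for a reflection with d-spacing in Angstroms."""
    # Bragg's law: 4*(sin(theta)/lambda)**2 == 1/d**2
    four_ssq = 1.0 / d_spacing ** 2
    # Assumption: bins start at zero and are numbered from 0.
    return math.floor(four_ssq / width)


for d in (8.0, 2.7):
    print(d, rtheta_bin(d), rtheta_bin(d, width=0.02))
```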

LIST <x>

[Optional]

Sets the value for listing of reflections with |Fo-Fc| > <x>. Default: 4000.0.

CYCLES <ncyc>

[Optional]

<ncyc> is the maximum number of cycles for scaling; default: 6.

The program will always make one additional pass through the reflection file to calculate statistics and write the output file. If zero cycles are specified then the program will simply apply the input scale and temperature factor. If a linear least-squares problem is selected with no rejections, the program will only make two passes through the input file. The program will stop iterating when the magnitude of the fractional shift in the scale factor is less than 0.005 and the magnitude of the shift in the temperature factor is less than 0.01.
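
The stopping test described above might be written as in the following Python sketch. It assumes that the fractional shift in the scale factor is measured relative to the value from the previous cycle; the function and argument names are illustrative only.

```python
def converged(k_old, k_new, b_old, b_new, k_tol=0.005, b_tol=0.01):
    """Return True when both shifts fall below the quoted thresholds."""
    # Assumption: fractional shift is taken relative to the previous scale.
    frac_shift_k = abs((k_new - k_old) / k_old)
    shift_b = abs(b_new - b_old)
    return frac_shift_k < k_tol and shift_b < b_tol


print(converged(2.0, 2.005, 5.0, 5.005))  # both shifts are small
print(converged(2.0, 2.02, 5.0, 5.0))     # scale still moving
```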

PRINT ALL | LAST

ALL sets IPRINT on all cycles

LAST (default) sets IPRINT so that output is printed ONLY on the final least squares cycle.

REJECT [ SIGMA=<sig> ] [ RATIO=<rat> ] [ DELTA=<delta> ]

This option sets criteria for rejecting reflections from the scaling calculations. The rejected reflections are still written to the output file. More than one of the following options may be specified simultaneously for REJECT:
RATIO
Reflections will be rejected if K*Fc*TFAC/FO < <rat> (i.e. those with FC<<FO). Default is <rat>=0.0.
SIGMA
Reflections will be rejected if Fo < <sig>*SigFo. Default is <sig>=0.0.
DELTA
Reflections will be rejected if abs(Fo - K*TFAC*Fc) > <delta>. The default is <delta>=99999.0 (i.e. no rejection tests).
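
The three tests can be illustrated with the Python sketch below. It assumes that TFAC is the temperature factor term exp(-B*s) used in the PROCESS section and that a reflection with Fo = 0 is skipped by the RATIO test; the function name and return value are hypothetical.

```python
import math


def rejection_reasons(fo, sigfo, fc, k=1.0, b=0.0, s=0.0,
                      rat=0.0, sig=0.0, delta=99999.0):
    """Return the list of REJECT tests that a reflection fails."""
    tfac = math.exp(-b * s)  # assumption: TFAC = exp(-B*s)
    fc_scaled = k * fc * tfac
    reasons = []
    # RATIO: K*Fc*TFAC/FO < rat (guard against division by zero)
    if fo != 0 and fc_scaled / fo < rat:
        reasons.append("RATIO")
    # SIGMA: Fo < sig*SigFo
    if fo < sig * sigfo:
        reasons.append("SIGMA")
    # DELTA: abs(Fo - K*TFAC*Fc) > delta
    if abs(fo - fc_scaled) > delta:
        reasons.append("DELTA")
    return reasons


print(rejection_reasons(100.0, 5.0, 90.0))
print(rejection_reasons(100.0, 5.0, 40.0, rat=0.5, delta=50.0))
print(rejection_reasons(10.0, 5.0, 9.0, sig=3.0))
```

With the default values none of the tests is triggered for ordinary positive data, which matches the "no rejection" default described above.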

OUTPUT [ NOHKL | FOFC ] [ BKR ]

The output reflection file contains all the reflections present in the input file. Note that this is different from previous versions of rstats. If OUTPUT is not given, or is not followed by a sub-keyword, then FOFC is assumed. The exception is LABOUT ALLIN, which takes preference over OUTPUT (see LABOUT).
NOHKL
No output file
FOFC
The output reflection file has H, K, L, FP, FC with optionally SIGFP, SIGFC, PHIC, FREE if these are present on the input file. If weights are used in the scaling then the output file will include this weight as WT.

Under this option RSTATS will also write an additional history line to the mtz header, containing: the date; the R-factor; the scale and temperature factors. In this case the R-factor is that calculated on the final cycle with reflections excluded as defined by the RESOLUTION keyword.

BKR
The final temperature factor (B), scale factor (K), R factor and the sum of w*(Fo-Fc)**2 are written on one line in the file RSTATSBKR (i.e. RSTATSBKR.DAT in the default directory unless otherwise assigned) along with the date (as day-month-year). The format statement controlling this output is
    FORMAT(2F10.5,F7.3,E13.6,1X,I2,"-",I2,"-",I2)
(The output file is scaled as defined by PROCESS.)
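
A line written with this FORMAT can be read back by fixed column widths. The Python sketch below is one way to do so; it assumes that the fields appear in the order given in the text (B, K, R, sum, then day-month-year), and the sample line is made up for the example.

```python
def parse_bkr_line(line):
    """Parse one RSTATSBKR line laid out as 2F10.5,F7.3,E13.6,1X,I2,'-',I2,'-',I2."""
    widths = [10, 10, 7, 13, 1, 2, 1, 2, 1, 2]
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    # Assumption: numeric fields are B, K, R and Sum w*(Fo-Fc)**2, in that order.
    b, k, r, wsum = (float(f) for f in fields[:4])
    day, month, year = int(fields[5]), int(fields[7]), int(fields[9])
    return {"B": b, "K": k, "R": r, "sum": wsum, "date": (day, month, year)}


# Made-up sample values, formatted to the same column widths.
sample = "%10.5f%10.5f%7.3f%13.6E %2d-%2d-%2d" % (12.5, 0.98765, 0.213, 1234.5, 24, 1, 90)
result = parse_bkr_line(sample)
print(result)
```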

PROCESS [ FCAL | FOBS | FOBC | SUMF | SUMC | LGFC | LGFO ]

For the FCAL, FOBS and FOBC options, the scale factor (K) and temperature factor (B) are determined by minimising

Sum w(Fo - K*Fc*exp(-B*s))**2

This non-linear least squares minimisation takes several cycles to converge.

For the SUMF and SUMC options, the temperature factor is not considered and the scale factor is calculated by minimising

Sum w(Fo - K*Fc)**2

So that K = Sum(wFoFc)/Sum(wFc**2)

Although a linear problem, if reflections are being rejected using the DELTA test (see REJECT), several cycles may be required for convergence.
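
The closed-form scale for the SUMF and SUMC options can be evaluated directly, as in this Python sketch. Unit weights are assumed when none are supplied, and the function name is illustrative.

```python
def linear_scale(fo, fc, w=None):
    """Return K = Sum(w*Fo*Fc) / Sum(w*Fc**2)."""
    if w is None:
        w = [1.0] * len(fo)  # assumption: unit weights when none are given
    num = sum(wi * foi * fci for wi, foi, fci in zip(w, fo, fc))
    den = sum(wi * fci * fci for wi, fci in zip(w, fc))
    return num / den


fc = [10.0, 20.0, 30.0]
fo = [25.0, 50.0, 75.0]  # made-up data with Fo = 2.5 * Fc
k = linear_scale(fo, fc)
print(k)
```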

For the LGFC and LGFO options, the scale and temperature factors are determined by minimising

Sum w( Log(Fo) - Log(K*Fc*exp(-B*s)) )**2

By considering the logarithms, the least squares minimisation becomes a linear problem but with different relative weighting. This scaling gives greater weight to the weak reflections than the minimisation without taking logs.

A weight of W=(Fo/SigFo)**2 should give similar results to a weight of W=(1/SigFo)**2 in the non-linear case.
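
To see why taking logarithms makes the problem linear, write Log(Fo/Fc) = Log(K) - B*s, which is a straight line in s with intercept Log(K) and slope -B. The Python sketch below fits that line by weighted least squares. It is a minimal illustration rather than the program's own code; it requires positive Fo and Fc, and s is whatever quantity appears in the exp(-B*s) term.

```python
import math


def log_scale(fo, fc, s, w=None):
    """Return (K, B) minimising Sum w*(Log(Fo) - Log(K*Fc*exp(-B*s)))**2."""
    n = len(fo)
    if w is None:
        w = [1.0] * n  # assumption: unit weights when none are given
    y = [math.log(o / c) for o, c in zip(fo, fc)]
    sw = sum(w)
    sws = sum(wi * si for wi, si in zip(w, s))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swss = sum(wi * si * si for wi, si in zip(w, s))
    swsy = sum(wi * si * yi for wi, si, yi in zip(w, s, y))
    # Solve the 2x2 normal equations for the straight line y = a + m*s.
    det = sw * swss - sws * sws
    slope = (sw * swsy - sws * swy) / det
    intercept = (swy - slope * sws) / sw
    return math.exp(intercept), -slope


# Made-up, error-free data generated with K = 2 and B = 5.
s = [0.01, 0.02, 0.03, 0.04, 0.05]
fc = [100.0, 80.0, 60.0, 40.0, 20.0]
fo = [2.0 * c * math.exp(-5.0 * si) for c, si in zip(fc, s)]
k_fit, b_fit = log_scale(fo, fc, s)
print(k_fit, b_fit)
```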

FCAL
Apply scale and B-factor to Fcalc and sigFc
FOBS
Apply scale and B-factor to Fobs and sigFobs
FOBC
Apply scale to Fobs and sigFobs, and B-factor to Fcalc
SUMF
Calculate scale by Sum(FoFc)/Sum(FcFc) and apply the inverse of this to Fo, i.e. the temperature factor is not refined and the scale is calculated without considering it.
SUMC
As SUMF, but apply the scale to Fc
LGFC
Apply scale and B-factor to Fcalc and sigFc
LGFO
Apply scale and B-factor to Fobs and sigFobs

WEIGHTING_SCHEME [ NONE | DELF=<x1>,<x2>,<x3>,<x4> | DSIG=<x1>,<x2>,<x3>,<x4> | EXP=<x1>,<x2>,<x3> | SIGMA=<x1> ]

Weight reflections according to one of the following schemes [Default is NONE]:
NONE
No weighting scheme to be used
SIGMA
W=<x1>*(1/SD(FO))**2
default: x1=1.0.
DELF
W=1/(<x1>+<x2>*S) for S > <x4> (S=sintheta/lambda)
W=1/(<x1>+<x2>*S+<x3>*(<x4>-S)**2) for S < <x4>
there are no defaults for this option and all parameters must be specified.
DSIG
As DELF but multiplied by (1/SD(FO))**2
EXP
W=((1/SD(FO))**2)*<x1>/exp(<x2>+<x3>*S)
the defaults are x1=1.0 x2=0.0 and x3=0.0.
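
The weighting formulae can be compared numerically with a short Python sketch such as the one below. It assumes that NONE corresponds to unit weights; the function names and the way parameters are passed are illustrative only.

```python
import math


def delf_weight(s, x1, x2, x3, x4):
    """DELF weight; the quadratic term is only added for S < x4."""
    denom = x1 + x2 * s
    if s < x4:
        denom += x3 * (x4 - s) ** 2
    return 1.0 / denom


def weight(scheme, s, sd_fo=None, params=()):
    """Return W for one reflection; s is sintheta/lambda, sd_fo is SD(FO)."""
    if scheme == "NONE":
        return 1.0  # assumption: no weighting means unit weights
    if scheme == "SIGMA":
        x1 = params[0] if params else 1.0
        return x1 * (1.0 / sd_fo) ** 2
    if scheme == "DELF":
        return delf_weight(s, *params)  # all four parameters are required
    if scheme == "DSIG":
        return delf_weight(s, *params) * (1.0 / sd_fo) ** 2
    if scheme == "EXP":
        x1, x2, x3 = params if params else (1.0, 0.0, 0.0)
        return ((1.0 / sd_fo) ** 2) * x1 / math.exp(x2 + x3 * s)
    raise ValueError("unknown weighting scheme: " + scheme)


print(weight("SIGMA", 0.1, sd_fo=2.0))
print(weight("DELF", 0.3, params=(1.0, 2.0, 3.0, 0.2)))
print(weight("DSIG", 0.3, sd_fo=2.0, params=(1.0, 2.0, 3.0, 0.2)))
```

With its default parameters the EXP scheme reduces to (1/SD(FO))**2, the same as SIGMA with x1=1.0.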

LABIN <program_label>=<file_label> ...

Input reflection file column assignments.

Assigns the program labels to the columns on the input file. The program labels are:

H K L FP SIGFP FC SIGFC PHIC FREE

Data must always be present for H K L FP and FC. SIGFP must also be present when using the SIGMA weighting scheme. FREE flags reflections to be considered separately, to give statistics needed for Free R factors.

LABOUT [ALLIN] <program_label>=<file_label> ...

Output reflection file column assignments.

For OUTPUT FOFC the output program labels are

H K L FP FC [ SIGFP SIGFC PHIC WT FREE ]

Where SIGFP, SIGFC and PHIC are only written if they are present on the input file. The weight WT is only written if a WEIGHTING_SCHEME option is specified. By default the output columns will have the same column labels as used on the input file.

If ALLIN is given as a sub-keyword then all columns in the input file will be written to the output MTZ file. This option has preference over the other options for MTZ files.

END

Terminate input (equivalent to end-of-file). Must be last keyword.

EXAMPLES


#
#  Produce file containing h,k,l,s,Fp,Sigfp,Fc,Phic with Fc scaled
#  to Fo for input to the FFT program.  No reflections rejected.
#
#
rstats hklin sample_file hklout fuo_map <<eof-rstats
LABIN FP=FNAT2 SIGFP=SIGFNAT2 FC=FCCYC7 PHIC=PHI FREE=FreeR_flag
RESOLUTION 8.0 2.7    ! If omitted then all data used
eof-rstats

#
#
#  A more complicated example:
#  All input columns output with an additional weight column.
#  Contents of the output FNAT2 and SIGFNAT2 columns will have
#  a scale and temperature factor applied. 
#
rstats hklin sample_file hklout fuo_map <<eof-rstats
LABIN FP=FNAT2 SIGFP=SIGFNAT2 FC=FNAT1 FREE=FreeR_flag
LABOUT ALLIN WT=SIGMAWT
TITLE  FNAT2 column scaled to FNAT1 using sigma weights
RESOLUTION 10.0 2.3                    ! default is 1 to 100 Ang
PRINT ALL                              ! default is LAST
CYCLES 3                               ! default is 6
LIST 3000
SCALE 2.3                              ! default is 1.0
TEMPERATURE_FACTOR 6.2                 ! default is 0.0
OUTPUT FOFC                            ! this is OVERRIDDEN by LABOUT
REJECT DELTA 4000                      ! default is no rejections
WEIGHTING_SCHEME SIGMA                 ! default is NONE
WIDTH_OF_BINS RTHETA=0.02 FBINR=500    ! defaults are .01 and 1000
PROCESS FOBS                           ! default is FCAL
eof-rstats

There is also a simple runnable unix script in $CEXAM/unix/runnable.

AUTHORS

Written by: S.E.V. Phillips
modified: Dec.1985 G.Fermi (2-6-88)
modified: Nov.1986 A.C.Bloomer
This keyworded version 24/jan/1990: Peter Brick