SCALA (CCP4: Supported Program)

NAME

scala - scale together multiple observations of reflections

SYNOPSIS

scala HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]

Keyworded input summary
References
Input and Output files
Examples
Release Notes

DESCRIPTION

Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
Corner correction
TAILS correction
Data from Denzo
Datasets
Data harvesting

This program scales together multiple observations of reflections, and merges multiple observations into an average intensity.

Various scaling models can be used. The scale factor is a function of the primary beam direction, either as a smooth function of Phi (the rotation angle ROT), or expressed as BATCH (image) number (deprecated). In addition, the scale may be a function of the secondary beam direction, acting principally as an absorption correction, either expanded as spherical harmonics, or as an interpolated three-dimensional function of Phi and the spatial coordinates of the measured spot on the detector. Such three-dimensional scaling is typically somewhat ill-determined, but it is generally useful if suitably restrained (see below for discussion of this) and should normally be used. The secondary beam correction is related to the absorption anisotropy correction described by Blessing (Ref Blessing (1995)); the interpolated three-dimensional correction is similar to that described by Kabsch (Ref Kabsch (1988)).

The merging algorithm analyses the data for outliers, and gives detailed analyses. It generates a weighted mean of the observations of the same reflection, after rejecting the outliers.

The program does three passes through the data:

  1. a scaling pass: firstly, there is an initial estimate of the scales, then the scale parameters are refined
  2. an analysis pass to refine the standard deviation estimates
  3. a final pass to apply scales, analyse agreement & write the output file, usually with merged intensities, but alternatively as a copy of the input file with evaluated scales appended to each observation.

Normally anomalous scattering is ignored during the scale determination (I+ & I- observations are treated together), but the merged file always contains I+ & I-, even if the ANOMALOUS OFF command is used. Switching ANOMALOUS ON does affect the statistics and the outlier rejection (qv)

Scaling options

The optimum form of the scaling will depend a great deal on how the data were collected. It is not possible to lay down definitive rules, but some of the following hints may help. For most purposes, my normal recommendation is

  scales rotation spacing 5 secondary 6 bfactor on brotation spacing 20 
Other hints:-
  1. If successive images are collected with the same detector (on-line detector) or equivalent detectors, and the beam intensity is steady or smoothly varying, then use a smoothed scaling option. Only use the SCALE BATCH option if every image is different from every other one, i.e. off-line detectors (including film), or rapidly or discontinuously changing incident beam flux. This may occasionally (but rarely) be the case for synchrotron data (if a "dose" mode is not used). It is possible to "mix-and-match" options. For instance, the best option for data from an unstable synchrotron beam may be e.g. SCALES BATCH BFACTOR ON BROTATION SPACING 10, which will make the Bfactor variation smooth, but the scales discontinuous by batch.
  2. If there is a discontinuity between one set of images and another (e.g. change of exposure time), then flag them as different RUNs. This will be done automatically if no runs are specified.
  3. The SECONDARY correction is recommended: this provides a correction for absorption and is better than the DETECTOR option. It should always be restrained with a TIE SURFACE command (this is the default): with this restraint it is reasonably stable under most conditions, even in the absence of a reference dataset. The ABSORPTION (crystal frame) correction is similar to SECONDARY (camera frame) in most cases, but may be preferable if data have been collected from multiple alignments of the same crystal.
  4. Use a B-factor correction unless the data are only low-resolution. Traditionally, the relative B-factor is a correction for radiation damage (hence it is a function of time), but it also includes some other corrections eg absorption.
  5. The TAILS correction might be tried if the fractional bias is significant: this is only useful if there are many fully recorded reflections (ie rarely). The refinement of the TAILS parameters is not very robust, and it may be necessary to FIX A1 (this should be improved).
  6. When trying out more complex scaling options (eg TAILS), it is a good idea to try a simple scaling first, to check that the more elaborate model gives a real improvement.
  7. When scaling multiple MAD data sets they should all be scaled together in one pass, outliers rejected across all datasets, then each wavelength merged separately. This is the default if multiple datasets are present in the input file. For isomorphous replacement, it may sometimes be useful to provide a native dataset as a reference, to make the systematic errors in the derivative similar to those in the native (ie "local" scaling, using the SECONDARY option).
Other options are described in greater detail under the KEYWORDS.
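
Putting the recommendation and these hints together, a plausible minimal input for a single sweep might be (the run layout, batch numbers and the use of ANOMALOUS here are purely illustrative):

  run 1 batch 1 to 360      # one continuous rotation sweep
  scales rotation spacing 5 secondary 6 bfactor on brotation spacing 20
  anomalous on              # if an anomalous signal is expected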

Control of flow through the program

Each of the stages can be individually activated or suppressed. Particularly useful options are ONLYMERGE (merge only, no scaling), NOSCALE, RESTORE, and INITIAL (qv).

Partially recorded reflections

See appendix 1

Partially recorded reflections are by default included in the scaling pass, as well as in the final analysis and merging. They may optionally be excluded from the scaling (controlled by the command INTENSITIES), and excluded from the final analysis (controlled by the command FINAL). Note that this default has changed from some older versions.

The different options for the treatment of partials are set by either the PARTIALS command, effective for both scaling & merging stages; or separately for the scaling stage only (INTENSITIES command) or for the merging stage only (FINAL command).

Partials may either be summed or scaled: in the latter case, each part is treated independently of the others.

Summed partials [default]:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

Scaled partials:
In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5]. This only works well if the calculated fractions are accurate, which is not usually the case.
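
As an illustration (the TEST limits shown are the defaults quoted above; the MAXWIDTH value is an arbitrary example):

  partials check test 0.95 1.05 maxwidth 5   # summed partials, checked on total fraction
  partials scale_partial 0.5                 # alternative: scale each part independently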

Scaling algorithm

See appendix 2

Corner correction

CCD detectors underestimate the intensities of spots close to the edges and particularly in the corners of the tiles, due to the point spread function from the optical taper. This is a significant problem for 3x3 tiled detectors, as the corners lie in critical parts of the diffraction pattern. The spot intensities may be corrected using a calibrated correction table for the individual detector, using the pixel coordinates in the HKLIN file. The table is given to Scala as an ADSC image format file, and activated by the CORNERCORRECT command. Acknowledgements for this correction are due to the following people: Andy Arvai, Xuong Nguyen-huu, Chris Nielsen, Raimond Ravelli, Gordon Leonard, Sean McSweeney, Sandor Brockhauser, and Andrew McCarthy.

Note that at present Scala has no way of knowing which detector was used, so it is up to the user to provide the correct file: correction files should be available from the synchrotron beamlines, or from the detector manufacturers.

TAILS correction

The TAILS (SCALES .. TAILS) correction may be used to improve poor partial bias: this is an attempt to allow for the difference in scan width between fulls and partials. A partial is measured across twice (or 3 times etc) the rotation width of a full, so more of the diffuse scattering tails are included in the intensity, leading to an under-estimation of the fulls relative to the partials. This correction is not very robust (though more so than in earlier versions of Scala), and the parameters may be unstable: you should always try first without this correction, and check that it really does improve the data statistics, without applying ridiculously large corrections. This correction is only useful if you have a large proportion of fully-recorded observations. See appendix 3 for more details.

Data from Denzo

Data integrated with Denzo may be scaled and merged with Scala as an alternative to Scalepack, or unmerged output from Scalepack may be used. Both routes have some limitations. See appendix 4 for more details.

Datasets

Data in MTZ files are assigned to "datasets", within a hierarchy of Crystal/Dataset. A crystal also has a "project name" which is not part of the hierarchy but is used to group data for harvesting. Each of these levels of hierarchy has "properties": a crystal has a unit cell, and a dataset has a wavelength. Unmerged data files as used in Scala typically contain a single dataset, but may contain multiple datasets if for instance multiple wavelength datasets are being scaled together, or if a reference set is present. Each BATCH in the file is assigned to a specific dataset.

Assigning a dataset:-
  1. Preferably, a project name, crystal name and dataset name should be assigned when the file is created, eg in Mosflm
  2. Utility programs (eg REBATCH) may be used to (re)assign dataset names and add or correct dataset properties (wavelength and cell)
  3. Names may be (re)assigned within Scala using the NAME command. This may be useful if names have not been assigned before, or if data from different crystals are merged into a single dataset. Note that each NAME command defines a different output dataset.
Using datasets in Scala:
  1. A RUN may not contain batches from different datasets, but a dataset may contain multiple runs. Datasets may be explicitly assigned to runs (see the RUN command).
  2. By default, each dataset is written out to a different output file, (see OUTPUT options).
  3. By default, outliers are rejected across all datasets (unless REJECT SEPARATE). This is normally a sensible thing to do for MAD data, since the expected differences are small, but carries with it the danger of rejecting real differences. By default, the rejection test is automatically adjusted upwards (to accept larger differences) if the anomalous signal is strong, but this is not very precise. If you have strong signals and good data, check the ROGUES file & the value of the I+/I- test & reset it if necessary, eg
         ANOMALOUS ON
         REJECT 6 ALL 15 # to check between I+ and I-
  4. Various analyses are done between datasets, comparing the anomalous differences and the dispersive (isomorphous) differences from a defined "base" set (ie correlation between ((I(i) - I(base)) and (I(j) - I(base)) (i .ne. j .ne. base)). Typically the base dataset would be a high-energy remote (this is the default), but it may be set with the BASE command.
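
For example, to set the base dataset explicitly (the dataset name here is hypothetical):

         BASE DATASET remote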

Data Harvesting

Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to

$HARVESTHOME/DepositFiles/<projectname>/<datasetname>.scala

The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory.

See also Data Harvesting.

KEYWORDED INPUT - SUMMARY

Summary classification of keywords

KEYWORDED INPUT - DESCRIPTION

In the definitions below "[]" encloses optional items, "|" delineates alternatives. All keywords are case-insensitive, but are listed below in upper-case. Anything after "!" or "#" is treated as comment. The available keywords are:

ACCEPT, ANALYSE, ANOMALOUS, BASE, BINS, CORNERCORRECT, CYCLES, DAMP, DUMP, EXCLUDE, FILTER, FINAL, HISTORY, INITIAL, INSCALE, INTENSITIES, LINK, NAME, NODUMP, NOHARVEST, NORMALISE, NOSCALE, ONLYMERGE, OUTPUT, OVERLAPMAP, PARTIALS, PRINT, PRIVATE, REJECT, RESOLUTION, RESTORE, RSIZE, RUN, SCALES, SDCORRECTION, SKIP, SMOOTHING, TIE, TITLE, [UN]FIX, UNLINK, USECWD, WIDTH, XYBINS

RUN <Nrun> [<subkeys>]

Define a "run" : Nrun is the Run number, with an arbitrary integer label (i.e. not necessarily 1,2,3 etc). A "run" defines a set of reflections which share a set of scale factors. Typically a run will be a continuous rotation around a single axis. The subkeys allow definition of a run in a flexible way. The definition of a run may use several RUN commands. If no RUN command is given, or if the ALL keyword is used, then run assignment will be done automatically, with run breaks at discontinuities in dataset, batch number or Phi. Batches or batch ranges may still be excluded, either with the EXCLUDE subkey here, or by using the EXCLUDE keyword (qv)

Subkeys:
REFERENCE
This run is a reference set, i.e. it will be given a single scale factor = 1.0 (an input scale factor in the SCALE column will still be applied if present). Reference datasets are (by default) excluded from the merging process, both from the output intensities and from the statistics
BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
Define a list of batches, or a range of batches, to be included in or excluded from the run. If batches are included in more than one run definition, the last definition will take priority.
ALL
Include all batches. In this case automatic run assignment will be done: to override this use eg RUN 1 BATCH 1 to 99999
CRYSTAL <crystal_name>
Define a crystal name to be included in the run. This would usually be used in conjunction with the DATASET subkey.
DATASET <dataset_name>
Define a dataset name to be included in the run. A crystal name may be combined with the dataset name using the syntax <crystal_name>/<dataset_name>. The dataset names used here are those present in the input file, not those assigned or altered by the NAME command.
INCLUDE | EXCLUDE
Set include/exclude flag for a following RANGE or BATCH keyword. Excluded batches or ranges will be omitted from the output file.
RANGE <r1> TO <r2>
Rotation range to include or exclude

Examples:

  RUN 1 BATCH 1 TO 10000      # unconditionally include all batches
  RUN 1 ALL EXCLUDE 77 79 132 # automatic run splitting will be done
  RUN 1 INCLUDE BATCH 1 TO 200 EXCLUDE 77 79 132
  RUN 2 CRYSTAL Native DATASET Lambda1
  RUN 3 DATASET Native/Lambda2
  RUN 4 INCLUDE RANGE 0 TO 90 EXCLUDE RANGE 45 TO 48

SCALES [<subkeys>]

Define layout of scales, ie the scaling model. Note that a layout may be defined for all runs (no RUN subkeyword), then overridden for particular runs by additional commands.

Subkeys:
RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs
ROTATION <Nscales> | SPACING <delta_rotation>
Define layout of scale factors along rotation axis (i.e. primary beam), either as number of scales or (if SPACING keyword present) as interval on rotation [default SPACING 10]
BATCH
Set "Batch" mode, no interpolation along rotation (primary) axis. This option is compulsory if a ROT column is not present in the input file, but otherwise the ROTATION option is preferred.
SMOOTH <delta_batch>
Set smoothed Batch mode: this treats the batch number as a rotation angle, and interpolates along rotation axis in the same way as the ROTATION option. <delta_batch> sets the interval on batches (ie the number of batches to smooth over). This option is an alternative to ROTATION if you have lost the information in the ROT column (spindle rotation angle (Phi)), but otherwise the ROTATION option is preferred.
BFACTOR ON | OFF | ANISOTROPIC
Switch Bfactors on or off. The default is ON, but Bfactor refinement will be switched off by default if the scales are allowed to vary across the detector (qv DETECTOR). The ANISOTROPIC keyword activates anisotropic Bfactors (NOT RECOMMENDED): beware that the parameters for this option are likely to be poorly determined. Note that the anisotropic correction is centrosymmetric.
BROTATION | TIME <Ntime> | SPACING <delta_time>
Define number of B-factors or (if SPACING keyword present) the interval on "time": usually no time is defined in the input file, and the rotation angle is used as its proxy. SCALES BATCH BROTATION SPACING 5 makes the Bfactor variation smooth, but the scales discontinuous by batch.
SECONDARY [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the camera spindle frame. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6). This correction would typically be combined with the usual primary beam correction (eg ROTATION SPACING 5 SECONDARY 6). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]
ABSORPTION [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the crystal frame based on POLE (qv). The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6). This correction would typically be combined with the usual primary beam correction (eg ROTATION SPACING 5 ABSORPTION 6). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]. This is not substantially different from SECONDARY in most cases, but may be preferred if data are collected from multiple settings of the same crystal, and you want to use the same absorption surface. This would only be strictly valid if the beam is larger than the crystal.
SURFACE [<Lmax>]
Local correction expanded on direction of the scattering vector in hkl space (ie crystal frame) in spherical harmonics up to maximum order Lmax. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6). The polar axis may be specified with the POLE keyword (qv). If you want to do 3-dimensional scaling, the SECONDARY or ABSORPTION option is preferable: this option should only be used if the diffraction geometry information required to work out the beam directions is not available.
POLE <h|k|l>
Define the polar axis for ABSORPTION or SURFACE as h, k or l (eg POLE L): the pole will default to either the closest axis to the spindle (if known), or l (k for monoclinic space-groups).
DETECTOR <Nscales_X> [<Nscales_Y>] | SPACING <delta_X> [<delta_Y>]
Define layout of scale factors on detector (i.e. secondary beam), either as number of scales in each direction (along XDET & YDET), or (if SPACING keyword present) as interval on XDET & YDET. The values for Y default to those for X if not specified. This option assumes that the detector positions are recorded in the input file (columns XDET, YDET), in any units (mm or pixels). If you allow the scale to vary across the detector (anything other than DETECTOR 1, the default), then by default Bfactor refinement is switched off, since the combination is likely to be unstable [Default 1 scale, i.e. no variation of scale across detector]. The SECONDARY option is probably better.
CONSTANT
One scale for each run (equivalent to ROTATION 1)
TAILS [<v> [<a0> [<a1>]]]
Not normally recommended. Apply correction for diffuse scattering (reflection tails) for this run. This can only be used with summed partials (INTENSITIES PARTIALS: this is the default). See introduction for explanation. Initial values for the parameters v, a0 & a1 may be given following the keyword.
Parameters may be fixed using the FIX command, or the same set used by different runs as defined by the LINK command. These controls may be required to avoid the parameters going wild.
SLOPE
NOT RECOMMENDED. Set "Slope" mode, like Batch, except that each batch has different scales at the beginning and end of the rotation range. The value used for each reflection is interpolated linearly according to the "Rotation" (phi) value. SLOPE implies BATCH mode. Be careful with this option: does it really improve the data? It is unlikely to work well if the mosaicity is large. TIE ROTATION may be used to restrain the difference in scales.

CORNERCORRECT <correction table filename>

Apply "corner correction" for CCD detectors, see above. This applies a correction on input based on the pixel coordinates of the observation, using a calibrated table of correction factors. The name of the file containing the corrections (as an ADSC image) is given here, or on the command line as the CORNERCORRECT parameter.

SDCORRECTION [[NO]REFINE]   [UNIFORM | INDIVIDUAL | COMMON]  [FIXSDB] [[NO]ADJUST] [RUN <RunNumber>] [FULL | PARTIAL | BOTH] <SdFac> [<SdB>] <SdAdd>

Input or set options for the "corrections" to the input standard deviations: these are modified to

        sd(I) corrected = SdFac * sqrt{ sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2 }

where Ihl is the intensity and LP is the Lorentz/polarization factor (SdB may be omitted from the input). Note that the SdB term was multiplied by the LP factor in versions 3.3.0 to 3.3.8, but not in earlier or later versions: the values of SdB cannot be compared between these versions.
The default is "SDCORRECTION REFINE INDIVIDUAL NOADJUST"

The keyword REFINE controls refinement of the correction parameters, essentially trying to make the SD of the distribution of fractional deviations (Ihl - <I>)/sigma equal to 1.0 over all intensity ranges. The residual minimised is Sum( w * (1 - SD)^2 ) where w = number of reflections in that intensity bin. Other subkeys control what values are determined and used for different runs (if more than one)

The keyword ADJUST activates an automatic adjustment of the Sdfac parameters from the normal probability analysis, after any REFINE step [default is NOADJUST] (this applies to all runs)

RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs. Different values may be specified for fully recorded reflections (FULL) and for partially recorded reflections (PARTIAL), or the same values may be used for both (BOTH), e.g.

         sdcorrection full 1.4 0.11 part 1.4 0.05

With the output options SEPARATE or POSTREF, the modified Sds are written to the output file in columns SIGIC [& SIGIPRC if IPR is present]. These columns will be used by Postref but ignored on reinput to Scala.
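
For example (the first line restates the default; the values on the second line are purely illustrative, with SdB omitted):

  sdcorrection refine individual
  sdcorrection norefine full 1.3 0.02 partial 1.3 0.04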

PARTIALS [[NO]CHECK] [[NO]TEST [<lower_limit> <upper_limit>]] [CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>] [SCALE_PARTIAL <minimum_fraction>] [USE_PROFILE]

Select the way in which partials are treated in both scaling and merging. These settings may be overridden separately for the scaling and merging steps with the INTENSITIES and FINAL commands respectively.

By default, partials are included (summed) in both scaling and in merging.

Subkeys:
[NO]CHECK
do [not] check for consistency of MPART flags (if present, i.e. from Mosflm). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale up partials whose total fraction lies in the range minimum_fraction -> lower_limit, using the predicted total fraction (needs reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap in, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set [default NOGAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction>. Only use this if the FRACTIONCALC column contains a good estimate of the partiality, and if you really need to recover these observations.
USE_PROFILE
use profile-fitted intensity even for scaled partials

INTENSITIES

[INTEGRATED | PROFILE | PR_PART | COMBINE [<Imid>] [POWER <Ipower>] ]
[[NO]ANOMALOUS]
[FULLS | ONLYFULLS | SCALE_PARTIAL <minimum_fraction>
| PARTIALS [ [NO]CHECK | [NO]TEST [<lower_limit> <upper_limit>] [CORRECT <minimum_fraction> ] [ [NO]GAP ] [MAXWIDTH <maximum_width>] ] ]

Intensities selection for scaling: which intensities to use, whether to keep Bijvoet pairs separate, and treatment of partials in scaling:

(a) Intensity selection options:

Set which intensity to use, of the integrated intensity (column I) or profile-fitted (column IPR), if both are present. Note this applies to all stages of the program, scaling & averaging.

Subkeys:
INTEGRATED
summation integrated intensity I.
PROFILE
profile-fitted intensity IPR [default if present]. Note that this will not be used for scaled partials unless PARTIALS USE_PROFILE is set.
PR_PART
profile-fitted intensity IPR for fulls, integrated intensity for partials
COMBINE [<Imid>] [POWER <Ipower>]
Use weighted mean of profile-fitted & integrated intensity, profile-fitted for weak data, summation integration value for strong.
I = w*Ipr + (1-w)*Iint
w = 1/(1 + (Iint/Imid)**Ipower)

Imid may either be given here explicitly or by default will be set to the mean unscaled intensity.
Ipower defaults to 3.

(b) Treatment of Bijvoet-related observations

By default, all observations (I+ & I-) are treated alike in scaling. This is normally the correct thing to do, since the anomalous differences are usually small and randomly positive and negative. In a case with large anomalous differences and high redundancy, it may be better to keep the I+ & I- observations separate in the scaling. Note that typically this will severely reduce the scaling overlaps between different parts of the data, and is not recommended except in special cases.

Subkeys:
ANOMALOUS
keep I+ and I- observations separate in scaling
NOANOMALOUS
use I+ and I- together in scaling [default]

(c) Options for treatment of partials in scaling (overrides options given under PARTIALS):

Set whether partially recorded reflections should be used in scaling, & if so, whether to use summed or scaled partials. By default summed partials are used in scaling as well as fulls. See introduction above for a description of the use of partially recorded reflections. Treatment of partials in the final averaging stage is defined with the FINAL command

Subkeys:
FULLS
use fully recorded observations only, & previously summed partials (from MOSFLM ADDPART)
ONLYFULLS
use fulls only: exclude previously summed partials (from MOSFLM)
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction> in the scaling. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
PARTIALS
use summed partials in scaling (if present) [this is the default]. The following flags are qualifiers of PARTIALS and will override those given on a previous PARTIALS command, for the scaling step only (not merging):
[NO]CHECK
do [not] check for consistency of MPART flags (if present). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale up partials whose total fraction lies in the range minimum_fraction -> lower_limit, using the predicted total fraction (needs reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap in, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set [default NOGAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial
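
A plausible example of the options above on a single command (the Imid value and the fraction limits are illustrative, not recommendations):

  intensities combine 5000 power 3 partials test 0.90 1.10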

REJECT
[SCALE | MERGE] [COMBINE] [SEPARATE]
<Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]

Define rejection criteria for outliers: different criteria may be set for the scaling and for the merging (FINAL) passes. If neither SCALE nor MERGE are specified, the same values are used for both stages. The default values are REJECT 6 ALL -8, ie test within I+ or I- sets on 6sigma, between I+ & I- with a threshold adjusted upwards from 8sigma according to the strength of the anomalous signal. The adjustment of the ALL test is not necessarily reliable.

If there are multiple datasets, by default, deviation calculations include data from all datasets [COMBINE]. The SEPARATE flag means that outlier rejections are done only between observations from the same dataset. The usual case of multiple datasets is MAD data.

If ANOMALOUS ON is set, then the main outlier test is done in the merging step only within the I+ & I- sets for that reflection, ie Bijvoet-related reflections are treated as independent. The ALL keyword here enables an additional test on all observations including I+ & I- observations. Observations rejected on this second check are flagged "@" in the ROGUES file. In the scaling step, the outlier check includes all observations, unless anomalous observations are kept separate in scaling (INTENSITIES ANOMALOUS: this is an unusual option for special cases only).

Subkeys:
SEPARATE
rejection & deviation calculations only between observations from the same dataset
COMBINE
rejection & deviation calculations are done with all datasets [default]
SCALE
use these values for the scaling pass
MERGE
use these values for the merging (FINAL) pass
sdrej
sd multiplier for maximum deviation from weighted mean I [default 6.0]
[sdrej2]
special value for reflections measured twice [default = sdrej]
ALL
check outliers in merging step between as well as within I+ & I- sets (not relevant if ANOMALOUS OFF). A negative value [default -8] means adjust the value upwards according to the slope of the normal probability analysis of anomalous differences (AnomPlot)
sdrej+-
sd multiplier for maximum deviation from weighted mean I including all I+ & I- observations (not relevant if ANOMALOUS OFF)[default check within I+ & I- sets only]
[sdrej2+-]
special value for reflections measured twice [default = sdrej+-]
KEEP
in merging, if two observations disagree, keep both of them [default]
REJECT
in merging, if two observations disagree, reject both of them
LARGER
in merging, if two observations disagree, reject the larger
SMALLER
in merging, if two observations disagree, reject the smaller

The test for outliers is described in Appendix 5
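
For example (the first line restates the default; the second is the explicit I+/I- test suggested in the Datasets section above):

  reject 6 all -8    # 6 sigma within I+ or I- sets, auto-adjusted test between them
  reject 6 all 15    # explicit 15 sigma test between I+ and I-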

ANOMALOUS [OFF] [ON | ALL]

[RUN <Nrun>]
[MATCH [[NO]INRUN] [SPINDLE | INVERT | <hkl symmetry>]
[PHIDIF <maximum Phi difference>]
[TIMEDIF <maximum Time difference>]]

Controls the treatment of anomalous scattering information in the merging step. Note that the option of selecting matching anomalous pairs is not recommended for normal use: it is likely to lead to seriously incomplete data in many cases, and the results should be compared carefully with those with the MATCH option switched off.

Subkeys:
OFF [default]
no anomalous used, I+ & I- observations averaged together in merging
ON | ALL
separate anomalous observations in the final output pass, for statistics & merging: this is also selected by the keyword ANOMALOUS on its own
RUN <run number>
set run for this MATCH option to apply to, otherwise it applies to all runs [default]
MATCH
use only matching I+ & I- pairs in merging
Matching pairs are:-
  • (a) in the same run if INRUN is specified [default NOINRUN]
  • (b) related by defined symmetry (if given as SPINDLE | INVERT | <hkl symmetry>)
  • (c) not more than DeltaPhi apart (if given by PHIDIF)
  • (d) not more than DeltaTime apart (if given by TIMEDIF and a TIME column is present)

INRUN
Matching pairs must be in the same run [default NOINRUN]

Definition of symmetry:-

SPINDLE
related by negation of the reciprocal index closest to the spindle: this option requires full orientation data to be present in the file
INVERT
related by inversion of indices, i.e. -h, -k, -l
<hkl symmetry>
specified hkl symmetry (e.g. h, -k, l)

PHIDIF <DeltaPhi>
maximum difference in Phi (ROT) between matching pairs
TIMEDIF <DeltaTime>
maximum difference in TIME between matching pairs

RESOLUTION [RUN <Nrun>] [DATASET <dataset_name>] [[LOW] <Resmin>] [[HIGH] <Resmax>]

Set resolution limits in Angstrom, in either order, optionally for individual datasets or for runs (in which case this command MUST come after the definition of the run). The keywords LOW or HIGH, followed by a number, may be used to set the low or high resolution limits explicitly: an unset limit will be set as in the input HKLIN file. If the RUN & DATASET subkeywords are omitted, the limit applies to all runs. A crystal name may be combined with the dataset name using the syntax <crystal_name>/<dataset_name>. The dataset names used here are those present in the input file, not those assigned or altered by the NAME command. [Default use all data]
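
For example (the limits and run number are illustrative):

  resolution 50 1.8            # limits may be given in either order
  resolution run 2 high 2.5    # per-run limit: RUN 2 must already have been defined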

TITLE <new title>

Set new title to replace the one taken from the input file. By default, the title is copied from HKLIN to HKLOUT

ONLYMERGE

Only do the merge step, no initial analysis, no scaling (== INITIAL NONE; NOSCALE). Note that this will usually need to be combined with a RESTORE command.

RESTORE [<Scale_file_name>]

Read initial scales from a SCALES file from a previous run of Scala (scales are normally dumped on every cycle, see DUMP). The number of scales defined for each run this time should typically be the same as in the dump, although a set of scale factors along ROTATION or DETECTOR may be extrapolated to additional batches which were not present in the initial scaling. The file may contain scales for runs which are not used this time, but new runs may not be added. RESTORing from a scale file which does not properly correspond to the run which generated the file is liable to give silly results. No initial analysis pass will be done unless the command INITIAL ANALYSE is given.

INITIAL MEAN | UNITY | RUN <RunNumber> <InitialScale> | NONE | ANALYSE

Define method of setting initial scales

Subkeys:
MEAN
from mean intensities by rotation range [default]
UNITY
set all scales = 1.0
RUN <RunNumber> <InitialScale>
set initial scale factor for this run. If this option is used, any runs whose scales are not set explicitly will have their scales set = 1.0
NONE
no initial analysis pass, set all scales to unity
ANALYSE
force initial analysis pass even if RESTORE option is used

PRINT [<subkey>]

Define amount of printing

Subkeys:
NONE
almost none
BRIEF
some more [default]
CYCLES
more information about each minimization cycle
FULL
quite a lot
DEBUG [<reflection_interval>]
far too much: also define reflection interval for printing
ALLOVERLAP
print all numbers in overlap matrix after initial pass, rather than the default condensed table
OVERLAP
print condensed table of overlap matrix after initial pass
NOOVERLAP
no printing of overlap matrix after initial pass [default]

CYCLES [[NUMBER] <Ncycle>] [CONVERGE <Conv_limit>] [REJECT <Rej_cycle>] [WEIGHT VARIANCE | UNIT]

Define number of refinement cycles, convergence limit, and weighting scheme for scale refinement

Subkeys:
[NUMBER]
maximum number of cycles [default 10]
CONVERGE
convergence limit (multiple of sd(param)) [default 0.3]
REJECT
1st cycle number for rejection of outliers [default 2]. The default is not to reject outliers on the first cycle, when the scales may be a long way off; but if the initial scales are reasonable (particularly if they come from a previous run) it is probably better to exclude outliers from the first cycle as well
WEIGHT VARIANCE | UNIT
Weighting scheme for scale refinement: VARIANCE weighting is the default and usual choice; UNIT weights may help if the scale factors vary over a large range (unit weights have not been much tested)

EXCLUDE [RUN <Nrun>]
[[NO]EMAX <maximum_E> | EPROB <minimum_probability>]
[SDMIN <value>] [SDMAX <value>] [ABSMAX <value>]
[ARC INSIDE|OUTSIDE <X1> <Y1> <X2> <Y2> <X3> <Y3> ... <Xn> <Yn>]
[RECTANGLE <Xmin> <Xmax> <Ymin> <Ymax>] [BATCH <batch range>|<batch list>] [CRYSTAL <crystal_name>] [DATASET <dataset_name>]

Set intensity limits or positional limits for excluding observations.

Limits for scaling and merging passes:-
EMAX or EPROB, ARC, RECTANGLE, BATCH, CRYSTAL and DATASET limits apply to all stages of the program
Limits for scaling pass only:-
If an observation is considered too weak (I .lt. sd(I) * SDMIN), or if an observation is too strong (I .gt. sd(I) * SDMAX .or. I .gt. ABSMAX), then all observations of that reflection are omitted from the scaling. Exclusions are not applied to a Reference run. [Default EXCLUDE SDMIN 3.0]
These exclusions do not apply to the initial scale calculation (INITIAL MEAN), nor to the output statistics, only to the scaling. The test is only done on fully recorded observations, and against the input standard deviations (i.e. unmodified by SDCORRECTION parameters)
Subkeys:
RUN <Nrun>
defines a run number (previously defined) for these exclusion parameters to apply to: else applies to all runs (this applies to SD, arc and rectangle limits only: the EMAX|EPROB limit applies to all runs)
EMAX <maximum_E> | EPROB <minimum_probability>
Define maximum normalized amplitude E allowed: this may be given either as the maximum E-value EMAX for an acentric reflection (eg 8 - 10), or as the minimum allowed probability EPROB (eg 1e-8), where Eprob = exp(-Emax**2). Excluded reflections are listed in the log file, and in the ROGUES file. See R. Read, CCP4 Study Weekend, Sheffield 1999. [Default EMAX 10]. NOEMAX switches this test off
SDMIN
minimum sd multiple for inclusion
SDMAX
maximum sd multiple for inclusion
ABSMAX
maximum absolute value, i.e. observations are excluded if:-
             I .lt. sd(I) * SDMIN
        .or. I .gt. sd(I) * SDMAX
        .or. I .gt. ABSMAX
ARC
defines an area of detector coordinates (XDET, YDET) to be excluded from all calculations, both scaling and merging, as a circular arc. Data are excluded either INSIDE (lower radius) or OUTSIDE (higher radius) the arc. The arc is defined by fitting a circle to the coordinates of 3 or more points: points 1 (X1,Y1) and 2 (X2,Y2) define the ends of the arc (in either order). If X1,Y1 = X2,Y2 a complete circle is excluded. A series of arcs may be defined. This option allows for the exclusion of shadows on the detector from eg the backstop or cryocooler.
RECTANGLE
defines a rectangular area of detector coordinates (XDET, YDET) to be excluded from all calculations, both scaling and merging. A series of rectangles may be defined.
BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
Define a list of batches, or a range of batches, to be excluded altogether.
CRYSTAL <crystal_name>
Define a crystal name to be excluded altogether. This would usually be used in conjunction with the DATASET subkey.
DATASET <dataset_name>
Define a dataset name to be excluded altogether. A crystal name may be combined with the dataset name using the syntax <crystal_name>/<dataset_name>. The dataset names used here are those present in the input file, not those assigned or altered by the NAME command.
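
An illustrative set of exclusions (the coordinate and batch values are hypothetical, and depend on the detector units in the input file):

  exclude sdmin 3.0                  # the default weak-intensity cut for scaling
  exclude rectangle 10 50 100 160    # eg a backstop shadow, in XDET/YDET units
  exclude batch 101 to 105           # eg damaged images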

[UN]TIE [SURFACE [<Sd_srf>]] [BFACTOR [<Sd_bfac>]] [A1 [<Sd_a1>]] [ROTATION [<Sd_z>]] [DETECTOR [<Sd_xy>]]

Apply or remove restraints on parameters. These restrain pairs of neighbouring scale factors on the rotation axis (ROTATION = primary beam) or in the detector plane (DETECTOR = secondary beam) to the same value, or neighbouring Bfactors to the same value, or surface spherical harmonic parameters to zero (for SECONDARY or SURFACE corrections, to keep the correction approximately spherical), with a standard deviation as given. This may be used if scales are varying too wildly, particularly in the detector plane. The default is no restraints on scales. A tie is recommended (a) if scales are varied across the detector, eg TIE DETECTOR 0.1, or (b) for SECONDARY or SURFACE corrections, eg TIE SURFACE 0.001

UNTIE may be used to remove the default restraints on SURFACE and A1 (not recommended)

SURFACE: tie surface parameters to a spherical surface [default is TIE SURFACE 0.001]
BFACTOR: tie Bfactors along rotation
A1: tie TAILS parameter A1 to its starting value, ie that given on the SCALES command [default is TIE A1 4]
ROTATION: tie parameters along rotation axis (mainly useful with BATCH mode)
DETECTOR: tie parameters on detector
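
For example (these restate the recommendations above):

  tie surface 0.001    # default restraint for SECONDARY/ABSORPTION/SURFACE corrections
  tie detector 0.1     # recommended if scales are varied across the detector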

NORMALISE [SCALES|BFACTOR] [BEST|FIRST|RUN <run_number>]

Controls which scale factors and Bfactors are "normalised", ie set to 1.0 or 0.0. The overall scale of the data is indeterminate, so one scale factor needs to be set = 1.0: similarly, one relative B-factor needs to be set = 0.0. The default options are to normalise scales on the first part of the first run, and Bfactors on the best part (ie to make all the Bfactors negative: because of the smoothing they may still go slightly positive). The normalisation of the scales is not important, but the normalisation of Bfactors is, because negative Bs will sharpen data, while positive Bs will blur it.

SCALES
Following keywords apply to scales
BFACTORS
Following keywords apply to Bfactors [default]
BEST
Normalise B-factors on the best bit (not applicable to scales) [default for Bfactors]
FIRST
Normalise on the beginning of the first run [default for scales]
RUN <run_number>
Normalise on the beginning of the defined run

OUTPUT <subkeywords>

Control what goes in the output file. Three types of output MTZ file may be produced: (a) AVERAGE, average intensity for each hkl (I+ & I-); (b) SEPARATE, observations from the input file with the calculated scale, for re-input to Scala (or Postref, see POSTREF option); (c) UNMERGED, unaveraged observations, but with scales applied, partials summed or scaled, and outliers rejected. AVERAGE and UNMERGED may be combined to write both types of file at the same time: in this case the filename is created from the HKLOUT filename (with the dataset appended if the SPLIT option is on) with the string "_unmerged" appended.

A reference batch is always excluded from the final statistics, even if it is included in the output file (only possible with the SEPARATE option).

File format options:
NONE
no output file written
AVERAGE
[default] output averaged intensities, <I+> & <I-> for each hkl
SEPARATE
output observations as input, but with added columns for SCALE etc. This file may be reinput to Scala for further scaling (e.g. with a different scaling model)
POSTREF
append columns for Postref. This option implies SEPARATE. The added columns are IMEAN SIGIMEAN ISUM SIGISUM:
        IMEAN   mean of fully-recorded reflections
        ISUM    summed partials (partials only)
UNMERGED
apply scales, sum or scale partials, reject outliers, but do not average observations
POLISH
Write reflections also to a formatted file as well as the MTZ file (logical name SCALEPACK) in some obscure format as written by "scalepack" (or my best approximation to it). Why would anyone want to do this? If the UNMERGED option is also selected, then the output matches the scalepack "output nomerge original index", otherwise it is the "normal" scalepack output, with either I, sigI or I+ sigI+, I-, sigI-, depending on the "anomalous" flag.
Dataset options (only relevant for multiple datasets):
SPLIT
If there are multiple datasets defined, split them into separate output files [this is the default]. The base filename is taken from HKLOUT, with the dataset name added for each dataset.
TOGETHER
Write out multiple datasets into the same file, but labelled as different datasets

Other options:

(a) UNMERGED options:
ORIGINAL
write original indices hkl: M/ISYM = 1 for all reflections
REDUCED
[default] hkl indices are reduced to the asymmetric unit, as in the input file
BEAMS
output direction cosines of incident (s0) and diffracted (s2) beams in the output file (columns S0X, S0Y, S0Z, S2X, S2Y, S2Z). These vectors are in the orthogonalised crystal frame with x,y,z axes along a*, c x a*, c (or in the diffractometer frame if the keyword DBEAMS is used)
(b) SEPARATE (POSTREF) options
the following apply only to the SEPARATE (POSTREF) option, and must not precede that switch:-
REFERENCE
write reference batch (if present) to output file
NOREFERENCE
[default] omit reference batch (if present) from output file
KEEP
[default unless AVERAGE] keep reflections outside resolution limits. The SCALE column will be set = 0.0
KEEP SCALE
keep reflections outside resolution limits, and calculate scales for them. This is dangerous unless the proportion of reflections omitted from scaling is small
EXCLUDE
[default if AVERAGE] exclude reflections outside resolution limits
OMIT OUTLIERS
omit rejected outliers from output file (SEPARATE & POSTREF options only). In this case a ROGUES file is written (see below) [default keep them in, but flagged in the FLAG column]
OMIT PARTIALS [RUN <Nrun>]
omit partially recorded reflections from output file. If no run number is given, then it applies to all runs. Multiple runs may be specified on successive OUTPUT OMIT PARTIALS RUN commands
ROGUES
a list of rejected reflections is written to the file ROGUES. This may be assigned on the command line. A ROGUES file is always written for the AVERAGE & UNMERGED options. [for SEPARATE, default no ROGUES file written unless OMIT OUTLIERS option used]
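
Illustrative combinations of the options above:

  output average unmerged    # write both a merged file and an "_unmerged" file
  output separate rogues     # scales appended for re-input to Scala, plus a ROGUES file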

ACCEPT [OVERLOADS|BGRATIO <bgratio_max>|PKRATIO <pkratio_max>|GRADIENT <bg_gradient_max>|EDGE]

Set options to accept observations flagged as rejected by the FLAG column from Mosflm (version 6.2.3 and later). By default, any observation with FLAG .ne. 0 is rejected.

Subkeys:
OVERLOADS
Accept profile-fitted overloads
BGRATIO
Observations are flagged in Mosflm if the ratio of the rms background deviation relative to its expected value from counting statistics is too large. This option accepts observations if bgratio < bgratio_max [default in Mosflm 3.0]
PKRATIO
Accept observations with peak fitting rms/sd ratio pkratio < pkratio_max [default maximum in Mosflm 3.5]. Only set for fully recorded observations
GRADIENT
Accept observations with background gradient < bg_gradient_max [default in Mosflm 0.03].
EDGE
Accept profile-fitted observations on the edge of the active area of the detector

FINAL [ NONE | FULLS | ONLYFULLS
| SCALE_PARTIAL <Minimum_fraction>
| PARTIALS [[NO]CHECK] [[NO]TEST [<lower_limit> <upper_limit>]] [CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>] ]

Select whether or not to use summed or scaled partials in the final analysis after scale determination. If this command is missing, summed partials will be included if the input file contains a FRACTIONCALC column.

Subkeys:
NONE
no final analysis/output pass
FULLS
use fulls only (& previously summed partials, eg from MOSFLM ADDPART or Scalepack) [default if no FRACTIONCALC column]
ONLYFULLS
use fulls only: exclude previously summed partials (from MOSFLM)
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction> in the merging. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
PARTIALS
use summed partials in the final analysis (if present). See introduction above for a description of the use of partially recorded reflections. [this is the default if a FRACTIONCALC column is present] The following flags are qualifiers of PARTIALS and will override those given on a previous PARTIALS command, for the merging step only (not scaling):
[NO]CHECK
do [not] check for consistency of MPART flags (if present). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale up partials whose total fraction lies in the range minimum_fraction -> lower_limit, using the predicted total fraction (needs reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap in, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set. [default GAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial

[UN]FIX [V] [A0] [A1]

Option to fix or free TAILS parameters: by default V & A1 are free, A0 is fixed [default A0 = 0.0]. Fixing A1 may help, particularly for low resolution data.

LINK [SURFACE|TAILS] ALL | <run_2> TO <run_1>

run_2 will use the same SURFACE (or SECONDARY) or TAILS parameters as run_1. This can be useful when different runs come from the same crystal, and may stabilize the parameters. LINK TAILS ALL will use the same tails parameters for all runs for which TAILS parameters are refined. The keyword ALL will be assumed if omitted.

UNLINK [SURFACE|TAILS] ALL | <run_2> TO <run_1>

Remove links set by the LINK command (or by default). The keyword ALL will be assumed if omitted, e.g. UNLINK TAILS [ALL] will use separate tails parameters for each run.

SKIP <N_skip> [[FOR] <N_skip_cycles>]

Allow a subset of reflections to be used during the initial cycles of scaling, to speed up the program. For the first N_skip_cycles, only every N_skip'th unique reflection will be used. N_skip_cycles defaults to Ncycle-2, and the program will force 2 more cycles with all data if convergence is reached while reflections are still being skipped. You should check that convergence has been reached with all observations, particularly if the number of observations used in the early cycles is small.
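
For example (the values are arbitrary):

  skip 10 for 4    # use every 10th unique reflection for the first 4 cycles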

FILTER <Filter> [<Damp>]

Define filter level & damp level. In the minimization, shifts corresponding to eigenvalues .lt. <Filter> are removed, and <Damp> is added to all eigenvalues. [Default 1.0e-6, 0.0]

DAMP [NONE] | <Damp> <NcycDamp>

Set damping level for shifts. <Damp> is added to all eigenvalues for the first <NcycDamp> cycles. This may be useful if the scales vary over a wide range, particularly if the scale refinement diverges at first, but is not normally recommended, as it seems to slow convergence. Default is DAMP NONE. If <NcycDamp> is omitted, the damping applies to all cycles

BINS <Nsrange>

Define number of resolution bins for analysis [default 10]

XYBINS <Nx> [<Ny>]

Define number of bins across the detector, x (=XDET) and y (=YDET). Only used if XDET, YDET columns are present in the input file. <Ny> defaults to <Nx>. XYBINS 0 turns off this analysis [default Nx = Ny = 20]

SMOOTHING <subkeyword> <value>

Set smoothing factors ("variances" of weights). A larger "variance" leads to greater smoothing

Subkeys:
TIME <Vt>
smoothing of B-factors [default 0.5]
ROTATION <Vz>
smoothing of scale along rotation [default 1.0]
DETECTOR <Vxy>
smoothing of scale on detector [default 1.0]
PROB_LIMIT <DelMax_t> <DelMax_z> <DelMax_xy>
maximum values of the normalized squared deviation (del**2/V) to include a scale [default set automatically, typically 3.0]

INSCALE OFF | ON

Switch OFF or ON the application of an input SCALE column. By default, if the input file contains a column called SCALE (e.g. from a previous run of Scala), it will be applied.

NOSCALE

Don't do any scaling, just the final analysis (equivalent to CYCLES 0)

DUMP [<Scale_file_name>]

Dump all scale factors to a file after each cycle. These can be used to restart scaling using the RESTORE option, or for rerunning the merge step. If no filename is given, the scales will be written to logical file SCALES, which may be assigned on the command line. DUMP is set by default, but may be turned off with the NODUMP command.

NODUMP

No dump of scales to file. Default is DUMP.

ANALYSE [[NO]NORMAL] [[NO]PLOT] [MAXDENSITY <maximum point density>]

This command controls the normal probability analyses

Subkeys:
[NO]NORMAL
do [not] do normal probability analyses [default do them]
[NO]PLOT
do [not] write normal probability plot to an output file with logical name DELTA [default do write the file]. This file contains pairs of delta(expected), delta(observed) for fulls, then summed partials, then scaled partials
MAXDENSITY
maximum point density for the normal probability plot. This plot includes a point for every observation, so in large datasets it can get very big. This parameter allows sampling of the plot, so that in the central crowded part only some of the points are included in the plot file [default 25]

HISTORY <history line>

Define an optional line to be added to the history records in the file. This is in addition to a line giving the date and time of the run, which is always added. Only one optional history line may be added.

OVERLAPMAP

Write the overlap matrix from the initial analysis to a map file assigned to MAPOUT. Note that the initial analysis is not done if the RESTORE option is used or INITIAL NONE is set.

WIDTH WILSON | LINEAR | SQUARE [NBINS <Nbins>] [<mid-point>]

Select binning mode on intensity

Subkeys:
WILSON
[default] exponential bins
LINEAR
linear bins
SQUARE
quadratic bins

In each case, <mid-point> is the upper limit for the middle bin. The NBINS keyword may be used to specify the number of bins [maximum & default = 13]

NAME [RUN <RunNumber(s)>] PROJECT <project_name> CRYSTAL <crystal_name> DATASET <dataset_name>

Assign or reassign project/crystal/dataset names for the output file. The names given here supersede those in the input file: each NAME command defines an output dataset.

If the RUN subkey is present, different runs (or groups of runs) may be assigned to different datasets: the run must have already been defined. If the RUN subkey is omitted, the names apply to all data. RunNumber may be a list or a range of run numbers (see examples below). DATASET must be present and must be unique: if PROJECT or CRYSTAL are omitted, they take the value last given for these parameters. DATASET may optionally be given in the syntax crystal_name/dataset_name

Examples:

  name run 1      project  Lysozyme crystal  Native dataset L1
  name run 2 3 dataset L2 # takes project & crystal from previous line
  name run 4 to 6 crystal Native dataset L3

BASE [CRYSTAL <crystal_name>] DATASET <base_dataset_name>

If there are multiple datasets in the input file, define the "base" dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, ie for the i'th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if the wavelength is unknown). Typically, the CRYSTAL keyword may be omitted.

PRIVATE

Set the directory permissions to '700', i.e. read/write/execute for the user only (default '755').

USECWD

Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME. This can be used to send deposit files from speculative runs to the local directory rather than the official project directory, or can be used when the program is being run on a machine without access to the directory $HARVESTHOME.

RSIZE <row_length>

Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.

NOHARVEST

Do not write out a deposit file; the default is to do so provided Project and Dataset names are available.

    INPUT AND OUTPUT FILES

    Input

    HKLIN
    The input file must be sorted on H K L M/ISYM BATCH

    Compulsory columns:

            H K L           indices
    M/ISYM partial flag, symmetry number
    BATCH batch number
    I intensity (integrated intensity)
    SIGI sd(intensity) (integrated intensity)

    Optional columns:

            XDET YDET       position on detector of this reflection: these
                            may be in any units (e.g. mm or pixels), but the
                            range of values must be specified in the
                            orientation data block for each batch. If these
                            columns are absent, the scale may not be varied
                            across the detector (i.e. only SCALES DETECTOR 1
                            is valid)
            ROT             rotation angle of this reflection ("Phi"). If
                            this column is absent, only SCALES BATCH is
                            valid.
            IPR             intensity (profile-fitted intensity)
            SIGIPR          sd(intensity) (profile-fitted intensity)
            SCALE           previously calculated scale factor (e.g. from a
                            previous run of Scala). This will be applied on
                            input
            SIGSCALE        sd(SCALE)
            TIME            time for B-factor variation (if this is
                            missing, ROT is used instead)
            MPART           partial flag from Mosflm
            FRACTIONCALC    calculated fraction, required for scaling
                            partials (SCALE_PARTIALS)
            LP              Lorentz/polarization correction (already applied)
            FLAG            error flag (packed bits) from Mosflm (v6.2.3 or
                            later). By default, if this column is present,
                            observations with a non-zero FLAG will be
                            omitted. They may be conditionally accepted
                            using the ACCEPT command (qv)
                            Bit flags:
                               1   BGRATIO too large
                               2   PKRATIO too large
                               4   Negative > 5*sigma
                               8   BG gradient too high
                              16   Profile-fitted overload
                              32   Profile-fitted "edge" reflection
            BGPKRATIOS      packed background & peak ratios, & background
                            gradient, from Mosflm, to go with FLAG
    CORNERCORRECT
            File containing pixel corrections for the corner correction
            option (qv), as an ADSC image file, for groups of 8x8 pixels
            for a binned image. Note this correction is unique to an
            individual detector, and Scala is unable to check whether the
            appropriate file has been given.

    Output

    HKLOUT
    (a) Option AVERAGE
    The output file contains columns
    H K L  IMEAN SIGIMEAN  I(+) SIGI(+)  I(-) SIGI(-)

    Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively and are always present even for the option ANOMALOUS OFF.

    If the "TOGETHER" option is selected, then all datasets will be written to the same file, with the column labels augmented by the dataset name

    	IMEAN_dataset SIGIMEAN_dataset  I(+)_dataset SIGI(+)_dataset  I(-)_dataset SIGI(-)_dataset
    If the "SPLIT" option  is specified then separate files are written for each dataset:  files are named with the base HKLOUT name with the dataset name appended, as "_dataset"

    (b) Option SEPARATE
    The output file contains the same columns as the input, with some columns added if not previously present:-

    SCALE & SIGSCALE - the calculated scale factor & its sd (this may be applied in another run of Scala). SCALE will be = 0.0 for reflections outside the resolution cutoff, if they are included in the output file (option OUTPUT KEEP) (see example)

    SIGIC [, SIGIPRC] - the corrected standard deviations of I [and IPR], as altered by SDCORR commands. These columns are only written if a SDCORRECTION command is given to Scala.

    If the OUTPUT POSTREF option is given, then the columns IMEAN SIGIMEAN ISUM SIGISUM are also added:

            IMEAN           mean of fully-recorded reflections
            ISUM            summed partials (partials only)

    (c) Option UNMERGED
    As for SEPARATE, but with scales applied, with no partials (i.e. partials have been summed or scaled, unmatched partials removed), & outliers rejected. If a separate profile-fitted intensity column IPR, SIGIPR is present in the input file as well as columns I, SIGI, only one set will be chosen, as specified. Columns defining the diffraction geometry (e.g. XDET YDET ROT TIME LP FRACTIONCALC) will be preserved in the output file. If both AVERAGE & UNMERGED are specified, then the filename for the unmerged file has "_unmerged" appended

    Output columns:

            H,K,L           REDUCED or ORIGINAL indices (see OUTPUT options)
            M/ISYM          symmetry number (REDUCED), = 1 for ORIGINAL
                            indices
            BATCH           batch number as for input
            I, SIGI         scaled intensity & sd(I)
            SCALEUSED       scale factor applied
            SIGSCALEUSED    sd(SCALE applied)
            NPART           number of parts, = 1 for fulls, negated for
                            scaled partials, i.e. = -1 for a scaled
                            single-part partial
            TIME            copied from input if present
            XDET,YDET       copied from input if present
            ROT             copied from input if present (averaged for
                            multi-part partials)
            FRACTIONCALC    total fraction (if present in input file)
            LP              copied from input if present

    If the BEAM option is used:

            S0X, S0Y, S0Z   direction cosines of the incident beam in the
                            orthogonalised crystal frame (x,y,z axes along
                            a*, c x a*, c)
            S2X, S2Y, S2Z   direction cosines of the diffracted beam in the
                            orthogonalised crystal frame
    SCALES
    scale factors from DUMP, used by RESTORE option
    ROGUES
    list of bad agreements
    PLOT
    If SCALES SECONDARY or SURFACE options are used, graph of correction surface (Plot84 format)
    NORMPLOT
    normal probability plot from merge stage
    *** this is at present written in a format for the plotting program xmgr (aka grace) ***
    ANOMPLOT
    normal probability plot of anomalous differences
                (I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
    *** this is at present written in a format for the plotting program xmgr (aka grace) ***
    CORRELPLOT
    scatter plot of pairs of anomalous differences (in multiples of RMS) from random half-datasets. One of these files is generated for each output dataset
    *** this is at present written in a format for the plotting program xmgr (aka grace) ***
    ROGUEPLOT
    a plot of the positions on the detector (on an ideal virtual detector with the rotation axis horizontal) of rejected outliers, with the positions of the principal ice rings shown
    *** this is at present written in a format for the plotting program xmgr (aka grace) ***
    SCALEPACK
    Formatted output selected by the command OUTPUT POLISH

    EXAMPLES

    1. Simple smoothed scaling, with some alternatives flagged as #*#

      set crystal = "tfn2"
      set run = 1       # run number, used in the auxiliary filenames below
      scala hklin ${crystal}_srs \
      hklout ${crystal}_merge \
      scales ${crystal}_${run}.scales \
      rogues ${crystal}_${run}.rogues \
      normplot ${crystal}_${run}.norm \
      << eof

      run 1 all

      intensities partial # we have few fulls: this is the default

      cycles 20

      anomalous off # this is a native set
      #*# anomalous on # or a derivative

      sdcorrection 1.3 0.02 # from a previous run

      # try it with and without the tails correction: this is with tails
      scales rotation spacing 10 bfactor on tails
      #*#
      #*# Some alternatives
      #*# >> Recommended usual case
      #*# >> If you have radiation damage, you need a Bfactor,
      #*# >> but a Bfactor at coarser intervals is more stable
      #*# scales rotation spacing 5 secondary 6 \
      #*# bfactor on brotation spacing 20
      #*# tie bfactor 0.5 ## restraining the Bfactor also sometimes helps
      #*#


      reject 4 # reject outliers more than 4sd from mean
      #*# reject 6 all 8 is default

      exclude emax 8 # reject very large observations
      # default is Emax 10

      eof
    2. Simple Batch scaling

      #!/bin/csh -f
      #
      # Scale data from Mosflm, merge with Scala
      #
      scala hklin jpa_example hklout jpa_example_sc \
      scales jpa.scales \
      rogues jpa.rogues \
      normplot jpa.norm \
      anomplot jpa.anom \
      << eof-1
      run 1 batch 2001 to 2049
      run 2 batch 2051 to 2100
      cycles 8
      sdcorr 1.5 0.03
      scales batch bfactor on # batch scaling is generally poorer than smoothed
      reject merge 4
      anomalous on
      eof-1
    3. A more complicated example: smooth scaling of native, then scaling of derivative to native

      #!/bin/csh -f
      #
      #scala
      #
      cd /scr0/fm1/Temp
      #
      ##
      #==== Sort native output from Mosflm together
      ##
      sort:
      sortmtz hklout m6c8_sort.mtz << end_sort
      H K L M/ISYM BATCH I SIGI
      m6c8a1.mtz
      m6c8a2.mtz
      end_sort
      #
      ##
      #==== scale native data together, no Bfactor, smooth scale on rotation
      #==== merge native
      ##
      scala hklin m6c8_sort.mtz hklout m6c8_scala <<EOF
      run 1 batch 1 to 90000
      title frozen native monoclinic m6c8
      scales bfactor off rotation spacing 5
      resolution 25 6.1
      anomalous off
      reject merge 4
      sdcorr 1.3 0.04
      EOF
      #
      # Convert native data into form suitable for reinput to Scala
      combat hklin m6c8_scala hklout m6c8_r << eof-r
      input mtzi
      labin I=IMEAN SIGI=SIGIMEAN
      batch 1
      eof-r
      #
      ##
      #==== Sort derivative data together
      ##
      sort:
      sortmtz hklout m6cb3_sort.mtz << end_sort
      H K L M/ISYM BATCH I SIGI
      m6cb3b.mtz
      m6cb3c.mtz
      end_sort
      #
      ##
      #==== Combine together merged native & sorted derivative data, by
      # interleaving reflection records
      # Must resort data after this step
      ##
      mtzutils:
      mtzutils hklin2 m6cb3_sort.mtz \
      hklin1 m6c8_r \
      hklout temp_m6cb3_resort << eof-m
      merge
      eof-m
      #
      sortmtz hklin temp_m6cb3_resort hklout m6cb3_resort << eof-m
      H K L M/ISYM BATCH
      eof-m
      #
      ##
      #==== Scale and merge derivative data, using native data as reference (run 1)
      # Use secondary beam absorption correction for derivative,
      # but with some restraints (tie)
      # The reference data (native) is omitted from the output file
      ##
      scala hklin m6cb3_resort.mtz hklout m6cb3_scala \
      scales m6cb3.scales \
      rogues m6cb3.rogues \
      normplot m6cb3.norm \
      anomplot m6cb3.anom \
      plot m6cb3.plt \
      <<EOF
      run 1 batch 1 reference
      run 2 batches 10 to 23156 exclude 23152 # reject one duff batch
      run 3 batches 23157 to 90000
      title frozen native monoclinic m6cb3
      scales bfactor off rotation spacing 5 secondary 6
      tie surface 0.001 # this is the default value anyway
      resolution 25 2.5
      reject merge 4
      anomalous on
      sdcorr 1.1 0.005
      EOF
      #
      #
      #
      #exit
      trunc:
      truncate hklin m6cb3_scala \
      hklout /ss3/fm1/Mutase/Derivs_FzM/m6cb3_F <<end-trunc
      anomalous yes
      resolution 25 2.5
      nresidue 1400
      labout F=FM623 SIGF=SIGFM623 DANO=DANOM623 SIGDANO=SIGDANOM623
      end-trunc
    4. Scaling of several MAD datasets together, no reference dataset

      #!/bin/csh -f

      # Define a base name for files created in this script
      set name = dfxe_3d
      set project = dfxe
      set crystal = crys1

      # Input filenames for the 4 datasets at different wavelengths
      set l1 = dfxe_1 # peak
      set l2 = dfxe_2 # inflection
      set l3 = dfxe_3 # hard remote
      set l4 = dfxe_4 # 1A wavelength

      set nl1 = peak
      set nl2 = inflect
      set nl3 = highE
      set nl4 = lowE

      # Angular spacing for smoothed scales
      set spacing = 5

      # Sort together the initial data files
      sortmtz hklout ${name}_all << eof-s
      H K L M/ISYM BATCH
      ${l1}.mtz
      ${l2}.mtz
      ${l3}.mtz
      ${l4}.mtz
      eof-s


      ###=== Step 1 ==========================================================
      ###=== Scale all datasets together
      ###=== This will write out 4 output files, with filenames constructed
      ###=== by appending the dataset name on to the hklout name
      scale_1:
      set run = all
      scala hklin ${name}_all hklout ${name} \
      scales ${run}.scales \
      normplot ${run}.norm \
      anomplot ${run}.anom \
      rogues ${run}.rogues \
      << eof-r1
      title Scale all datasets together, smooth, secondary
      # Define runs
      run 1 batch 1000 to 1999
      run 2 batch 2000 to 2999
      run 3 batch 3000 to 3999
      run 4 batch 4000 to 4999

      # Define datasets: this should have been done in Mosflm previously
      name run 1 project ${project} crystal ${crystal} dataset ${nl1} # peak
      name run 2 project ${project} crystal ${crystal} dataset ${nl2} # inflection
      name run 3 project ${project} crystal ${crystal} dataset ${nl3} # highE
      name run 4 project ${project} crystal ${crystal} dataset ${nl4} # lowE

      # Dispersive differences for analysis are relative to the "base" dataset
      base dataset highE


      # If using secondary beam correction, usually turn Bfactor off
      # unless you have high resolution and radiation damage

      scales rotation spacing ${spacing} bfactor off secondary 6
      tie surface 0.001 # this is the default restraint to keep the
      # absorption surface spherical

      anomalous on

      # reject on 5 sigma within the I+ or I- sets, 8 sigma between I+ & I-
      reject 5 all 8

      eof-r1


      ###=== Convert I to F, do Wilson plot, for each dataset
      ###=== A future change to Truncate may allow processing of multiple
      ###=== datasets together
      l1:
      set ln = ${nl1}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
      DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
      F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}

      l2:
      set ln = ${nl2}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
      DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
      F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}

      l3:
      set ln = ${nl3}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
      DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
      F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}

      l4:
      set ln = ${nl4}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
      DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
      F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}

      ###=== Sort together merged data for all wavelengths, outputting a
      ###=== single record for each hkl
      ###=== For each wavelength, store amplitude F & sigF,
      ###=== anomalous difference DANO (= F+ - F-) & sigDANO,
      ###=== and ISYM flag which shows if both F+ & F- were measured
      cad hklout ${name}_fcad \
      hklin1 ${name}_${nl1}_f \
      hklin2 ${name}_${nl2}_f \
      hklin3 ${name}_${nl3}_f \
      hklin4 ${name}_${nl4}_f << eof-c
      labin file_number 1 \
      E1=F${nl1} E2=SIGF${nl1} E3=DANO${nl1} E4=SIGDANO${nl1} E5=ISYM${nl1} \
      E6=F${nl1}(+) E7=SIGF${nl1}(+) E8=F${nl1}(-) E9=SIGF${nl1}(-)
      labin file_number 2 \
      E1=F${nl2} E2=SIGF${nl2} E3=DANO${nl2} E4=SIGDANO${nl2} E5=ISYM${nl2} \
      E6=F${nl2}(+) E7=SIGF${nl2}(+) E8=F${nl2}(-) E9=SIGF${nl2}(-)
      labin file_number 3 \
      E1=F${nl3} E2=SIGF${nl3} E3=DANO${nl3} E4=SIGDANO${nl3} E5=ISYM${nl3} \
      E6=F${nl3}(+) E7=SIGF${nl3}(+) E8=F${nl3}(-) E9=SIGF${nl3}(-)
      labin file_number 4 \
      E1=F${nl4} E2=SIGF${nl4} E3=DANO${nl4} E4=SIGDANO${nl4} E5=ISYM${nl4} \
      E6=F${nl4}(+) E7=SIGF${nl4}(+) E8=F${nl4}(-) E9=SIGF${nl4}(-)
      eof-c

    REFERENCES

      1. P.R. Evans, "Scaling and assessment of data quality", Acta Cryst. D62, 72-82 (2006). Note that the definitions of Rmeas and Rpim in this paper are missing a square root on the 1/(n-1) factor.
      2. W. Kabsch, J. Appl. Cryst. 21, 916-924 (1988).
      3. P.R. Evans, "Data reduction", Proceedings of the CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122.
      4. P.R. Evans, "Scaling of MAD Data", Proceedings of the CCP4 Study Weekend, 1997, on Recent Advances in Phasing.
      5. R. Read, "Outlier rejection", Proceedings of the CCP4 Study Weekend, 1999, on Data Collection & Processing.
      6. Hamilton, Rollett & Sparks, Acta Cryst. 18, 129-130 (1965).
      7. R.H. Blessing, Acta Cryst. A51, 33-38 (1995).
      8. K. Diederichs & P.A. Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology 4, 269-275 (1997).
      9. M.S. Weiss & R. Hilgenfeld, "On the use of the merging R factor as a quality indicator for X-ray data", J. Appl. Cryst. 30, 203-205 (1997).
      10. M.S. Weiss, "Global indicators of X-ray data quality", J. Appl. Cryst. 34, 130-135 (2001).

    Appendix 1: Partially recorded reflections

    Partially recorded reflections are usually used in scaling (controlled by the command INTENSITIES), and in the final analysis (controlled by the command FINAL). The default is to include summed partials in both scaling and the final analysis and merging.

    Different options for the treatment of partials are set for both scaling & merging stages by the PARTIALS command, or separately for the scaling stage (INTENSITIES command) and the merging stage (FINAL command). Partials may either be summed (subkeyword PARTIALS, with various options), or scaled (subkeyword SCALE_PARTIALS): in the latter case, each part is treated independently of the others. If summed partials are used in scaling with the SCALES BATCH option, the FRACTIONCALC is used to partition the effects of the different scales for the two halves. In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 31, 32, 33), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.

    For datasets with low mosaicity compared to the image width, very few partials run over more than two images, and partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags below to accept or reject partial sets according to their reliability.

    Summed partials:
    All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The options to use partials as well as fulls are defined separately for the scaling and merging steps on the INTENSITIES and FINAL commands. The parameters for the checks are set by the PARTIALS command for both stages, or separately on the INTENSITIES and FINAL commands. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

    (a)
    At least two parts must be present (unless the CORRECT option is set, see (e) below)
    (b)
    not more than MAXWIDTH <maximum_width> parts must be present [default maximum_width = 5]
    (c)
    if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked as in (d), unless NOTEST is specified, in which case inconsistent reflections are rejected. NOCHECK switches off this check.
    (d)
    if the TEST option is set (default if no MPART column), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.2]
    (e)
    if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. As with the scaled-partials option, this correction relies on accurate FRACTIONCALC values, so beware.
    (f)
    if the GAP option is set, partials with a gap in them are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.

    By setting the TEST & CORRECT limits, you can control the summation & scaling of partials, e.g.

          TEST 1.2 1.2 CORRECT 0.5 

    will scale up all partials with a total fraction between 0.5 & 1.2

          TEST 0.95 1.05           

    will accept summed partials 0.95->1.05, no scaling

          TEST 0.95 1.05 CORRECT 0.4  

    will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
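
    This decision logic can be sketched in Python (a minimal illustration of rules (d) & (e) above, not the program's actual code; the default limits are the ones quoted above):

        def accept_summed_partial(total_fraction,
                                  test_lower=0.95, test_upper=1.2,
                                  correct_minimum=None):
            # (d) TEST: accept unchanged if the total fraction lies
            # within the limits
            if test_lower <= total_fraction <= test_upper:
                return True, 1.0
            # (e) CORRECT: scale by the inverse fraction if a CORRECT
            # minimum is given and the fraction falls below the lower
            # TEST limit but above the minimum
            if (correct_minimum is not None
                    and correct_minimum <= total_fraction < test_lower):
                return True, 1.0 / total_fraction
            return False, 0.0           # reflection is rejected

        # "TEST 0.95 1.05 CORRECT 0.4": a total fraction of 0.7 is
        # accepted and scaled up by 1/0.7
        print(accept_summed_partial(0.7, 0.95, 1.05, 0.4))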

    Note that a profile-fitted intensity, if present in the file as a separate IPR column, will not be used for a scaled partial, unless the PARTIALS USE_PROFILE flag is set.

    Scaled partials:
    In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5].


    Appendix 2: Scaling algorithm

    For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing

     
            Sum( whl * ( Ihl - ghl * Ih )**2 )        (Ref: Hamilton, Rollett & Sparks)

    where Ih is the current best estimate of the "true" intensity

            Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2)
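
    As an illustration of this alternating refinement, here is a toy Python sketch of the simplest SCALES BATCH model (one inverse scale per batch, with the overall scale fixed by normalising to the first batch). This is not the program's actual minimizer, which refines all the parameters of the full scale model together:

        import numpy as np

        def scale_batches(I, w, hkl_id, batch, n_iter=20):
            # I, w   : intensities and weights (1/sd**2) per observation
            # hkl_id : integer id of the unique reflection (0-based)
            # batch  : integer batch index (0-based) per observation
            n_hkl = hkl_id.max() + 1
            n_batch = batch.max() + 1
            g = np.ones(n_batch)            # inverse scales, one per batch
            for _ in range(n_iter):
                gl = g[batch]
                # Ih = Sum(w*g*I) / Sum(w*g**2), per unique reflection
                num = np.bincount(hkl_id, w * gl * I, minlength=n_hkl)
                den = np.bincount(hkl_id, w * gl ** 2, minlength=n_hkl)
                Ih = num / np.maximum(den, 1e-30)
                # least-squares update of g with Ih fixed, minimising
                # Sum(w * (I - g*Ih)**2) batch by batch
                Ihl = Ih[hkl_id]
                num_b = np.bincount(batch, w * Ihl * I, minlength=n_batch)
                den_b = np.bincount(batch, w * Ihl ** 2, minlength=n_batch)
                g = num_b / np.maximum(den_b, 1e-30)
                g /= g[0]                   # fix the overall scale
            return g, Ih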

    Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.

    The inverse scale factor ghl is derived as follows:

            ghl = Thl * Chl * Shl

    where Thl is an optional relative B-factor contribution, Chl is a scale factor (1-dimensional, or 3-dimensional with the DETECTOR option), and Shl is an anisotropic correction expressed as spherical harmonics (i.e. the SECONDARY, ABSORPTION or SURFACE options).

    a) B-factor (optional)

    For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, . . tn). Then for an observation measured at time tl

            B = Sum[i=1,n] ( p(delt) Bi ) / Sum (p(delt))

    where Bi are the B-factors at time ti
    delt = tl - ti
    p(delt) = exp ( - (delt)**2 / Vt )
    Vt is "variance" of weight, & controls the smoothness
    of interpolation

    Thl = exp ( + 2 s B )
    s = (sin theta / lambda)**2
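
    A minimal Python sketch of this Gaussian-weighted interpolation (illustrative only; ti, Bi and Vt are taken as given):

        import numpy as np

        def bfactor_term(tl, s, ti, Bi, Vt):
            # ti, Bi : B-factors at discrete time points
            # s      : (sin(theta)/lambda)**2 for the observation
            # Vt     : "variance" controlling smoothness of interpolation
            p = np.exp(-(tl - ti) ** 2 / Vt)    # p(delt)
            B = np.sum(p * Bi) / np.sum(p)      # interpolated B at time tl
            return np.exp(2.0 * s * B)          # Thl = exp(+2 s B)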

    An alternative anisotropic B-factor may be used to correct for anisotropic fall-off of scattering: THIS OPTION IS NOT RECOMMENDED. This is parameterized on the components of the scattering vector (divided by 2 for compatibility with the normal definition of B) in two directions perpendicular to the Xray beam (y & z in the "Cambridge" coordinate frame with x along the beam).

            Thl = exp ( + 2[uy**2 Byy + 2 uy uz Byz + uz**2 Bzz])

    where uy, uz are the components of d*/2

    Byy, Byz, Bzz are functions of time ti or batch as for the isotropic Bfactor. The principal components of B (Bfac_min, Bfac_max) are also printed.
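
    In code the anisotropic term is simply (illustrative; uy, uz are the y and z components of d*/2, as above):

        import numpy as np

        def aniso_bfactor_term(uy, uz, Byy, Byz, Bzz):
            # Thl = exp(+2*(uy**2*Byy + 2*uy*uz*Byz + uz**2*Bzz))
            return np.exp(2.0 * (uy ** 2 * Byy + 2.0 * uy * uz * Byz
                                 + uz ** 2 * Bzz))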

    b) Scale factors

    For each run, scale factors Cxyz are determined at positions (x,y) on the detector, at intervals on rotation angle z. Then for an observation at position (x0, y0, z0),

            Chl(x0, y0, z0) =
                Sum(z)[ p(delz) * { Sum(xy)[q(delxy)*Cxyz] / Sum(xy)[q(delxy)] } ]
                    / Sum(z)[p(delz)]

    where delz = z - z0
    p(delz) = exp(-delz**2/Vz)
    q(delxy)= exp(-((x-x0)**2 + (y-y0)**2)/Vxy)
    Vz, Vxy are the "variances" of the weight & control the smoothness
    of interpolation
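
    A Python sketch of this doubly smoothed interpolation (illustrative only; the grid of refined scales Cxyz and the variances are taken as given):

        import numpy as np

        def interpolated_scale(x0, y0, z0, xg, yg, zg, Cxyz, Vxy, Vz):
            # xg, yg : detector-position grid points (1-D, lengths nx, ny)
            # zg     : rotation-angle grid points (1-D, length nz)
            # Cxyz   : refined scale values, shape (nz, ny, nx)
            q = np.exp(-((xg[None, :] - x0) ** 2
                         + (yg[:, None] - y0) ** 2) / Vxy)  # q(delxy)
            # weighted mean over the detector plane at each z section
            Cz = np.tensordot(Cxyz, q, axes=([1, 2], [0, 1])) / q.sum()
            p = np.exp(-(zg - z0) ** 2 / Vz)                # p(delz)
            return np.sum(p * Cz) / np.sum(p)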

    For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor (or set of scale factors across the detector) for each batch. The SLOPE (not recommended) option has two scale factors per batch, with the scale interpolated linearly between the beginning and end according to the rotation angle of the reflection.

    c) Anisotropy factor

    The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of
    (1) the secondary beam (SECONDARY correction) in the camera spindle frame,
    (2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis
    or
    (3) the scattering vector in the crystal frame (SURFACE option).

    1. SECONDARY beam direction (camera frame)
               s  =  [Phi] [UB] h
      s2 = s - s0
      s2' = [-Phi] s2
      Polar coordinates:
      s2' = (x y z)
      PolarTheta = arctan(sqrt(x**2 + y**2)/z)
      PolarPhi = arctan(y/x)

      where [Phi] is the spindle rotation matrix
      [-Phi] is its inverse
      [UB] is the setting matrix
      h = (h k l)
    2. ABSORPTION: Secondary beam direction (permuted crystal frame)
               s    = [Phi] [UB] h
      s2 = s - s0
      s2c' = [-Q] [-U] [-Phi] s2
      Polar coordinates:
      s2' = (x y z)
      PolarTheta = arctan(sqrt(x**2 + y**2)/z)
      PolarPhi = arctan(y/x)

      where [Phi] is the spindle rotation matrix
      [-Phi] is its inverse
      [Q] is a permutation matrix to put
      h, k, or l along z (see POLE option)
      [U] is the orientation matrix
      [B] is the orthogonalization matrix
      h = (h k l)
    3. Scattering vector in crystal frame
      	(x y z) = [Q][B] h
      Polar coordinates:
      PolarTheta = arctan(sqrt(x**2 + y**2)/z)
      PolarPhi = arctan(y/x)

      where [Q] is a permutation matrix to put
      h, k, or l along z (see POLE option)
      [B] is the orthogonalization matrix
      h = (h k l)
    then
     Shl = 1  +  Sum[l=1,lmax] Sum[m=-l,+l] Clm  Ylm(PolarTheta,PolarPhi)

    where Ylm is the spherical harmonic function for
    the direction given by the polar angles
    Clm are the coefficients determined by
    the program
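
    A sketch of evaluating such a sum (illustrative only; this builds real spherical harmonics from SciPy's complex ones and assumes the coefficients Clm are supplied as a dict keyed by (l, m) — the normalisation convention is not guaranteed to match Scala's internal one):

        import numpy as np
        from scipy.special import sph_harm

        def real_ylm(l, m, polar_theta, polar_phi):
            # note SciPy's argument order: sph_harm(m, l, azimuth, polar)
            if m > 0:
                return (np.sqrt(2) * (-1) ** m
                        * sph_harm(m, l, polar_phi, polar_theta).real)
            if m < 0:
                return (np.sqrt(2) * (-1) ** m
                        * sph_harm(-m, l, polar_phi, polar_theta).imag)
            return sph_harm(0, l, polar_phi, polar_theta).real

        def surface_factor(clm, polar_theta, polar_phi, lmax):
            # Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm * Ylm
            s = 1.0
            for l in range(1, lmax + 1):
                for m in range(-l, l + 1):
                    s += clm[(l, m)] * real_ylm(l, m, polar_theta, polar_phi)
            return s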


    Appendix 3: TAILS correction

    For many crystals, the reflection profile on rotation ("phi") is not a simple closed curve, but has long tails due at least in part to thermal diffuse scattering (TDS): the amount of this depends on the crystal, and is larger at high resolution than at low resolution. If all reflections were scanned through the same angle, then equal amounts of this diffuse scattering would be included in each reflection. However, in typical "coarse sliced" data collection schemes, where the image rotation width is larger than the reflection width, reflections are recorded on a variable number of images, 1, 2, 3 etc, and different amounts of the tails are included in the integrated intensity. This generally leads to a negative "partial bias", increasing with resolution, i.e. the apparent intensities of partially recorded reflections are higher than equivalent fulls.

    The TAILS correction is an attempt to correct for the different truncation of tails, by using a simple (crude) model of thermal diffuse scattering, although the correction only attempts to correct for the different truncation, and does not attempt to correct for diffuse scattering itself.

    Some of the ideas used are based on suggestions by R.H.Blessing, Cryst. Reviews, 1, 3-58 (1987), but he should not be blamed for this.

    This is a brief account of the method (see code & comments in subroutine dffscn for more details):-

    1. I = J ( 1 + alpha)
      where J is the Bragg intensity (true intensity) & I is the measured intensity, i.e. the TDS intensity is proportional to the Bragg intensity
    2. alpha = alpha0 + alpha1 * (sin theta / lambda)**2
      where alpha0 & alpha1 are refinable parameters. This is a simple linear isotropic model of the amount of TDS. alpha0 should be 0.0, and may be fixed as such, but allowing it to vary sometimes seems to help. Both alpha0 & alpha1 are reset if they go negative in the refinement. An extension of the model would be to make alpha anisotropic.
    3. Each reflection is scanned over an angle DPhi, which is an integral multiple of the image width (DPhi = Nimages * DelPhi). A rotation by DPhi moves the reflection a distance in reciprocal space

              Dq = DPhi * xsi

      where xsi is the radius from the rotation axis.

      If the half-width of the reflection (including tails) is v (another refineable parameter), and 2v > Dq, then part of the tails will be truncated.

      Taking a simple model of the shape of the tails as a triangle of base width 2v and height in the middle h (h = J * alpha / v), the area in the tails (= tail intensity) and the intensity truncated by the restricted scan range can be calculated, and hence the corrected ("true") intensity J:

      For a full scan:

              J = I / (1 + alpha)

      For a truncated scan (missing parts of tails C1 & C2):

              J = I / (1 + alpha*(1 - C1 - C2))
    4. Because this model is very crude, it seems insufficiently trustworthy to use as a proper correction for TDS. It does however seem reasonable to correct for the different amounts of tail truncation, C1 & C2 (>= 0.0).
    5. The correction applied is thus (see the sketch after this list)

              I' = I * (1 + alpha) / (1 + alpha*(1 - C1 - C2))
    6. The parameters refined are v, alpha0 (A0) and alpha1 (A1). By default, the same parameters are used for all runs (see LINK, UNLINK). Refinement of the parameters often seems to be unstable: if they are being reset from negative values, try setting A0 = 0.0 (e.g. SCALES . . TAILS 0.005 0.0 30.0) and fixing A0 (FIX A0, this is the default).
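
    A minimal numeric sketch of the correction in step 5 (illustrative only; the triangular-tail geometry is taken as symmetric, so C1 = C2, and all parameter values below are made up):

        def tails_corrected(I, alpha, v, Dq):
            # the tails form a triangle of base 2v whose total area is
            # alpha*J; the scan covers +/- Dq/2, so the tip beyond that
            # is lost on each side when 2v > Dq
            cut = max(0.0, v - Dq / 2.0)
            C1 = C2 = 0.5 * (cut / v) ** 2   # fraction of tail area lost
            return I * (1.0 + alpha) / (1.0 + alpha * (1.0 - C1 - C2))

        # made-up numbers: alpha = 0.05, half-width v = 0.5, scan Dq = 0.6,
        # so 2v > Dq and some tail is truncated
        print(tails_corrected(1000.0, 0.05, 0.5, 0.6))   # ~1007.7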

    Appendix 4: Data from Denzo

    DENZO is often run refining the cell and orientation angles for each image independently, then postrefinement is done in Scalepack. It is essential that you do this postrefinement. Either then reintegrate the images with the cell parameters fixed, or use unmerged output from scalepack as input to Scala. The DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using COMBAT (see COMBAT documentation) or POINTLESS (for Scalepack output only).

    Both of these options have some problems


    Appendix 5: Outlier algorithm

    The test for outliers is as follows:

    (1)
    if there are 2 observations (left), then
    (a)
    for each observation Ihl, test deviation
         Delta(hl) =  (Ihl - ghl Iother) / sqrt[sigIhl**2 + (ghl*sdIother)**2]

    against sdrej2, where Iother = the other observation

    (b)
    if either |Delta(hl)| > sdrej2, then
    1. in scaling, reject reflection. Or:
    2. in merging,
      1. keep both (default or if KEEP subkey given) or
      2. reject both (subkey REJECT) or
      3. reject larger (subkey LARGER) or
      4. reject smaller (subkey SMALLER).
    (2)
    if there are 3 or more observations left, then
    (a)
    for each observation Ihl,
    1. calculate the weighted mean <I>n-1 of all the other observations & its sd(<I>n-1)
    2. compute the deviation

              Delta(hl) = (Ihl - ghl*<I>n-1) / sqrt[sigIhl**2 + (ghl*sd(<I>n-1))**2]

    3. find the largest deviation max|Delta(hl)|
    4. count the number of observations for which Delta(hl) >= 0 (ngt), & for which Delta(hl) < 0 (nlt)
    (b)
    if max|Delta(hl)| > sdrej, then reject one observation, but which one?
    1. if ngt == 1 or nlt == 1, then one observation is a long way from all the others, and that one is rejected
    2. otherwise reject the one with the largest deviation max|Delta(hl)|
    (3)
    iterate from beginning
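
    A Python sketch of the three-or-more branch (illustrative only; the two-observation rule (1) and the keep/reject policy choices are not shown, and the inverse scales g and sds are taken as given):

        import numpy as np

        def reject_outliers(I, sig, g, sdrej=6.0):
            # observations of one reflection; returns mask of kept ones
            keep = np.ones(len(I), dtype=bool)
            while keep.sum() >= 3:
                idx = np.flatnonzero(keep)
                delta = np.empty(len(idx))
                for j, k in enumerate(idx):
                    others = idx[idx != k]
                    # weighted mean <I>n-1 of the other observations on
                    # the "true intensity" scale I/g, weights (g/sig)**2
                    w = (g[others] / sig[others]) ** 2
                    mean = np.sum(w * I[others] / g[others]) / np.sum(w)
                    sd_mean = 1.0 / np.sqrt(np.sum(w))
                    delta[j] = ((I[k] - g[k] * mean)
                                / np.hypot(sig[k], g[k] * sd_mean))
                if np.max(np.abs(delta)) <= sdrej:
                    break                           # nothing left to reject
                ngt = int(np.sum(delta >= 0))
                if ngt == 1:                        # one high, rest low
                    victim = int(np.argmax(delta))
                elif len(delta) - ngt == 1:         # one low, rest high
                    victim = int(np.argmin(delta))
                else:                               # worst deviation
                    victim = int(np.argmax(np.abs(delta)))
                keep[idx[victim]] = False
            return keep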

    RELEASE NOTES

    Version 3.3.21

    Version 3.3.18

    Version 3.3.17

    Version 3.3.15, 16

    Version 3.3.14

    Version 3.3.8, 9, 10

    Version 3.3.4, 5

    Version 3.3.1,2,3

    Version 3.3.0

    Version 3.2.33

    Version 3.2.31

    Version 3.2.28

    Version 3.2.22

    Version 3.2.21

    Version 3.2.20

    Version 3.2.18

    Version 3.2.17

    Version 3.2.16

    Version 3.2.15

    Version 3.2.13

    Version 3.2.10

    Version 3.2.8-9

    Version 3.2.0-3

    Version 3.1.20

    Version 3.1.19

    Version 3.1.18

    Version 3.1.15

    Version 3.1.12

    Version 3.1.6-11

    Version 3.1.5

    Version 3.0.N, 3.1.2-4

    Version 2.7.6

    Version 2.7.5

    Version 2.7.4

    Version 2.7.3

    Version 2.7.2

    Version 2.7.1

    Version 2.6.4

    Version 2.6.3

    Version 2.6.2

    Version 2.6.1

    Version 2.6.0

    Version 2.5.5

    Version 2.5.4

    Version 2.5.3

    Version 2.5.2

    Version 2.5.1

    Version 2.5.0

    Version 2.4.3

    Version 2.4.2

    Version 2.4.1

    Version 2.3.2

  • Out of Phi range is warning, not fatal
  • Check for M>0 (flag set in Postref) for partials: previously didn't work with data from Postref
  • Correct labels for UNMERGED output option
  • DAMP keyword added
  • Bug fix to avoid a normal probability analysis problem if there are no fulls
    Version 2.3.1

  • Output labels for SEPARATE option changed to conform with CCP4 3.3 convention, i.e. I(+) and I(-) etc
    Version 2.3.0

  • added "anomalous match" options for selecting matched I+ & I-
  • EXCLUDE does not check reference batch
    Version 2.2.3

    1. fixed bug in summed partials in case of "scales batch": this combination is still dubious, but awaits proper analysis
    2. added PARTIALS keyword
    3. fixed bug in calculation of Rfull: this was completely wrong if anomalous data was present
    4. added INTENSITIES ANOMALOUS option to keep I+ & I- separate in scaling (not normally recommended)
    5. allow incomplete orientation data in certain cases

    Version 2.2.2, November 1996

    1. defaults on partial summation improved (and again 18/12/96)
    2. analysis on fulls only even when partials are used
    3. bug fix in random number routine (thanks to Adam)
    4. ONLYMERGE option
    5. If scaling across detector (e.g. "scales detector 3 3"), checks on valid Xdet, Ydet (within limits in file header)
    6. Rogues file lists Xdet, Ydet, Phi
    7. default in scaling is "exclude sdmin 6" (omitting weak observations speeds scaling)
    8. default FIX A0
    9. reject outliers on every cycle if scales "restored" (else previous scaling gets messed up)
    10. analysis by position on detector
    11. fixed bug affecting "reject byrun" & deviations with anomalous on

    Version 2.2.1, November 1996

    Many changes from version 1.x.x

    1. this version by default merges multiple measurements and thus replaces Agrovata. See the keyword OUTPUT for further description of the output options:

          AVERAGE    [default] merged I (as from Agrovata)
          SEPARATE   separate scaled measurements (as from older Scala
                     versions), for reinput into Scala, or input into
                     Agrovata [not recommended]
          POSTREF    scaled file for input to POSTREF
          UNMERGED   scaled, partials summed (or scaled), but not merged

    2. by default, the SDCORRECTION parameter SdFac (multiplier) will be automatically adjusted, from the normal probability analysis of deviations. This is done in a separate pass through the data before the final merging pass. The command SDCORRECTION NOADJUST disables this adjustment.
    3. The scaling option TAILS has been introduced. This makes some attempt to correct for the different truncation of the tails of diffuse scattering between fulls & partials. This option comes with a health warning: it should be treated with caution. Try with & without. (see commands SCALES . . TAILS, FIX, [UN]LINK)
    4. the way of putting data (e.g. native) back into the scaling as a reference set has changed. See example.
    5. treatment of summed partials has been elaborated (see FINAL & INTENSITIES keywords above). In 2.2.1, the defaults are not set optimally (whatever that means!): this is improved in 2.2.2
    6. Recommended usage:

       FINAL PARTIALS CHECK TEST 0.95 1.05     # for Mosflm

       FINAL PARTIALS TEST 0.95 1.05           # for Denzo (but FractionCalc
                                               # is rather unreliable)

    7. Scales are dumped to the file SCALES by default (see DUMP & RESTORE)
    8. Normal probability analyses are done, with plots output to files NORMPLOT and ANOMPLOT in a format suitable for xmgr (from your favourite ftp server)
    9. by default scaling now excludes weak data (EXCLUDE SDMIN 3.0)

    AUTHOR

    Phil Evans, MRC Laboratory of Molecular Biology, Cambridge (pre@mrc-lmb.cam.ac.uk). See above for Release Notes.

    SEE ALSO

    truncate, postref, Data Harvesting