MAD Scripts

Table of Contents

  1. Overview
  2. Downloading the scripts
  3. Input requirements
  4. Instructions
  5. Output
  6. Problems
  7. Major changes

Overview

The MAD scripts use an input file containing experimental intensities and run a series of programs to scale the data, phase the structure, and autobuild a partial model in the resulting electron density map, making it easier to evaluate the quality of the MAD data on-line during the allocated beamtime.

solve_structure.com reads an mtz file containing unmerged, unscaled integrated intensities as written by Mosflm or, if you use the Autoxds script, the rootName_xds.mtz file that Pointless writes after converting the ASCII reflection file from XDS.

The heavy atom substructure is located with Solve or Shelxd, depending on the user's choice, and the phases are calculated with Solve. If the heavy atom substructure is already known, from a previous experiment or from a previous run, the script can reuse it. This decreases the run time for phasing considerably.

solve_structure_denzo.com takes an input file containing Scalepack-scaled unmerged reflections, uses either Shelxd or Solve to locate the substructure, and uses Solve/Resolve for phasing, density modification and model building.

Downloading the scripts

From SSRL computers

The scripts are located under /data/your_id/templates/MAD-scripts/. Copy the script of your choice to the directory where you are processing your data and follow the instructions below.

Web downloading

You can download the scripts from http://smb.slac.stanford.edu/templates/MAD_scripts/

Input requirements

solve_structure.com runs only on mtz files produced by mosflm or pointless. The current version of the programs used can deal automatically with rotation gaps between consecutive frames, so it is not necessary to input an inverse beam pass as a separate run.

solve_structure_denzo.com uses the unmerged output from Scalepack (run with the "no merge original index" keyword in order to apply local scaling).

Instructions

Copy the script to one of your directories and open it with your favorite editor. You have to edit the following values:

Global edits

MTZ input

dir (Optional): full path to input mtz directory. Defaults to current directory.
Example: set dir = ../ . This reads the input files from a directory one level up from the current one

name: A global name used to identify the various output files.
Example: set name = tm1

nresidue: Number of residues in the asymmetric unit (A.U.). Used to put your data on an approximate absolute scale and to estimate the solvent content. This number does not matter for experimental phasing, but it affects density modification. If you do not know how many molecules you have in the asymmetric unit, get an estimate from the CCP4 program matthews_coef.
Example: set nresidue = 364
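As a rough cross-check of nresidue, the Matthews coefficient (Vm) and solvent fraction can be estimated by hand. The sketch below is not how the scripts or matthews_coef compute them; it assumes an average residue mass of 110 Da and the standard relation solvent fraction = 1 - 1.23/Vm. The function names are hypothetical.

```python
import math

AVG_RESIDUE_MASS_DA = 110.0  # ASSUMPTION: average mass per amino-acid residue


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (A^3) from cell edges (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews(volume, n_asu, nresidue):
    """Return (Vm in A^3/Da, solvent fraction).

    volume: unit-cell volume in A^3
    n_asu: number of asymmetric units in the cell (e.g. 16 for I422)
    nresidue: residues per asymmetric unit, as in the script
    """
    mass_per_asu = nresidue * AVG_RESIDUE_MASS_DA
    vm = volume / (n_asu * mass_per_asu)
    solvent = 1.0 - 1.23 / vm  # 1.23 corresponds to a protein density of ~1.35 g/cm^3
    return vm, solvent
```

For example, a hypothetical I422 cell with a = b = 100 A and c = 150 A (16 asymmetric units) and nresidue = 364 gives Vm of about 2.34 A^3/Da and a solvent fraction of about 0.47. If you are unsure how many molecules are in the asymmetric unit, evaluate matthews() with nresidue multiplied by 1, 2, 3, ... and keep the values that give a plausible solvent content.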

ano_atom: Chemical symbol for your anomalous scatterer.
Example: set ano_atom = se

space_group: (Optional). If you leave it blank, the script takes it from the mtz file header. Note that the script changes the extent of the asymmetric unit (if you have to change the point group symmetry) but does not reindex. If you need to change the order of the cell axes, edit the pointless command by hand.
Example: set space_group = I422

enantiomorph: (Optional). If the space group used for phasing has an enantiomorph, input it here. Note: this is only used when Shelx is used for substructure solution and Shelxe identifies the inverse hand as the correct one.

res_limit: (Optional). Maximum resolution limit. If you want to get rid of the data in the detector corners, or your high resolution data is not really there, or if you want to accelerate the structure solution by leaving out very high resolution data, set this, otherwise leave blank. The higher resolution data will still be used in the scaling.
Example: set res_limit = 2.0

seq (Optional): Name of file containing the protein sequence for model building with resolve.
Example: set seq = ../seq.pir. The file seq.pir should look like this:

 
MIVLTVHYSSEGILV   [put the sequence of chain type 1 here]
>>>               [this defines the end of chain 1]
MKLVERWISSTV      [put the sequence of chain type 2 here. Input just
                   one copy of each unique chain]
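The sequence file layout above is simple enough to generate from a script. This hypothetical helper joins the unique chain sequences with ">>>" separator lines (dropping duplicates, since only one copy of each unique chain is needed) and parses such a file back. It assumes the bracketed notes in the example are annotations and not part of the file.

```python
def format_seq(chains):
    """Build seq-file text: one sequence per unique chain type, separated by '>>>'."""
    # dict.fromkeys removes duplicates while keeping the first-seen order
    unique = list(dict.fromkeys(c.strip().upper() for c in chains))
    return "\n>>>\n".join(unique) + "\n"


def parse_seq(text):
    """Return the list of chain sequences in seq-file text (sequence lines are joined)."""
    chains, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line == ">>>":  # end of the current chain
            chains.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:  # the last chain has no trailing separator
        chains.append("".join(current))
    return chains
```

Write the output of format_seq() to a file such as seq.pir and point the script at it with set seq = seq.pir.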

sites (Optional): Name of a file containing the fractional coordinates of known atom sites. If you have run the script before (for instance, using only one wavelength), the script will display and attempt to reuse the previously found sites.
Example: set sites = sites.ha. The file should look like this:

xyz  0.5426 0.2964 0.3684 
xyz  0.5821 0.3347 0.3460 
xyz  0.4542 0.4619 0.3339 
xyz  0.5187 0.1287 0.4629 
xyz  0.4057 0.3359 0.4399 
xyz  0.4921 0.1786 0.4310 
xyz  0.4439 0.3405 0.1498 
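The "xyz x y z" layout above can be read and written programmatically, for example to inspect sites before reusing them or to copy them from another program's output. A minimal sketch with hypothetical helper names; it ignores any line that does not start with xyz.

```python
def parse_sites(text):
    """Return a list of (x, y, z) fractional coordinates from 'xyz x y z' lines."""
    sites = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].lower() == "xyz":
            x, y, z = (float(v) for v in fields[1:4])
            sites.append((x, y, z))
    return sites


def format_sites(sites):
    """Write sites back in the same layout, four decimals per coordinate."""
    return "".join(f"xyz  {x:.4f} {y:.4f} {z:.4f}\n" for x, y, z in sites)
```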

shelx: If "Y", the script uses Shelxc and Shelxd to determine the anomalous scatterer substructure. Otherwise, it uses Solve. (default "Yes")

overwrite: If "N", the script skips steps that were already completed in a previous run (it looks for the output files and, if they exist, does not overwrite them). This can sometimes be useful to correct mistakes (for instance, if you want to modify the input sites for Solve but do not want to rerun the scaling). Do not use overwrite = N if you want to add more data (for example, an additional wavelength) or if a program failed to write a complete output file. (default "Yes")
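The overwrite = N behaviour amounts to a skip-if-output-exists check, sketched below in Python for illustration only (the scripts themselves are shell scripts, and run_step is a hypothetical name). It also shows why overwrite = N is unsafe after a crash: an existence test cannot tell a complete output file from a truncated one.

```python
from pathlib import Path


def run_step(name, output, action, overwrite=True):
    """Run action() unless overwrite is False and the output file already exists.

    Returns True if the step ran, False if it was skipped.
    """
    if not overwrite and Path(output).exists():
        # Only existence is checked, not completeness of the file.
        print(f"{output} already exists. Skipping {name}.")
        return False
    action()
    return True
```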

Scalepack input

space_group: Compulsory keyword. It must be given in lower case.
Example: set space_group = i422

resol: Resolution limits of the data.
Example: set resol = '53.0 2.0'

solvent_content: Fraction of the unit cell occupied by solvent.
Example: set solvent_content = 0.4

The keywords "dir", "name", "nresidue", "ano_atom", "seq", "sites", "shelx" and "overwrite" are used as described above for the mosflm input.

Wavelength edits

The wavelength input is identical for both scripts unless stated otherwise:

l1, l2, etc: mtz files from Mosflm or hkl files from Scalepack. Only l1 is compulsory (if it is the only one given, the scripts will attempt a SAD solution). If a parameter is left blank, the remaining parameters for that wavelength are ignored. There is a limit of three wavelengths.
Example: set l1 = peak_1_001 #omit the .mtz extension!

title1, title2, etc: Optional, but useful.
Example: set title1 = 'remote wavelength'

lambda1, lambda2, etc: Wavelengths. Used by Solve to obtain fo and to estimate f' and f" if these are not given.

f": Optional, but strongly recommended for Solve. Compulsory for Shelx (Shelxc does not use the values, but the script uses them internally to determine which wavelength is which). For the near-edge wavelengths you can get the values from the MAD scan in BLU-ICE. For the remote wavelength, use the values from standard tables.

f': As for f".
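Because the script relies on the f' and f" values to tell the wavelengths apart, it is worth checking that your values order them sensibly. The sketch below is one plausible rule, not the script's actual code: the remote has the least negative f', the inflection the most negative f' (three-wavelength case), and the remaining peak must have the largest f". The function name and the values in the usage line are illustrative.

```python
def assign_wavelengths(fp_fpp):
    """Return {role: label} from {label: (f_prime, f_double_prime)}.

    ASSUMED rule (illustrative): remote = least negative f', inflection = most
    negative f' of the rest (three wavelengths), peak = the remainder, which must
    also have the largest f".
    """
    labels = list(fp_fpp)
    if not 1 <= len(labels) <= 3:
        raise ValueError("between one and three wavelengths are supported")
    if len(labels) == 1:
        return {"peak": labels[0]}  # SAD: l1 should be the maximum-f" wavelength
    roles = {"remote": max(labels, key=lambda k: fp_fpp[k][0])}
    rest = [k for k in labels if k != roles["remote"]]
    if len(rest) == 2:
        roles["inflection"] = min(rest, key=lambda k: fp_fpp[k][0])
        rest.remove(roles["inflection"])
    roles["peak"] = rest[0]
    if fp_fpp[roles["peak"]][1] < max(fpp for _, fpp in fp_fpp.values()):
        raise ValueError("wavelengths cannot be identified unambiguously")
    return roles


# Illustrative values only, not measured data:
print(assign_wavelengths({"l1": (-8.0, 5.0), "l2": (-10.0, 3.0), "l3": (-3.0, 3.5)}))
```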

Advanced edits

Usually you do not have to edit these parameters unless the script fails to solve the structure or you want to explore the effects of different options.

cycle_limit: Relevant when shelx = y. Maximum number of Shelxd cycles to find a substructure solution. The script will initially run as many Shelxd cycles as anomalous scatterers and will test the solution with Shelxe. If the solution is not judged good, the script will double the number of Shelxd cycles and try again. This procedure will be iterated until a good solution is found or until the number of cycles exceeds cycle_limit.

default_shelxres: Relevant when shelx = y. After running Shelxc the script will try to determine a sensible resolution cutoff for substructure solution based on the anomalous signal statistics. If the criteria for a good resolution cutoff are not fulfilled, Shelxd will use the default_shelxres resolution.

shelx_anomcorr: Relevant when shelx = y. Anomalous signal correlation (in %, from Shelxc) used for the resolution cutoff. When the correlation falls below this value, the data will not be used in Shelxd.
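The resolution-cutoff logic described for default_shelxres and shelx_anomcorr can be sketched as follows. This is an assumption about the rule, not the script's code: shells are taken from low to high resolution, the cutoff is the last shell before the anomalous correlation first drops below shelx_anomcorr, and default_shelxres is used when no shell qualifies. Reading the per-shell values from the Shelxc log is not shown.

```python
def shelx_resolution(shells, shelx_anomcorr, default_shelxres):
    """Pick a d_min cutoff (A) for Shelxd.

    shells: (d_min, cc_anom_percent) pairs ordered from low to high resolution.
    """
    cutoff = None
    for d_min, cc in shells:
        if cc < shelx_anomcorr:
            break  # stop at the first shell where the signal falls below the threshold
        cutoff = d_min
    # ASSUMPTION: fall back to the default when even the first shell fails
    return cutoff if cutoff is not None else default_shelxres
```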

shelxe_dmcycle: Relevant when shelx = y. Number of cycles of density modification Shelxe will apply after phasing with the best Shelxd solution. A low number of cycles will give faster but less accurate results.

cutoff: Relevant when shelx = y. After running Shelxe using both solution hands, the script checks that the difference in the final correlation coefficient is larger than cutoff. If it is not, the script will run more Shelxd cycles.
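Taken together, cycle_limit and cutoff describe a retry loop. The sketch below follows that description under stated assumptions: run_shelxd and run_shelxe are hypothetical callables standing in for the real program runs, and the hand with the higher Shelxe pseudo-CC is taken as correct once the contrast between hands exceeds cutoff.

```python
def find_substructure(run_shelxd, run_shelxe, n_sites, cycle_limit, cutoff):
    """Double the Shelxd cycles until the two hands separate by more than cutoff.

    run_shelxd(n_cycles) -> solution; run_shelxe(solution, inverse) -> pseudo-CC.
    Returns (solution, hand, n_cycles), or None once n_cycles exceeds cycle_limit.
    """
    n_cycles = max(1, n_sites)  # start with one cycle per expected anomalous scatterer
    while n_cycles <= cycle_limit:
        solution = run_shelxd(n_cycles)
        cc_original = run_shelxe(solution, inverse=False)
        cc_inverse = run_shelxe(solution, inverse=True)
        if abs(cc_original - cc_inverse) > cutoff:
            hand = "inverse" if cc_inverse > cc_original else "original"
            return solution, hand, n_cycles
        n_cycles *= 2  # solution not judged good: double the cycles and retry
    return None
```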

Running the scripts

After editing the parameters, save the file and type the file name. Example:

solve_structure.com

Output

The script writes information as it proceeds. A successful run looks like this:

17:38:41
Will find sites from scratch

17:38:50
pointless done  - output: Prepare/pointless_test.mtz - log: Prepare/pointless_test.log

---Pointless is used to rebatch the image numbers from different wavelengths to avoid duplication, to check the space group, and to sort and merge all the files.


17:39:20
 scaling done -  output: Aimless/scale_test.mtz - log: Aimless/scale_test.log

---Aimless scales all the data for all the wavelengths. The log file contains information about the anomalous and dispersive signal present in the combined data.


17:39:26
Merging for l1 done - output: Aimless/merge_l1_test.mtz - log: Aimless/merge_l1_test.log

R-merge is (within (0.040 for the last resolution bin.)
Completeness is 99.8% (99.2% for the last resolution bin.)

---Aimless merges the data for the first wavelength. Examine the log file for statistics about each data set.

17:39:32
Transformed unmerged scaled data into scalepack format: unmerged_l1_test.sca

---The data are converted to Scalepack format for input to Shelx.

17:39:33
Truncate l1 done - output: Truncate/truncate_l1_test.mtz - log: Truncate/truncate_l1_test.log

---Truncate converts intensities (I) to amplitudes (F). The log file is worth examining: the distribution of intensity moments can be an indication of twinning.

17:39:39
Merging for l2 done  - output: Aimless/merge_l2_test.mtz - log: Aimless/merge_l2_test.log

R-merge is (within (0.036 for the last resolution bin.)
Completeness is 99.7% (98.6% for the last resolution bin.)

17:39:46
Transformed unmerged scaled data into scalepack format: unmerged_l2_test.sca

17:39:47
Truncate l2 done - output: Truncate/truncate_l2_test.mtz - log: Truncate/truncate_l2_test.log

17:39:55
Merging for l3 done - output: Aimless/merge_l3_test.mtz - log: Aimless/merge_l3_test.log

R-merge is (within (0.043 for the last resolution bin.)
Completeness is 76.4% (6.4% for the last resolution bin.)

17:40:04
Transformed unmerged scaled data into scalepack format: unmerged_l3_test.sca

17:40:04
Truncate l3 done - output: Truncate/truncate_l3_test.mtz - log: Truncate/truncate_l3_test.log

Shelx will use data to 2.2  A.

---Shelx uses a resolution cutoff based on the anomalous signal correlation between the two wavelengths.

17:40:07
Shelxc done. Log: Shelx/shelxc_test.log.
Running shelxd for 10 tries.

17:40:11
Shelxd done. Log: Shelx/shelxd_test.log. Best solution CFOM=118.0.
Do 8 cycles of density modification on this solution.

---The correlation coefficient (summarised here as CFOM) is not necessarily an indication that the structure is correct. Check the output log and verify that there is a significant difference between the best solutions and the poorer ones.

17:40:24
Shelxe done. Log: Shelx/shelxe_test.log. Pseudo-CC=26
Trying density modification with inverse hand. 
 
17:40:36
Shelxe done. Log: Shelx/shelxe_test_i.log. Pseudo-CC=72 

---MAD maps tend to show a large difference in correlation between the correct and inverse hands; the contrast is usually smaller for SAD. Shelxe writes out maps that can be viewed in Coot.

 
Solve will use the inverse hand solution from shelxd for phasing. 

17:40:37
Output files merged - output: Scaleit/cad_test.mtz - log: Scaleit/cad_test.log

17:40:39
scaleit done - output: Scaleit/scaleit_test.mtz - log: Scaleit/scaleit_test.log

---CAD and Scaleit will prepare the data for input to Solve.

Starting solve. Check progress in Solve/solve.status.

---Solve is now only used for phasing. If not using Shelx to find the sites, Solve will take considerably longer to run.

17:43:19
Solve done - log: Solve/solve_test.prt. Maps: Solve/solve_test.map and Solve/solve_test.ezd

 DMIN:           TOTAL    7.22   4.53   3.54   3.00   2.65   2.39   2.20   2.05
 MEAN FIG MERIT:   0.63   0.81   0.76   0.72   0.71   0.69   0.61   0.56   0.43


Starting resolve with solvent content .390.
Only do density modification at first.

---The solvent content is calculated from the contents of the asymmetric unit.

17:46:11
Resolve done - output Resolve/resolve_test.log. Reflection file: Resolve/resolve_test.mtz

Final correlation coefficient is 
0.7858781
Final R-factor is
0.2518314

Calculating a ccp4 map around the heavy atom positions.

17:46:13
fft and mapmask done - Map: Resolve/resolve_test.map. Log : Resolve/fft_test.log

---A map is calculated before model building, which can take a long time.

Trying to build model with sequence in cat ./tutorial/seq.pir

---If the sequence file is not provided, resolve will try to build the main chain only.

Total residues placed: 274 of 364 or 75%
Residues built without side chains: 58
Total residues built: 332 or 91%

Calculating a ccp4 map around the model.

17:53:57
Resolve done - Model: Resolve/build_test.pdb. 
 - Map: Resolve/build_test.map. 
 - Log : Resolve/build_test.log.

Bye
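The Truncate note in the run above mentions intensity moments as a twinning indicator. As background, for acentric reflections normalised within a resolution shell, the second moment <I^2>/<I>^2 is about 2.0 for untwinned data and about 1.5 for a perfect merohedral twin. The sketch below computes that moment and checks the reference values on simulated Wilson-distributed intensities; it is illustrative only and does not reproduce Truncate's calculation.

```python
import random


def second_moment(intensities):
    """<I^2>/<I>^2 for one resolution shell of acentric intensities."""
    n = len(intensities)
    if n == 0:
        raise ValueError("no reflections in shell")
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / mean_i ** 2


def simulated_shell(n, twinned=False, seed=0):
    """Acentric Wilson intensities are exponential; a perfect twin averages two."""
    rng = random.Random(seed)
    if twinned:
        return [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(n)]
    return [rng.expovariate(1.0) for _ in range(n)]


if __name__ == "__main__":
    print(round(second_moment(simulated_shell(200000)), 2))                # close to 2.0
    print(round(second_moment(simulated_shell(200000, twinned=True)), 2))  # close to 1.5
```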

Problems

Known problems

The script does not run when the input data files are located under different directories.

Some common and uncommon runtime errors

If something goes wrong, the program involved will write out an error message. Most catastrophic failures result from using a very old version of the script, so make sure you have downloaded the most recent version as described above.

Note that the script does not do much error catching. This means that if a program fails, the script may keep running and all the subsequent programs will fail in turn. If you see an error, check the logs of the programs run before it to determine the precise cause of failure.

Here is a list of other common error messages and their causes:

FILEIO: cannot open file tutorial/infl_1_002.mtz

Check that the file exists. You may have misspelled the name or given an incorrect directory path.


Aimless/merge_l3_test.mtz already exists. Skipping merging l3.

If the script suddenly stops without an error message, and the last line of output is a message like the one above, you may have interrupted the script (with ^C) and then rerun it with overwrite = N. Try setting overwrite = Y, or delete the output file named in the last line of standard output (in the example above, delete "Aimless/merge_l3_test.mtz").


Resolve could not be run. Probably SOLVE did not find a good solution

If you are using SAD data, make sure that l1 is the wavelength with the maximum f". For a two-wavelength MAD experiment, one of the wavelengths should be the remote. Also check that your f' and f" values are correct for each wavelength. If you are not sure, leave fp and fpp blank and let Solve estimate them.

If Solve does not work with three reasonably complete wavelengths and there are no obvious problems with the processing or the quality of the data, there are a few possible causes:

  • There is no anomalous scatterer bound to the protein
  • The crystal is twinned
  • There is some pseudosymmetry or translational NCS (check Patterson maps)

Error while reading shelxc_sad2.log!

Shelxc could not run properly. This can happen if the input file in "scalepack" format did not get generated correctly by Aimless (looking at the Shelx/shelxc_name.log file will reveal the cause).

If the data are strong, the problem may be a format overflow: edit the file(s) unmerged_l#_name.sca, search for reflections containing the string "******" and delete each such line entirely. Then edit the script, set the overwrite variable to "n" (to prevent the edited Scalepack files from being overwritten by the script), and run the script again.
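The manual edit described above can also be scripted. This minimal sketch (hypothetical file and function names) deletes every line containing "******" from the .sca files given on the command line and keeps a .bak copy of each original. Remember to set overwrite to "n" before rerunning so the edited files are kept.

```python
import shutil
import sys

OVERFLOW = "******"  # Fortran prints asterisks when a number does not fit its field


def strip_overflow_lines(text, marker=OVERFLOW):
    """Drop every line containing the marker; return (cleaned_text, n_removed)."""
    kept, removed = [], 0
    for line in text.splitlines(keepends=True):
        if marker in line:
            removed += 1
        else:
            kept.append(line)
    return "".join(kept), removed


def clean_sca_file(path):
    """Rewrite path in place without overflow lines, keeping a .bak copy."""
    with open(path) as fh:
        cleaned, removed = strip_overflow_lines(fh.read())
    if removed:
        shutil.copyfile(path, path + ".bak")
        with open(path, "w") as fh:
            fh.write(cleaned)
    return removed


if __name__ == "__main__":
    # usage: python strip_overflow.py unmerged_l1_name.sca unmerged_l2_name.sca
    for sca in sys.argv[1:]:
        print(f"{sca}: removed {clean_sca_file(sca)} line(s)")
```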

If you cannot easily determine the cause of the error, set the script variable shelx to "n", to use Solve for substructure solution instead.

The script fails to solve the structure

If the phasing statistics look good but the maps look totally uninterpretable, a possible explanation is that the wavelengths have been swapped. In particular, swapping the remote wavelength with the inflection or peak wavelength has this effect (the sites will have "negative" occupancies).

If the input parameters are correct, try the following:

  • Decrease the input maximum resolution: This may improve the scaling for the low resolution data Shelxd uses for substructure calculation. Once you get the correct substructure you can run the script again to full resolution using the previously determined correct sites.
  • Decrease the number of heavy atoms you expect. Some sites may be disordered and the software may be locating wrong sites instead. If this is the case, it is better to instruct the software to find fewer sites.
  • Change the anomalous correlation cutoff (shelx_anomcorr) used to select the data for Shelx
  • Inspect carefully the output files to check for bad images, possible twinning, wrong crystal symmetry, etc.

Major changes

Version 1.48

  • Aimless is used instead of Scala
  • Shelx updated to 2014 version

Version 1.46

  • Pointless is used instead of rebatch, reindex and sort
  • Eliminated the different speed modes, as they did not make a big difference in terms of speed vs. results

Version 1.39

  • The f' value is used to assign the wavelengths in MAD experiments. This ensures that peaks in difference Pattersons always have the right sign.
  • The auto option is used to scale data in Scaleit
  • Compatibility with ccp4 6.1.2

Version 1.36

  • The program exits if the wavelengths cannot be identified (i.e., as the peak, inflection or remote) for a MAD experiment.
  • When the input space group has an enantiomorph, the user has the option to declare it. If Shelxe identifies the inverted hand as the correct one, the enantiomorph will then be used for subsequent phasing. If solve is used to locate the heavy atoms, the option is ignored.
  • Correct change of hand for space groups I41, I4122 and F4132
  • Solve/Resolve updated to version 2.13

Version 1.27

  • The program uses ccp4-6.0.2 and Solve/Resolve 2.12
  • Added support for an inverse beam pass.
  • The anomalous correlation coefficient from Shelxc is used to determine the cutoff resolution for SAD data.

Version 1.10

  • "medium" and "slow" run modes have been merged
  • The scaling protocol in slow/medium mode has been changed to the one recommended in the Scala documentation
  • A bug preventing proper parsing of the space group name in the new style mtz header has been fixed.
  • A bug in the denzo script which made the script fail to find the symmetry information has been fixed.

Version 1.6

  • The scripts now use the CCP4 5.0 programs and libraries and Solve/Resolve 2.08
  • The settings for model building in medium speed have been changed

Version 1.2

  • Refined the way the resolution cutoff is chosen for Shelxd. It used to be based on the I/sigma level after scaling; this did not always work optimally
  • Test the solutions with Shelxe. This also often provides a very quick answer whether the structure is solvable or not.
  • Use the "secondary" scaling option in Scala - for smooth scaling.