MAD Scripts
Overview

The MAD scripts use an input file containing experimental intensities and run a series of programs to scale the data, phase the structure, and autobuild a partial model in the resulting electron density map. This makes it easier to evaluate the quality of the MAD data on-line during the allocated beamtime.

solve_structure.com reads an mtz file (output from Mosflm or Labelit). The script can be run in "fast" or "slow" (default) mode; the mode selects the type of scaling between wavelengths and the thoroughness of density modification and model building.
The heavy atom substructure and phasing are calculated with Solve or Shelxd, depending on the user's choice. If the heavy atom substructure is already known, from a previous experiment or from a previous run, it can be reused by the script. This decreases the run time for phasing considerably.

solve_structure_denzo.com takes an input file containing scalepack-scaled unmerged reflections and uses either Shelxd or Solve to solve the substructure and Solve/Resolve to phase, do density modification and build the model.

Downloading the scripts

From SSRL computers

The scripts are located under /data/your_id/templates/MAD-scripts/. Copy the script of your choice to the directory where you are processing your data and follow the instructions below.

Web downloading

You can download the scripts from http://smb.slac.stanford.edu/templates/MAD_scripts/

Input requirements

solve_structure.com runs only on mtz files produced by mosflm. The latest version of the script can process two data sets per wavelength in slow mode (e.g., a direct and an inverse beam pass). The script cannot cope with more than one rotation gap between consecutive frames, unless you are running in "fast" mode. However, if you have experience with Scala and c-shell scripts, you can edit the script to add extra runs.

solve_structure_denzo.com uses the unmerged output from scalepack (run with the "no merge original index" keyword in order to apply local scaling).

Instructions

Copy the script to one of your directories and open it with your favorite editor. You have to edit the following values:

Global edits

Mosflm input

dir (Optional): full path to input mtz
directory. Defaults to current directory.

name: A global name to identify the diverse output files.

nresidue: Number of residues in the asymmetric unit (A.U.). Used to put your data on an approximate absolute scale and to estimate the solvent content. This number does not matter for experimental phases, but it will affect density modification. If you do not know how many molecules you have in the asymmetric unit, get a guess from the ccp4 program matthews_coef.

ano_atom: Chemical symbol for your anomalous scatterer.

space_group (Optional): If you leave it blank, the script will pick it from the mtz file header. Note that the script changes the extent of the asymmetric unit (if you have to change the point group symmetry) but will not reindex. If you need to change the order of the cell axes, you need to edit the reindex input by hand.

enantiomorph (Optional): If the space group used for phasing has an enantiomorph, input it here. Note: this is only used when Shelx is used for substructure solution and Shelxe identified the inverse hand as the correct one.

res_limit (Optional): Maximum resolution limit. If you want to get rid of the data in the detector corners, or your high resolution data is not really there, or if you want to accelerate the structure solution by leaving out very high resolution data, set this; otherwise leave it blank. The higher resolution data will still be used in the scaling.

seq (Optional): Name of file containing the protein sequence for model building with resolve.
MIVLTVHYSSEGILV [put the sequence of chain type 1 here]
>>> [this defines the end of chain 1]
MKLVERWISSTV [put the sequence of chain type 2 here. Input just
one copy of each unique chain]
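The sequence file layout shown above (one block of residues per unique chain, with a ">>>" line closing each chain) can be checked before running the script. The following is a minimal Python sketch, not part of the MAD scripts; the function name split_chains and the assumption that the bracketed annotations are not present in a real file are illustrative only.

```python
def split_chains(text):
    """Split sequence-file text into one string per unique chain.

    Assumes plain one-letter sequences, with a line starting with '>>>'
    marking the end of a chain, as in the example above.
    """
    chains, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">>>"):
            # '>>>' closes the chain collected so far
            if current:
                chains.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        # the last chain does not need a closing '>>>'
        chains.append("".join(current))
    return chains


example = """MIVLTVHYSSEGILV
>>>
MKLVERWISSTV
"""
chains = split_chains(example)
for number, chain in enumerate(chains, start=1):
    print(f"chain type {number}: {len(chain)} residues")
```

Multiplying each chain length by the number of copies expected in the asymmetric unit gives a quick cross-check for the value entered in nresidue.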
sites (Optional): Name of a file containing the fractional coordinates of known atom sites. If you have run the script before (for instance, using only one wavelength), the script will display and attempt to use the previously found sites.

xyz 0.5426 0.2964 0.3684
xyz 0.5821 0.3347 0.3460
xyz 0.4542 0.4619 0.3339
xyz 0.5187 0.1287 0.4629
xyz 0.4057 0.3359 0.4399
xyz 0.4921 0.1786 0.4310
xyz 0.4439 0.3405 0.1498

shelx: If "Y", the script will use Shelxc and Shelxd to calculate the anomalous scatterer substructure. Otherwise, it will use Solve. (default "Yes")

overwrite: If "N", the script will skip steps which have already been done in a previous run of the script (it looks for output files and, if they exist, they will not be overwritten). This can sometimes be useful to correct mistakes (for instance, if you want to modify the input sites for solve but you do not want to rerun the scaling). You should not use overwrite = N if you want to add more data (for example, an additional wavelength) or if a program failed to write a complete output file. (default "Yes")

Scalepack input

space_group: Compulsory keyword. It must be given in lower case.

resol: Resolution limits of the data.

solvent_content: Fraction of the unit cell occupied by solvent.

The keywords "dir", "name", "nresidue", "ano_atom", "seq", "sites", "shelx" and "overwrite" are used as described above for the mosflm input.

Wavelength edits

The wavelength input is identical for both scripts unless stated otherwise:

l1, l2, etc: mtz file from mosflm or hkl files from
scalepack. Only l1 is compulsory (the scripts will attempt SAD data solution in that case). If the parameter is left blank, the rest of the parameters for that wavelength will be ignored. There is a limit of three wavelengths.

li1, li2, etc: A second non-consecutive batch of data can be entered for each wavelength. This can be useful when merging data from two crystals or when collecting the data in non-consecutive wedges (for example, when using inverse beam mode). This option is not available in solve_structure_denzo.com, since this script does not carry out any data scaling.

title1, title2, etc: Optional, but useful.
lambda1, lambda2, etc: Wavelengths. Used by solve to obtain fo and estimate f' and f" if not given.

f": Optional, but strongly recommended for solve. Compulsory for Shelx (Shelxc does not use the values, but they are used internally by the script to determine which wavelength is which). For the near edge wavelengths you can get the values from the MAD scan in BLU-ICE. For the remote wavelength, use the values from standard tables.

f': As f".

Advanced edits

Usually you do not have to edit these parameters unless the script fails to solve the structure or you want to explore the effects of different options.

cycle_limit: Relevant when shelx = y. Maximum number of Shelxd cycles to find a substructure solution. The script will initially run as many Shelxd cycles as there are anomalous scatterers and will test the solution with Shelxe. If the solution is not judged good, the script will double the number of Shelxd cycles and try again. This procedure is iterated until a good solution is found or until the number of cycles exceeds cycle_limit.

default_shelxres: Relevant when shelx = y. After running Shelxc the script will try to determine a sensible resolution cutoff for substructure solution based on the anomalous signal statistics. If the criteria for a good resolution cutoff are not fulfilled, Shelxd will use the default_shelxres resolution.

shelx_anomcorr: Relevant when shelx = y. Anomalous signal correlation (in %, from Shelxc) used for the resolution cutoff. When the correlation falls below this value, the data will not be used in Shelxd.

shelxe_dmcycle: Relevant when shelx = y. Number of cycles of density modification Shelxe will apply after phasing with the best Shelxd solution. A low number of cycles will give faster but less accurate results.

cutoff: Relevant when shelx = y. After running Shelxe using both solution hands, the script checks that the difference in the final correlation coefficient is larger than cutoff. If it is not, the script will run more Shelxd cycles.

Running the scripts

After editing the parameters, save the file and type the file name followed by "fast" or "slow" to run the script in the corresponding speed mode. The default is "slow" speed. Example:

solve_structure.com fast
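As a rough illustration of the Shelxd retry logic described under cycle_limit and cutoff in the advanced edits, the following Python sketch reproduces the doubling schedule and the hand comparison. It is a simplification written for this guide, not code taken from the scripts: the function names, the run_trial callback (a stand-in for one Shelxd plus Shelxe run) and the cutoff value of 10 are all hypothetical, and the real script may apply additional criteria when judging a solution.

```python
def shelxd_schedule(n_sites, cycle_limit):
    """Yield the cycle counts tried: n_sites, 2*n_sites, ... up to cycle_limit."""
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    cycles = n_sites
    while cycles <= cycle_limit:
        yield cycles
        cycles *= 2  # double the number of cycles after each failed attempt


def hands_resolved(cc_original, cc_inverse, cutoff):
    """True if the two hands differ in correlation by more than cutoff."""
    return abs(cc_original - cc_inverse) > cutoff


def find_substructure(n_sites, cycle_limit, cutoff, run_trial):
    """Retry with more cycles until the hands are clearly distinguished.

    run_trial(cycles) is a hypothetical callback returning the pair
    (cc_original, cc_inverse) for a run with that many cycles.
    """
    for cycles in shelxd_schedule(n_sites, cycle_limit):
        cc_original, cc_inverse = run_trial(cycles)
        if hands_resolved(cc_original, cc_inverse, cutoff):
            hand = "inverse" if cc_inverse > cc_original else "original"
            return cycles, hand
    return None  # cycle_limit exceeded without a clear solution


def fake_trial(cycles):
    # Made-up behaviour: the hands only separate from 10 cycles onwards.
    return (23, 47) if cycles >= 10 else (30, 31)


schedule = list(shelxd_schedule(5, 100))
result = find_substructure(5, 100, 10, fake_trial)
print(schedule)
print(result)
```

With five sites and a limit of 100 cycles, the sketch tries 5, 10, 20, 40 and 80 cycles at most, which shows why a generous cycle_limit lengthens only the unsuccessful runs.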
Output

The script writes information as it proceeds. A successful run looks like this (lines starting with --- are explanatory comments):

14:44:54 Using slow mode for scaling and structure solution.
Will find sites from scratch
14:44:55 rebatch done - output: Prepare/infl_1_001.mtz - log: Prepare/rebatch_2wav.log
14:44:55 rebatch done - output: Prepare/remo_2_001.mtz - log: Prepare/rebatch_2wav.log
---Rebatch changes the batch numbers on the input files, so that each wavelength has different batch numbers.
14:44:57 sort done - output: Prepare/sort_2wav.mtz - log: Prepare/sort_2wav.log
---Sortmtz merges all wavelengths and sorts data on h,k,l and m/isym.
14:45:12 scaling done - output: Scala/scale_2wav.mtz - log: Scala/scale_2wav.log
14:45:15 Merging for l1 done - output: Scala/merge_l1_2wav.mtz - log: Scala/merge_l1_2wav.log
R-merge is 0.030 (0.037 for the last resolution bin.)
Completeness is 99.5% (100.0% for the last resolution bin.)
---Scala merges the data for the first wavelength. Examine the log file! It gives you interesting statistics about your data set.
14:45:18 Transformed unmerged scaled data into scalepack format: unmerged_l1_2wav.sca
---The data is transformed into scalepack format for input to Shelx.
14:45:19 Truncate l1 done - output: Truncate/truncate_l1_2wav.mtz - log: Truncate/truncate_l1_2wav.log
---Converts Is to Fs. The log file is worth examining. The distribution of intensity moments can be an indication of twinning.
14:45:21 Merging for l2 done - output: Scala/merge_l2_2wav.mtz - log: Scala/merge_l2_2wav.log
R-merge is 0.026 (0.031 for the last resolution bin.)
Completeness is 99.4% (100.0% for the last resolution bin.)
14:45:25 Transformed unmerged scaled data into scalepack format: unmerged_l2_2wav.sca
14:45:25 Truncate l2 done - output: Truncate/truncate_l2_2wav.mtz - log: Truncate/truncate_l2_2wav.log
Shelx will use data to 3.00 A.
---Shelx uses a resolution cutoff based on the anomalous signal correlation between the two wavelengths.
Running shelxd for 5 tries.
14:45:34 Shelxd done. Log: Shelx/shelxd_2wav.log. Best solution CC=52.52.
Do 8 cycles of density modification on this solution.
---The correlation coefficient is not necessarily an indication that the structure is correct. Check the output log and verify that there is a significant difference between the best solutions and the poorer ones.
14:45:37 Shelxe done. Log: Shelx/shelxe_2wav.log. Pseudo-CC=23
Trying density modification with inverse hand.
14:45:40 Shelxe done. Log: Shelx/shelxe_2wav_i.log. Pseudo-CC=47
---MAD maps tend to show a large difference in correlation between the correct and the inverse hands. The contrast is usually less for SAD. Shelxe writes out maps (2wav.ph and 2wav_i.ph in this example) which can be viewed with COOT.
Solve will use the inverse hand solution from shelxd for phasing.
14:45:40 Output files merged - output: Scaleit/cad_2wav.mtz - log: Scaleit/cad_2wav.log
14:45:41 scaleit done - output: Scaleit/scaleit_2wav.mtz - log: Scaleit/scaleit_2wav.log
---CAD and Scaleit prepare the data for input to Solve.
Starting solve. Check progress in Solve/solve.status.
---Solve is now only used for phasing. If not using Shelx to find the sites, Solve will take considerably longer to run.
14:46:15 Solve done - log: Solve/solve_2wav.prt. Maps: Solve/solve_2wav.map and Solve/solve_2wav.ezd
DMIN: TOTAL 10.81 6.82 5.33 4.52 3.99 3.61 3.33 3.10
MEAN FIG MERIT: 0.68 0.71 0.77 0.73 0.69 0.69 0.66 0.63 0.64
Starting resolve with solvent content .390. Only do density modification at first.
---The solvent content is calculated based on the contents of the asymmetric unit.
14:47:26 Resolve done - output Resolve/resolve_2wav.log. Reflection file: Resolve/resolve_2wav.mtz
Final correlation coefficient is 0.5258557
Final R-factor is 0.4142982
Calculating a ccp4 map around the heavy atom positions.
14:47:27 fft and mapmask done - Map: Resolve/resolve_2wav.map. Log : Resolve/fft_2wav.log
---A map is calculated before doing model building (which can take long).
Trying to build model with sequence in cat ./tutorial/seq.pir
---If the sequence file is not provided, resolve will try to build the main chain only.
Total residues placed: 0 of 364 or 0%
Residues built without side chains: 227
Total residues built: 227 or 62%
Calculating a ccp4 map around the model.
15:05:47 Resolve done - Model: Resolve/build_2wav.pdb. - Map: Resolve/build_2wav.map. - Log : Resolve/build_2wav.log.
Bye

Problems

Known bugs

The script does not run when the input data files are located under different directories.

Some common and uncommon runtime errors

If there is something wrong, the program involved will write out an error message. If this happens, verify that you are running an up-to-date version of the script (1.39 for solve_structure.com and 1.21 for solve_structure_denzo.com). If not, download the script as described above. Here is a list of other common error messages and their causes:

table1 - subscript out of range

Check the previous log. If you see messages like

rebatch: Command not found
sortmtz: Command not found

etc., the problem is usually a malformed input file name. Note that the input names CANNOT have a directory name preceding them. Another possible source for this error is Scala; Scala usually divides the data into 10 resolution bins, but, if the data resolution is low, only 9 bins may be used. If so, the fail-safe way to get rid of this error is to edit the script, and search and delete the following lines (all three instances):
set rfactf = $table1[1]
set rfactor = $table1[2]
set complf = $table2[1]
set compl = $table2[2]
echo 'R-merge is '$rfactor' ('$rfactf' for the last resolution bin.)'
echo 'Completeness is '$compl'% ('$complf'% for the last resolution bin.)\n'
The disadvantage of doing that is that the script will no longer report
the R-factor and completeness, and you will have to inspect the
log files. A more laborious but more sophisticated solution is to find the wavelength or wavelengths that
give the error (for instance, the 2nd wavelength, if the error appears after
merging it) and search the script for the corresponding lines:
set table1=`awk '/N 1\/resol\^2 Dmin\(A\) Rmrg Rfull/ {while (getline) if ( $1 ~/10$/ ) {print $4, $6; exit} }' ${id}_$name.log`
set table2=`awk '/N 1\/resol\^2 Dmin / {while (getline) if ( $1 ~/10$/ ) {print $7, $8; exit} }' ${id}_$name.log`
In these lines, substitute the bin number 10 in the awk pattern with the number of resolution bins that Scala actually used (9 in the case described above).

ERROR! File ../peak_3_001.mtz does not exist or you do not have read permission...Bye.

Check that the file exists. You may have misspelled the name or given an incorrect directory path.

Scala: *** Gap in rotation ***

You have a gap in rotation in your input file. The script will stop after sort. Look at reference.log to see which is the offending batch. Unstable data processing with a poorly defined unit cell or orientation can also cause this problem.

Scala/merge_l3_test.mtz already exists. Skipping merging l3.

If the script suddenly stops without an error message after a line like the one above, it may be because you interrupted the script (with ^C) and then reran it with overwrite = N. Try using overwrite = Y or deleting the output file mentioned in the last line of standard output (in the example above, delete "Scala/merge_l3_test.mtz").

Resolve could not be run. Probably SOLVE did not find a good solution

If you are using SAD data, make sure that l1 is the wavelength with maximum f". If it is a two-wavelength MAD experiment, one of the wavelengths should be the remote. Also check that your f' and f" values are correct for each wavelength. If you are not sure, leave fp and fpp blank and let solve do the guessing for you. If solve does not work with three reasonably complete wavelengths and there are no obvious problems with the processing or the quality of the data, there are a few possible causes:
Error while reading shelxc_sad2.log!

Shelxc could not run properly. This can happen if the input file in "scalepack" format was not generated correctly by Scala (looking at the Shelx/shelxc_name.log file will reveal the cause). If the data are strong, the problem may be a format overflow; edit the file(s) unmerged_l#_name.sca and search for reflections containing the string "******". Delete the entire line, then edit the structure solution script and set the overwrite variable to "n" (to prevent the edited scalepack files from being overwritten by the script), and run the script again. If you cannot easily determine the cause of the error, set the script variable shelx to "n" to use Solve for substructure solution instead.

The script fails to solve the structure

If the phasing statistics look good but the maps look totally uninterpretable, a possible explanation is that the wavelengths have been swapped (in particular, swapping the remote wavelength with the inflection or the peak will have that effect; the sites will have "negative" occupancies). If the input parameters are correct, try the following:
Major changes

Version 1.39
Version 1.36
Version 1.27
Version 1.10
Version 1.6
Version 1.2
Technical questions: Webmaster
Content questions: Ana Gonzalez
Last modified: Friday, 04-Sep-2009 15:43:35 PDT.