MAD Scripts
Overview

The MAD scripts take an input file containing experimental intensities and run a series of programs to scale the data, phase the structure, and autobuild a partial model in the resulting electron density map. This makes it easier to evaluate the quality of the MAD data on-line during the allocated beamtime.

solve_structure.com reads an mtz file containing unmerged and unscaled integrated intensities, as written out by Mosflm or, if using the Autoxds script, the rootName_xds.mtz file written by pointless after converting the ASCII reflection file from XDS. The heavy atom substructure and phasing are calculated with Solve or Shelxd, depending on the user's choice. If the heavy atom substructure is already known, from a previous experiment or from a previous run, it can be reused by the script. This decreases the run time for phasing considerably.

solve_structure_denzo.com takes an input file containing scalepack-scaled unmerged reflections and uses either Shelxd or Solve to solve the substructure and Solve/Resolve to phase, do density modification and build the model.

Downloading the scripts

From SSRL computers

The scripts are located under /data/your_id/templates/MAD-scripts/. Copy the script of your choice to the directory where you are processing your data and follow the instructions below.

Web downloading

You can download the scripts from http://smb.slac.stanford.edu/templates/MAD_scripts/

Input requirements

solve_structure.com runs only on mtz files produced by mosflm or pointless. The current version of the programs used can deal automatically with rotation gaps between consecutive frames, so it is not necessary to input an inverse beam pass as a separate run.

solve_structure_denzo.com uses the unmerged output from scalepack (run with the "no merge original index" keyword in order to apply local scaling).

Instructions

Copy the script to one of your directories and open it with your favorite editor.
You have to edit the following values:

Global edits

MTZ input

dir (Optional): Full path to the input mtz directory. Defaults to the current directory.
name: A global name to identify the various output files.
nresidue: Number of residues in the asymmetric unit (A.U.). Used to put your data on an approximate absolute scale and to estimate the solvent content. This number does not matter for experimental phases, but it will affect density modification. If you do not know how many molecules you have in the asymmetric unit, get an estimate from the ccp4 program matthews_coef.
ano_atom: Chemical symbol for your anomalous scatterer.
space_group (Optional): If you leave it blank, the script will pick it from the mtz file header. Note that the script will change the extent of the asymmetric unit (if you have to change the point group symmetry) but will not reindex. If you need to change the order of the cell axes, you need to edit the pointless command by hand.
enantiomorph (Optional): If the space group used for phasing has an enantiomorph, input it here. Note: this is only used when Shelx is used for substructure solution and Shelxe identified the inverse hand as the correct one.
res_limit (Optional): Maximum resolution limit. If you want to get rid of the data in the detector corners, or your high resolution data is not really there, or you want to accelerate the structure solution by leaving out very high resolution data, set this; otherwise leave it blank. The higher resolution data will still be used in the scaling.
seq (Optional): Name of the file containing the protein sequence for model building with resolve.
    MIVLTVHYSSEGILV [put the sequence of chain type 1 here]
    >>> [this defines the end of chain 1]
    MKLVERWISSTV [put the sequence of chain type 2 here. Input just one copy of each unique chain]
sites (Optional): Name of a file containing the fractional coordinates of known atom sites. If you have run the script before (for instance, using only one wavelength), the script will display and attempt to use the previously found sites.
    xyz 0.5426 0.2964 0.3684
    xyz 0.5821 0.3347 0.3460
    xyz 0.4542 0.4619 0.3339
    xyz 0.5187 0.1287 0.4629
    xyz 0.4057 0.3359 0.4399
    xyz 0.4921 0.1786 0.4310
    xyz 0.4439 0.3405 0.1498
shelx: If "Y", the script will use Shelxc and Shelxd to calculate the anomalous scatterer structure. Otherwise, it will use solve. (default "Yes")
overwrite: If "N", the script will skip steps which have already been done on a previous run of the script (it looks for output files and, if they exist, they will not be overwritten). This can be useful to correct mistakes (for instance, if you want to modify the input sites for solve but do not want to rerun the scaling). You should not use overwrite = N if you want to add more data (for example, an additional wavelength) or if a program failed to write a complete output file. (default "Yes")

Scalepack input

space_group: Compulsory keyword. It must be given in lower case.
resol: Resolution limits of the data.
solvent_content: Fraction of the unit cell occupied by solvent.

The keywords "dir", "name", "nresidue", "ano_atom", "seq", "sites", "shelx" and "overwrite" are used as described above for the mosflm input.

Wavelength edits

The wavelength input is identical for both scripts unless stated otherwise:

l1, l2, etc: mtz file from mosflm or hkl files from scalepack. Only l1 is compulsory (the scripts will attempt SAD data solution in that case). If the parameter is left blank, the rest of the parameters for that wavelength will be ignored. There is a limit of three wavelengths.
title1, title2, etc: Optional, but useful.
lambda1, lambda2, etc: Wavelengths. Used by solve to obtain fo and to estimate f' and f" if they are not given.
f": Optional, but strongly recommended for solve. Compulsory for Shelx (Shelxc does not use the values, but they are used internally by the script to determine which wavelength is which). For the near-edge wavelengths you can get the values from the MAD scan in BLU-ICE. For the remote wavelength, use the values from standard tables.
f': As for f".

Advanced edits

Usually you do not have to edit these parameters unless the script fails to solve the structure or you want to explore the effects of different options.

cycle_limit: Relevant when shelx = y. Maximum number of Shelxd cycles to find a substructure solution. The script will initially run as many Shelxd cycles as there are anomalous scatterers and will test the solution with Shelxe. If the solution is not judged good, the script will double the number of Shelxd cycles and try again. This procedure is repeated until a good solution is found or until the number of cycles exceeds cycle_limit.
default_shelxres: Relevant when shelx = y. After running Shelxc, the script will try to determine a sensible resolution cutoff for substructure solution based on the anomalous signal statistics. If the criteria for a good resolution cutoff are not fulfilled, Shelxd will use the default_shelxres resolution.
shelx_anomcorr: Relevant when shelx = y. Anomalous signal correlation (in %, from Shelxc) used for the resolution cutoff. Data at resolutions where the correlation falls below this value will not be used in Shelxd.
shelxe_dmcycle: Relevant when shelx = y. Number of cycles of density modification Shelxe will apply after phasing with the best Shelxd solution. A low number of cycles will give faster but less accurate results.
cutoff: Relevant when shelx = y. After running Shelxe using both solution hands, the script checks that the difference in the final correlation coefficient is larger than cutoff. If it is not, the script will run more Shelxd cycles.

Running the scripts

After editing the parameters, save the file and type the file name. Example:

solve_structure.com
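The interplay of cycle_limit and cutoff described under Advanced edits can be made concrete with a small sketch. This is not the script's own code. It assumes that "judged good" means the Shelxe correlation coefficients of the two hands differ by more than cutoff. The names find_substructure, run_trial and fake_trial, and the numbers used in the demonstration, are hypothetical.

```python
from typing import Callable, Optional, Tuple


def find_substructure(
    n_sites: int,
    cycle_limit: int,
    cutoff: float,
    run_trial: Callable[[int], Tuple[float, float]],
) -> Optional[Tuple[int, str]]:
    """Double the Shelxd cycle count until the two hands separate.

    run_trial(n_cycles) is a hypothetical stand-in for "run Shelxd with
    n_cycles, then Shelxe on both hands". It returns the pair
    (cc_original, cc_inverse). The result is (cycles used, chosen hand),
    or None if cycle_limit is exceeded without a clear solution.
    """
    # Start with as many cycles as anomalous scatterers (at least one).
    n_cycles = max(1, n_sites)
    while n_cycles <= cycle_limit:
        cc_original, cc_inverse = run_trial(n_cycles)
        # Assumed acceptance test: the hands must differ by more than cutoff.
        if abs(cc_original - cc_inverse) > cutoff:
            hand = "original" if cc_original > cc_inverse else "inverse"
            return n_cycles, hand
        n_cycles *= 2  # not good enough: double the cycles and retry
    return None


def fake_trial(n_cycles: int) -> Tuple[float, float]:
    # Made-up behaviour: the hands only separate with 20 or more cycles.
    return (26.0, 72.0) if n_cycles >= 20 else (30.0, 32.0)


print(find_substructure(10, 100, 10.0, fake_trial))  # (20, 'inverse')
print(find_substructure(10, 15, 10.0, fake_trial))   # None
```

With these made-up numbers, the first attempt (10 cycles) is rejected, the second (20 cycles) is accepted with the inverse hand, and a cycle_limit of 15 stops the search before the second attempt. A larger cycle_limit therefore trades run time for more chances of finding the sites.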
Output

The script writes information as it proceeds. A successful run looks like this:

17:38:41 Will find sites from scratch
17:38:50 pointless done - output: Prepare/pointless_test.mtz - log: Prepare/pointless_test.log
---Pointless is used to rebatch the image numbers from different wavelengths to avoid duplications, to check the space group and to sort and merge all the files.
17:39:20 scaling done - output: Aimless/scale_test.mtz - log: Aimless/scale_test.log
---Aimless scales all the data for all the wavelengths. The log file contains information about the anomalous and dispersive signal present in the combined data.
17:39:26 Merging for l1 done - output: Aimless/merge_l1_test.mtz - log: Aimless/merge_l1_test.log
R-merge is (within (0.040 for the last resolution bin.)
Completeness is 99.8% (99.2% for the last resolution bin.)
---Aimless merges the data for the first wavelength. Examine the log file for statistics about each data set.
17:39:32 Transformed unmerged scaled data into scalepack format: unmerged_l1_test.sca
---The data are converted to scalepack format for input to Shelx.
17:39:33 Truncate l1 done - output: Truncate/truncate_l1_test.mtz - log: Truncate/truncate_l1_test.log
---Converts Is to Fs. The log file is worth examining. The distribution of intensity moments can be an indication of twinning.
17:39:39 Merging for l2 done - output: Aimless/merge_l2_test.mtz - log: Aimless/merge_l2_test.log
R-merge is (within (0.036 for the last resolution bin.)
Completeness is 99.7% (98.6% for the last resolution bin.)
17:39:46 Transformed unmerged scaled data into scalepack format: unmerged_l2_test.sca
17:39:47 Truncate l2 done - output: Truncate/truncate_l2_test.mtz - log: Truncate/truncate_l2_test.log
17:39:55 Merging for l3 done - output: Aimless/merge_l3_test.mtz - log: Aimless/merge_l3_test.log
R-merge is (within (0.043 for the last resolution bin.)
Completeness is 76.4% (6.4% for the last resolution bin.)
17:40:04 Transformed unmerged scaled data into scalepack format: unmerged_l3_test.sca
17:40:04 Truncate l3 done - output: Truncate/truncate_l3_test.mtz - log: Truncate/truncate_l3_test.log
Shelx will use data to 2.2 A.
---Shelx uses a resolution cutoff based on the anomalous signal correlation between the two wavelengths.
17:40:07 Shelxc done. Log: Shelx/shelxc_test.log.
Running shelxd for 10 tries.
17:40:11 Shelxd done. Log: Shelx/shelxd_test.log.
Best solution CFOM=118.0. Do 8 cycles of density modification on this solution.
---The correlation coefficient is not necessarily an indication that the structure is correct. Check the output log and verify that there is a significant difference between the best solutions and the poorer ones.
17:40:24 Shelxe done. Log: Shelx/shelxe_test.log. Pseudo-CC=26
Trying density modification with inverse hand.
17:40:36 Shelxe done. Log: Shelx/shelxe_test_i.log. Pseudo-CC=72
---MAD maps tend to show a large difference in correlation between the correct and the inverse hands. The contrast is usually less for SAD. Shelxe writes out maps which can be viewed with COOT.
Solve will use the inverse hand solution from shelxd for phasing.
17:40:37 Output files merged - output: Scaleit/cad_test.mtz - log: Scaleit/cad_test.log
17:40:39 scaleit done - output: Scaleit/scaleit_test.mtz - log: Scaleit/scaleit_test.log
---CAD and Scaleit prepare the data for input to Solve.
Starting solve. Check progress in Solve/solve.status.
---Solve is now only used for phasing. If not using Shelx to find the sites, Solve will take considerably longer to run.
17:43:19 Solve done - log: Solve/solve_test.prt. Maps: Solve/solve_test.map and Solve/solve_test.ezd
DMIN:           TOTAL  7.22  4.53  3.54  3.00  2.65  2.39  2.20  2.05
MEAN FIG MERIT:  0.63  0.81  0.76  0.72  0.71  0.69  0.61  0.56  0.43
Starting resolve with solvent content .390. Only do density modification at first.
---The solvent content is calculated based on the contents of the asymmetric unit.
17:46:11 Resolve done - output Resolve/resolve_test.log. Reflection file: Resolve/resolve_test.mtz
Final correlation coefficient is 0.7858781
Final R-factor is 0.2518314
Calculating a ccp4 map around the heavy atom positions.
17:46:13 fft and mapmask done - Map: Resolve/resolve_test.map. Log : Resolve/fft_test.log
---A map is calculated before doing model building (which can take long).
Trying to build model with sequence in cat ./tutorial/seq.pir
---If the sequence file is not provided, resolve will try to build the main chain only.
Total residues placed: 274 of 364 or 75%
Residues built without side chains: 58
Total residues built: 332 or 91%
Calculating a ccp4 map around the model.
17:53:57 Resolve done - Model: Resolve/build_test.pdb. - Map: Resolve/build_test.map. - Log : Resolve/build_test.log.
Bye

Problems

Known problems

The script does not run when the input data files are located under different directories.

Some common and uncommon runtime errors

If there is something wrong, the program involved will write out an error message. Most catastrophic failures result from using a very old version of the script. To make sure, download the most recent version as described above. Note that the script does not do much error catching. That means that, sometimes, if a program fails, the script may keep on running, and all the subsequent programs will fail in turn. If you see an error, check the logs of the programs run before it to determine the precise cause of failure. Here is a list of other common error messages and their causes:

FILEIO: cannot open file tutorial/infl_1_002.mtz
Check that the file exists. You may have misspelled the name or given an incorrect directory path.

Aimless/merge_l3_test.mtz already exists. Skipping merging l3.
If the script suddenly stops without an error, after a message like the one above, it may be because you interrupted the script (with ^C) and then reran it with overwrite = N. Try using overwrite = Y or deleting the output file mentioned in the last line of standard output (in the example above, delete "Aimless/merge_l3_test.mtz").

Resolve could not be run. Probably SOLVE did not find a good solution
If you are using SAD data, make sure that l1 is the wavelength with maximum f". If it is a two-wavelength MAD, one of them should be the remote wavelength. Also check that your f' and f" values are correct for each wavelength. If you are not sure, leave fp and fpp blank and let solve do the guessing for you. If solve does not work with three reasonably complete wavelengths and there are no obvious problems with the processing or the quality of the data, there are a few possible causes:
Error while reading shelxc_sad2.log!
Shelxc could not run properly. This can happen if the input file in "scalepack" format was not generated correctly by Aimless (looking at the Shelx/shelxc_name.log file will reveal the cause). If the data are strong, the problem may be a format overflow: edit the file(s) unmerged_l#_name.sca and search for reflections containing the string "******". Delete each such line entirely, then edit the structure solution script and set the overwrite variable to "n" (to prevent the edited scalepack files from being overwritten by the script), and run the script again. If you cannot easily determine the cause of the error, set the script variable shelx to "n" to use Solve for substructure solution instead.

The script fails to solve the structure

If the phasing statistics look good but the maps look totally uninterpretable, a possible explanation is that the wavelengths have been swapped. In particular, swapping the remote wavelength with the inflection or the peak will have that effect (the sites will have "negative" occupancies). If the input parameters are correct, try the following:
Major changes

Version 1.48
Version 1.46
Version 1.39
Version 1.36
Version 1.27
Version 1.10
Version 1.6
Version 1.2
Technical questions: Webmaster
Content questions: Ana Gonzalez
Last modified: Thursday, 10-Mar-2022 16:41:50 PST.