DATA HARVESTING MANAGER


Contents

Description
Selecting Files
Programs
Validating Harvest Files
Convert CIF files to XML
Extract additional information for deposition
Running PDB_EXTRACT with CCP4 Programs
Output

Description

The Data Harvesting Manager is a tool for managing and maintaining harvest files produced by CCP4 programs. It runs tasks that validate the format and consistency of harvest files in the same dataset and that convert harvest files from CIF to XML. It is also an interface to the PDB_EXTRACT package, which extracts additional information for deposition from harvest files, output log files and output MTZ files.

Selecting Files

To make harvest files easier to manage and maintain, the user can select multiple harvest files to work with; these files are listed in a box in the "List of harvest files selected" folder. Buttons underneath the list box enable the user to view a selected file, remove selected files from the list, clear the whole list or un-highlight any selected files. Multiple files can be selected by holding down the CTRL key and clicking with the mouse.

Programs

Validating Harvest Files

This program checks that each highlighted file is written in correct mmCIF syntax. It also outputs only the common information that is found in all harvest files written by CCP4 programs. If more than one file is highlighted and the "Cross Validate Files" button is checked, the program also checks certain data for differences between the files (see the cross_validate program documentation).
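
The idea of cross validation can be illustrated with a minimal Python sketch that compares single-line items shared by two harvest files. This is not how the cross_validate program works internally; the helper names `read_simple_items` and `differing_items` and the example values are hypothetical, and loops and multi-line values are deliberately ignored.

```python
def read_simple_items(text):
    """Collect single-line `_tag value` pairs from mmCIF text.

    Tags listed under `loop_` carry no value on their own line, so they
    are skipped, as are the data rows that follow them.
    """
    items = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("_"):
            continue
        parts = stripped.split(None, 1)
        if len(parts) == 2:
            # Drop surrounding quotes from quoted values.
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def differing_items(text_a, text_b):
    """Return {tag: (value_a, value_b)} for tags present in both texts with different values."""
    a = read_simple_items(text_a)
    b = read_simple_items(text_b)
    return {tag: (a[tag], b[tag]) for tag in sorted(a.keys() & b.keys()) if a[tag] != b[tag]}


# Illustrative values only, not taken from real harvest files.
file_a = "data_x\n_cell.length_a 78.1\n_symmetry.space_group_name_H-M 'P 21 21 21'\n"
file_b = "data_x\n_cell.length_a 78.4\n_symmetry.space_group_name_H-M 'P 21 21 21'\n"
differences = differing_items(file_a, file_b)
```

Only tags present in both files are compared, which mirrors the restriction to "certain data" in the description above.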

Convert CIF files to XML

This program will convert a selected harvest file into XML. It requires one input harvest file from the list, and an output XML file (see CIF2XML program documentation).

Extract additional information for deposition

This is an interface to the PDB_EXTRACT program suite, which will extract additional relevant information from output files of certain structure solution programs into a CIF file for use during deposition. Under programs, choose "Run Program to Extract additional information for deposition".

There are three steps where information can be extracted:

	1. Heavy atom phasing   -> Requires output from one of CNS, Mlphare, Solve, Sharp, SnB or ShelxD/ShelxS
	2. Density Modification -> Requires output from one of CNS, DM, Solomon, Resolve, Sharp or ShelxE
	3. Structure Refinement -> Requires output from one of CNS, Refmac5, ShelxL, TNT or ARP/wARP

For each step, the program that generated the output files must be chosen from the menu, and the required files must be specified. The resulting file is written in CIF format and organised so that it is ready for deposition.
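
The list of steps and programs above can be captured as a small lookup, which may be useful when scripting checks before a task is run. This is a minimal Python sketch; `STEP_PROGRAMS` and `is_supported` are hypothetical helpers that are not part of CCP4 or PDB_EXTRACT, and the program names are copied from the list above, not from the spellings PDB_EXTRACT itself accepts.

```python
# Programs whose output each extraction step accepts, copied from the list above.
STEP_PROGRAMS = {
    "heavy atom phasing": ["CNS", "Mlphare", "Solve", "Sharp", "SnB", "ShelxD", "ShelxS"],
    "density modification": ["CNS", "DM", "Solomon", "Resolve", "Sharp", "ShelxE"],
    "structure refinement": ["CNS", "Refmac5", "ShelxL", "TNT", "ARP/wARP"],
}


def is_supported(step, program):
    """Return True if `program` is listed for `step` (both compared case-insensitively)."""
    listed = STEP_PROGRAMS.get(step.lower(), [])
    return program.lower() in (name.lower() for name in listed)
```

For example, `is_supported("Density Modification", "DM")` is true, while `is_supported("Structure Refinement", "DM")` is false.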

For detailed documentation, see PDB_EXTRACT.

1. Heavy Atom Phasing

Example: MAD Phasing using the CCP4 Programs MLPHARE and REVISE.

This ideally requires the harvest file from MLPHARE and the log file from the program REVISE. Phasing and wavelength information will be extracted from them.

Select "Extract information from Heavy Atom Phasing step". A new folder will appear. Select the method type and program, e.g. "Using MAD and MLPHARE". Then declare the MLPHARE harvest file as the CIF file and the REVISE log file as the LOG file. It is not necessary to declare a PDB file for this example, since MLPHARE does not produce a final PDB file at this stage. Then choose a name for the output CIF file and run the task.

The same task can also be run from the command line with the following command:

pdb_extract  -p MLPHARE  -iCIF [MLPHARE HARVEST FILE]  -iLOG [REVISE LOG FILE]  -o [OUTPUT CIF FILE]
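
When the command above is run from a script, it can help to check the input files first. The following Python sketch assumes that pdb_extract is on the PATH and that it returns a non-zero exit status on failure (neither is confirmed here); the function names are hypothetical, and only the flags shown in the command above are used.

```python
import shutil
import subprocess
from pathlib import Path


def phasing_command(harvest_cif, revise_log, output_cif):
    """Build the argument list for the MLPHARE phasing extraction shown above."""
    return [
        "pdb_extract",
        "-p", "MLPHARE",
        "-iCIF", str(harvest_cif),
        "-iLOG", str(revise_log),
        "-o", str(output_cif),
    ]


def run_phasing_extraction(harvest_cif, revise_log, output_cif):
    """Check that both inputs exist, then run pdb_extract.

    Raises FileNotFoundError for a missing input, RuntimeError if the
    executable cannot be found, and CalledProcessError on a non-zero exit.
    """
    for path in (harvest_cif, revise_log):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    if shutil.which("pdb_extract") is None:
        raise RuntimeError("pdb_extract is not on PATH")
    subprocess.run(phasing_command(harvest_cif, revise_log, output_cif), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.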

2. Density Modification

Example: Using the CCP4 Program DM.

This requires only the log file from the DM program, and will create a CIF file containing some phasing statistics.

Select "Extract information from Density Modification step" and choose the DM program. Declare the DM log file as the LOG file and declare the name of the output CIF file. Run the task.

The same task can also be run from the command line with the following command:

pdb_extract  -d DM  -iLOG [DM LOG FILE]  -o [OUTPUT CIF FILE]

3. Structure Refinement

Example: Using the CCP4 Program REFMAC5.

This ideally requires the REFMAC5 harvest file and the output PDB file. A file will be written which combines all relevant information from the harvest file and the PDB file into CIF format, including refinement and model statistics, and model coordinates.

Select "Extract information from Structure Refinement step". Then select the method type and program, e.g. "Using MAD and REFMAC5". Then declare the name of the refined PDB file and the REFMAC5 harvest file. Finally, choose a name for the output CIF file and run the task.

The same task can also be run from the command line with the following command:

pdb_extract  -r REFMAC5  -iCIF [REFMAC5 HARVEST FILE]  -iPDB [REFMAC5 PDB FILE]  -o [OUTPUT CIF FILE]
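
The three example commands share one pattern: a step flag with the program name, one or more input-file flags, and the output file. The Python sketch below builds all three from that pattern and prints them without running anything. `build_command` and the file names are hypothetical; the flags are only those that appear in the examples in this document, and other programs may need different inputs.

```python
import shlex

# Step and input-file flags exactly as they appear in the three example commands.
STEP_FLAGS = {"phasing": "-p", "density_modification": "-d", "refinement": "-r"}
INPUT_FLAGS = {"cif": "-iCIF", "log": "-iLOG", "pdb": "-iPDB"}


def build_command(step, program, output_cif, **inputs):
    """Return a pdb_extract argument list for one extraction step.

    `inputs` maps an input kind ("cif", "log" or "pdb") to a file name.
    """
    args = ["pdb_extract", STEP_FLAGS[step], program]
    for kind, filename in inputs.items():
        args += [INPUT_FLAGS[kind], filename]
    return args + ["-o", output_cif]


# Hypothetical file names, for illustration only.
commands = [
    build_command("phasing", "MLPHARE", "phasing.cif", cif="mlphare_harvest.cif", log="revise.log"),
    build_command("density_modification", "DM", "dm.cif", log="dm.log"),
    build_command("refinement", "REFMAC5", "refine.cif", cif="refmac5_harvest.cif", pdb="refmac5.pdb"),
]
for args in commands:
    print(shlex.join(args))  # show each command line without running it
```

Each step writes its own output CIF file here, matching the way the three examples above are run as separate tasks.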

Output

The output of these programs can be checked at a glance in the window in the "Output" folder at the bottom of the task window. This window shows whether each program completed successfully and highlights any potential problems encountered while it ran.