PRODRG (CCP4: Supported Program)

NAME

cprodrg - generation of small molecule coordinates and topologies for use in refinement and model building from a variety of descriptions, including PDB coordinates and SMILES.

SYNOPSIS

cprodrg XYZIN foo_in.pdb LIBOUT foo_out.cif XYZOUT foo_out.pdb MOLOUT foo_out.mol
[Keyworded input]

DESCRIPTION

Crystallographic refinement and automated model-building require detailed descriptions of the chemical entities being refined, including information about atom types and connectivity (topology) and chemical parameters (bond lengths, angles, etc.). The CCP4 monomer library includes such descriptions for standard amino acids, nucleic acids and a variety of common small molecules, but cannot cover all possible chemicals you might want to include in a structure. The main purpose of PRODRG is to automatically generate such topological/parameter information for arbitrary small molecules (with some LIMITATIONS) from user input.

In addition, PRODRG can generate reasonable three-dimensional coordinates if none are available, or optionally improve the input conformation if they are. The resulting coordinates are written out in PDB and/or MDL Molfile format. Please note that the PRODRG-generated topology must be used in conjunction with the PRODRG-generated PDB file rather than with the input file, even if that file is in PDB format and no modifications were requested. The main reason for this is that PRODRG may rename atoms in your input, making it incompatible with the generated topology.

Thirdly, PRODRG can be used to create 'flattened' two-dimensional coordinates of a molecule, useful for showing its chemical structure (see MINImise).

Finally, PRODRG allows the alteration of given compounds by adding functional groups to them or removing/replacing parts of the molecule. This is described in more detail under MODIFYING THE INPUT.

INPUT/OUTPUT FILES

XYZIN

Description of the desired small molecule in one of the valid PRODRG input formats, for example PDB coordinates, SMILES or a text drawing (see TEXT DRAWINGS).

PRODRG will automatically recognise the format of the input file. Note that for SMILES input to be recognised, it must not contain any line breaks. In general, non-PDB input formats should be preferred if available, as they provide more detailed information about the requested molecule to PRODRG.
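As a minimal sketch of the line-break rule for SMILES input, the commands below join a SMILES string that has been wrapped across two lines into a single line before it is passed to cprodrg. The file names and the aspirin SMILES are illustrative only. A single terminating newline is kept, on the assumption that "no line breaks" refers to breaks inside the string itself.

```shell
# A SMILES string (aspirin) that was wrapped across two lines; file names are hypothetical.
printf 'CC(=O)Oc1ccccc1\nC(=O)O\n' > wrapped.smi

# Remove all newlines and carriage returns, then end the file with one newline
# (assumption: a single terminating newline is acceptable to PRODRG).
{ tr -d '\r\n' < wrapped.smi; echo; } > ligand.smi

# Run only if cprodrg is on the PATH. SMILES carries no coordinates,
# so minimisation is switched on to obtain a sensible conformation.
if command -v cprodrg >/dev/null 2>&1; then
cprodrg XYZIN ligand.smi XYZOUT ligand_use.pdb LIBOUT ligand_use.cif <<EOF
MINI YES
END
EOF
fi
```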

LIBOUT

CIF-formatted topology for the given small molecule generated by PRODRG.

XYZOUT

Coordinates for the given molecule in PDB format generated by PRODRG. Even if you provided your input to PRODRG as a PDB file and did not request energy minimisation, you should always use this file together with the generated topology, as PRODRG may have had to rename atoms in your input, which would render its topology incompatible with the original coordinate file.

MOLOUT

Coordinates for the given molecule in MDL Molfile format generated by PRODRG.

KEYWORDED INPUT

PRODRG's operation can be controlled by keywords given on standard input. There are no compulsory keywords; the default for each keyword is indicated below. In all cases only the first four characters of a keyword are significant. Recognised keywords are: COORds, PROTonate, MINImise, CHIRality, END.

COORds PDB | MOL | BOTH

Write output coordinates in PDB format only (COOR PDB, default), in MDL Molfile format only (COOR MOL) or in both formats (COOR BOTH).

PROTonate ALL | POLAR | NONE

By default, PRODRG will write out coordinate files including all hydrogen atoms (PROT ALL). Alternatively, coordinate files containing only polar hydrogens (PROT POLAR) or no hydrogens at all (PROT NONE) can be written. The CIF topology will always include all hydrogen atoms.

MINImise BUILD | YES | NO | FLAT

This keyword controls PRODRG's coordinate generation feature. If disabled (MINI NO), output atom coordinates will be identical to input coordinates, or random if the input does not contain information about atom positions (SMILES etc.). This option should be chosen when the position/conformation of the input molecule must be conserved, e.g. because it has already been manually placed into a model, as any other minimisation choice may alter the conformation and/or position of the given molecule. Enabling minimisation (MINI YES) will either attempt to improve the input conformation or, if there are no input coordinates, generate a reasonable conformation from scratch. MINI BUILD (the default) is equivalent to MINI NO, unless the input contains building commands (see MODIFYING THE INPUT), in which case minimisation is enabled. MINI FLAT enables minimisation, but instead of producing 3D coordinates, the output file will contain 'flattened' 2D coordinates.

CHIRality YES | NO | INPUT

Directs the use of chirality (=improper) restraints. The default is to restrain chiral centres (CHIR YES). CHIR NO disables all chirality restraints, which may be useful when building things like fullerenes starting from a non-coordinate description. CHIR INPUT applies chirality restraints only if the input file specifies stereochemistry (3D coordinates, wedged bonds, etc.), but not otherwise.

END

This terminates keyworded input. If an explicit END keyword is not given, PRODRG will stop reading keywords at the end of input.
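The keywords above can be combined freely. The following sketch, with hypothetical file names, keeps the keywords in a separate file and feeds it to cprodrg on standard input. It shows one plausible setup for a ligand that has already been placed in a model: both coordinate formats, polar hydrogens only, no minimisation, and chirality restraints taken from the input.

```shell
# Keyword file for a ligand whose position must be preserved (hypothetical name).
cat > prodrg.inp <<'EOF'
COOR BOTH
PROT POLAR
MINI NO
CHIR INPUT
END
EOF

# Run only if the program and the input file are actually available.
# COOR BOTH writes a PDB file (XYZOUT) and an MDL Molfile (MOLOUT).
if command -v cprodrg >/dev/null 2>&1 && [ -f ligand.pdb ]; then
    cprodrg XYZIN ligand.pdb XYZOUT ligand_use.pdb MOLOUT ligand_use.mol \
            LIBOUT ligand_use.cif < prodrg.inp
fi
```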

LIMITATIONS

PRODRG is designed to be used on small molecules and as such comes with a default size limit of 600 atoms, including hydrogens after processing. The limit can be changed by editing the MA constant in $CSRC/prodrg/params.inc and recompiling, but be aware that memory usage increases quadratically with the number of atoms, which means that e.g. creating a topology for your entire protein will be impossible on current hardware (aside from being a really bad idea).

Furthermore, PRODRG's support for atom types is limited by the underlying force field (a modified version of GROMOS96 43a1), which means, amongst other things, that there is no support for metal ions/atoms, either by themselves or as part of organic compounds.

TEXT DRAWINGS

PRODRG can accept input molecules as text-based 'drawings' of chemical structures, using chemical symbols to place atoms and separators to indicate bonds (- and | for single bonds, = and " for double bonds and # for triple bonds). A few simple examples of PRODRG text drawings are:

Formate

O-C=O

Acetonitrile

C-C#N

Benzene

C-C=C
"   |
C-C=C

Adenine

  N
  |
N=C-C---N
|   "   "
C=N-C-N-C

A lowercase atom symbol can be used to change the chirality of that atom; for non-chiral centres there is no difference between uppercase and lowercase symbols.

All atoms must be separated by bonds, i.e. C-C describes two carbons connected by a single bond, while CC is invalid. Bonds can be of arbitrary length (C--C is the same as C-C) and for single and double bonds, the choice between the two valid bond symbols is purely cosmetic (C|C or even C||-C look strange, but are identical to C-C), as long as different bond types are not mixed (e.g. C-=-C is nonsensical and invalid). Because of the interchangeability of horizontal and vertical bond symbols, all bonds must be separated by at least one space (i.e. || is the same as -- and thus part of one bond, while | | shows parts of two separate bonds). Bonds can connect to atoms from above, below, left or right; diagonal connections are not allowed.

Based on this, another (needlessly complicated but valid) way to depict adenine could be:

N====C-C-N
|    | " "
|--C N " C
   "   " |
   N---C-N
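Multi-line drawings contain characters such as ", | and # that would need careful quoting if written with echo. A here-document with a quoted delimiter passes every character through literally. The sketch below writes the benzene drawing shown earlier to a file and, if cprodrg is installed, builds coordinates from it. The file names are hypothetical.

```shell
# Write the benzene drawing to a file. Quoting the delimiter ('DRAW')
# makes the shell copy the drawing verbatim, including " and |.
cat > benzene.draw <<'DRAW'
C-C=C
"   |
C-C=C
DRAW

# A drawing has no 3D coordinates, so minimisation is enabled.
if command -v cprodrg >/dev/null 2>&1; then
cprodrg XYZIN benzene.draw XYZOUT benzene.pdb LIBOUT benzene.cif <<EOF
MINI YES
END
EOF
fi
```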

MODIFYING THE INPUT

PRODRG supports a number of commands that allow the input molecule to be modified. These commands must be added to the input file. Note that most of these commands refer to atoms by name, which can make them awkward to use with input formats that do not carry atom names (SMILES, text drawings, ...). In these cases PRODRG should first be run without the modifying command(s) to see what names the program assigns to the atoms of interest; PRODRG can then be run again on the same input with the commands in place. The same procedure applies in cases where PRODRG changes atom names while processing a molecule.
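The two-pass procedure can be sketched as follows. The helper atom_names is not part of PRODRG; it is a small illustrative function that assumes the output file follows the standard fixed-column PDB layout, with the atom name in columns 13-16. File names are hypothetical.

```shell
# Print the atom name (PDB columns 13-16, blanks removed) of every
# ATOM/HETATM record in the given file, one name per line.
atom_names() {
    awk '/^(ATOM|HETATM)/ { name = substr($0, 13, 4); gsub(/ /, "", name); print name }' "$1"
}

if command -v cprodrg >/dev/null 2>&1; then
# Pass 1: no modifying commands, only to learn the assigned atom names.
echo C-C-C-O > propanol.draw
cprodrg XYZIN propanol.draw XYZOUT pass1.pdb LIBOUT pass1.cif <<EOF
MINI YES
END
EOF
atom_names pass1.pdb
# Pass 2 (not run here): add the modifying command(s) to the input,
# using names taken from the list printed above, and rerun cprodrg.
fi
```

Listing the names of the input and output PDB files side by side in the same way also shows whether PRODRG has renamed any atoms.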

1. Changing hybridisation

The PATCH command allows the hybridisation of an atom to be changed. While this is mostly meant as a means for the user to help PRODRG interpret low-quality input, it can also be used to introduce double bonds etc., as long as care is taken that the result of the patches applied makes chemical sense.

PATCH <atomname> 1
PATCH <atomname> 2
PATCH <atomname> 3

can be used to force the given atom to sp, sp2 or sp3 hybridisation, respectively. In the case of sp hybridisation a further distinction should be made between sp-hybridised atoms as part of triple bonds or sp-hybridised atoms in allene-like systems. For improved results, PATCH <atomname> 10 should be used for the latter, and PATCH <atomname> 1 should be used for the former only.

2. Inverting chiral centres

The chirality of any atom can be inverted with

PATCH <atomname> -1

3. Controlling output hydrogens

To generate output in specific 'non-standard' protonation states, the two commands

INSHYD <atomname>
DELHYD <atomname>

can be used to add/remove hydrogens. Note that for both commands the specified atom is the one the hydrogen is attached to, i.e. not the hydrogen itself in the case of DELHYD.

4. Substituting individual atoms

The chemical identity of an input atom can be altered with

BUILD <atomname> @<type>

where <type> is the chemical symbol of the target type. As an example, a standard serine residue could be 'mutated' to a cysteine using

BUILD OG @S

5. Removing parts of the input molecule

Bonds in the input molecule can be cut with the command

CHOP <atomname1> <atomname2>

where the two given atoms are connected by the bond to be removed. If the cutting produces two separate molecules, the smaller part is deleted. You can use the additional command

KEPSML

to delete the larger part instead. Again using a standard serine residue as an example

CHOP CB OG

could be used to turn it into an alanine.

6. Adding functional groups

PRODRG can also be used to add new atoms to existing molecules using the command

BUILD <atomname> <fragmentname>

which attaches a new 'fragment' to the specified atom. Examples of fragments are ME for a methyl group, OH for a hydroxyl group or PHI to add a phenyl. A complete list of fragments can be found here. If the addition of a new group creates a chiral centre at the attachment point, its chirality can be changed by prefixing the atom name with a tilde character (~).

Some default fragments have two attachment points, e.g. the 2-EPOXY fragment used to introduce an epoxy bridge between two atoms or the CONECT fragment that does not actually add anything but simply connects two existing atoms. For these fragments, the second attachment point must be specified after the fragment name, i.e.

BUILD <atomname1> <fragmentname> <atomname2>

7. Renaming the molecule

You can change the residue name with the command

CPNAME <newname>

This can be used to avoid the default 'DRG' when creating compounds from scratch or to update the name to something more appropriate during building.
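Because modifying commands live in the input file itself, it can be convenient to append them to a copy and leave the original coordinates untouched. The helper add_commands below is a hypothetical convenience function, not part of PRODRG. The example reuses the serine-to-cysteine substitution from above; the residue name CYS and the file names are chosen purely for illustration.

```shell
# Copy a PRODRG input file and append modifying commands, one per line.
# Usage: add_commands SOURCE DESTINATION "COMMAND" ["COMMAND" ...]
add_commands() {
    src=$1; dst=$2; shift 2
    cp "$src" "$dst" || return 1
    # Make sure the copy ends with a newline so the first command starts a new line.
    if [ -n "$(tail -c 1 "$dst")" ]; then echo >> "$dst"; fi
    printf '%s\n' "$@" >> "$dst"
}

# Run only if the program and a serine input file are available.
if command -v cprodrg >/dev/null 2>&1 && [ -f ser.pdb ]; then
add_commands ser.pdb cys_in.pdb 'BUILD OG @S' 'CPNAME CYS'
cprodrg XYZIN cys_in.pdb XYZOUT cys.pdb LIBOUT cys.cif <<EOF
END
EOF
fi
```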

REFERENCE

A. W. Schuettelkopf and D. M. F. van Aalten (2004). PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Crystallogr. D60, 1355–1363.

AUTHOR

Alexander W. Schuettelkopf and Daan M. F. van Aalten, Division of Molecular Microbiology, College of Life Sciences, University of Dundee

EXAMPLES

Generating a topology for an existing ligand

cprodrg XYZIN ligand.pdb XYZOUT ligand_use.pdb LIBOUT ligand_use.cif <<EOF
END
EOF

Topology and coordinates from scratch for a simple molecule (propanol in this case)

echo C-C-C-O > temp.draw
cprodrg XYZIN temp.draw XYZOUT ligand_use.pdb LIBOUT ligand_use.cif <<EOF
MINI YES
EOF
rm temp.draw

Creating N-methyllysine from lysine (provided in lys.pdb) by adding a methyl group (ME) to NZ

echo BUILD NZ ME >> lys.pdb
echo CPNAME MLY >> lys.pdb
cprodrg XYZIN lys.pdb XYZOUT mly.pdb LIBOUT mly.cif <<EOF
END
EOF

Generating 2D coordinates in MDL Molfile format (for plotting) from a given PDB file

cprodrg XYZIN ligand.pdb MOLOUT plotme.mol LIBOUT /dev/null <<EOF
COOR MOL
PROT NONE
MINI FLAT
END
EOF