GESAMT (CCP4: Supported Program)

NAME

gesamt - General Efficient Structural Alignment of Macromolecular Targets

SYNOPSIS

1. Printout of usage instructions:

gesamt --help

2. Alignment and superposition of two structures:

gesamt foo_1st.pdb [{-s|-d} SEL1] foo_2nd.pdb [{-s|-d} SEL2] [-high|-normal] [-o foo_out.pdb]

3. Multiple alignment and superposition (of more than two structures):

gesamt foo_1st.pdb [{-s|-d} SEL1] foo_2nd.pdb [{-s|-d} SEL2] ... foo_nth.pdb [{-s|-d} SELn] [-high|-normal] [-o foo_out.pdb]

4. Screening a pdb archive:

gesamt foo.pdb [{-s|-d} SEL] -pdb pdb-dir [-high|-normal] [hits.txt]

where SEL1/2 are optional selection strings and foo_out.pdb is an optional output file.

Keys "-s" and "-d" select a substructure. By default, the whole structure given in the corresponding file is used. If the file contains more than one chain, all chains are treated as a single structure. The selection format depends on the key used. Key "-s" corresponds to the MMDB selection format, identical to that used by Superpose; the format is described in the pdbcur documentation. The CCP4i interface works only with this type of selection.

Selection key "-d" provides a more flexible selection scheme used by SCOP:

"*", "(all)" - take the whole file

"-" - take chain without chain ID

"a:Ni-Mj,b:Kp-Lq,..." - take chain a residue number N insertion code i to residue number M insertion code j plus chain b residue number K insertion code p to residue number L insertion code q and so on

"a:,b:..." - take whole chains a and b and so on

"a:,b:Kp-Lq,..." - any combination of the above.

Unlike Superpose, Gesamt allows arbitrary selection of residues and disregards the secondary structure pattern of the structures. Gesamt may be applied to non-contiguous sets of residues and to partially complete or short chains.
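The "-d" selection grammar described above can be illustrated with a small parser. This is a minimal sketch of the format as documented here, not Gesamt's own parser: the function name parse_d_selection, the returned tuple layout, and the assumptions that insertion codes are single letters and residue numbers may be negative are all hypothetical.

```python
import re

# One residue endpoint: a (possibly negative) residue number followed by an
# optional single-letter insertion code, e.g. "10", "-3" or "27A".
# (Assumption: the page does not spell out these lexical details.)
_ENDPOINT = r"(-?\d+)([A-Za-z]?)"
_RANGE = re.compile(rf"^{_ENDPOINT}-{_ENDPOINT}$")


def parse_d_selection(sel):
    """Parse a "-d" style selection string (hypothetical helper).

    Returns None for "*" and "(all)", meaning the whole file. Otherwise
    returns a list of (chain, start, end) tuples, where start and end are
    (residue_number, insertion_code) pairs, or None for a whole chain.
    A chain of "" stands for the chain without a chain ID.
    """
    if sel in ("*", "(all)"):
        return None
    if sel == "-":
        return [("", None, None)]
    parts = []
    for item in sel.split(","):
        chain, sep, rng = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        if not rng:
            # "a:" selects the whole chain a
            parts.append((chain, None, None))
            continue
        match = _RANGE.match(rng)
        if match is None:
            raise ValueError(f"bad residue range in {item!r}")
        n, i, m, j = match.groups()
        parts.append((chain, (int(n), i), (int(m), j)))
    return parts
```

For example, parse_d_selection("A:10-20A,B:") yields one ranged entry for chain A (residue 10 to residue 20 with insertion code A) and one whole-chain entry for chain B.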

DESCRIPTION

Gesamt aligns two or more structures by efficient clustering of short fragments, built from adjacent protein backbone C-alpha atoms, followed by an iterative three-dimensional refinement based on a dynamic programming procedure.

Gesamt uses the pairwise alignment algorithm when comparing two structures or looking for structural hits in a PDB archive. When more than two structures are given as input, Gesamt uses the multiple alignment algorithm. Note that multiple alignment does not reduce to a set of pairwise alignments. Multiple alignment is useful for the identification of common structural motifs in whole protein families.

INPUT AND OUTPUT FILES

foo_1st.pdb

First input coordinate file. Although typically a PDB file, it can also be in mmCIF or MMDB binary formats. The input format is detected automatically. This is considered the Query structure to which the transformation matrix will be applied.

foo_2nd.pdb

Second input coordinate file. Although typically a PDB file, it can also be in mmCIF or MMDB binary formats. The input format is detected automatically. This is considered the Target structure.

foo_nth.pdb

Nth input coordinate file (in case of multiple alignment, n>2). Although typically a PDB file, it can also be in mmCIF or MMDB binary formats. The input format is detected automatically.

foo_out.pdb

If specified, the result of applying the transformation matrix to foo_1st.pdb is written to foo_out.pdb (pairwise alignment). In the case of multiple alignment, the file contains all input structures superposed.

pdb-dir

A directory with PDB files to align foo.pdb with. Gesamt will screen any selection of PDB files in the given directory. Only files with extensions ".pdb", ".ent", ".pdb.gz" and ".ent.gz" are considered.
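To preview which files in a directory would qualify for screening under the extension rule above, a short filter can help. This is a sketch only: the helper name screenable_files is hypothetical, and it assumes a case-sensitive match and no descent into subdirectories, neither of which this page specifies.

```python
from pathlib import Path

# Extensions listed in this page as the only ones Gesamt looks at.
SCREENED_SUFFIXES = (".pdb", ".ent", ".pdb.gz", ".ent.gz")


def screenable_files(pdb_dir):
    """Return the files directly inside pdb_dir whose names end with one of
    the listed extensions, sorted by path (hypothetical helper)."""
    return sorted(
        p
        for p in Path(pdb_dir).iterdir()
        if p.is_file() and p.name.endswith(SCREENED_SUFFIXES)
    )
```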

hits.txt

Optional output file with the list of calculated alignments ordered by the decreasing Q-score.
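For orientation, the Q-score is commonly defined in the SSM/PDBeFold literature as Q = Nalign^2 / ((1 + (RMSD/R0)^2) * N1 * N2), with R0 = 3 Angstrom. This page does not state the formula, so the sketch below assumes Gesamt follows that definition. The function names and the (name, q) layout of a hit are hypothetical and do not describe the format of hits.txt.

```python
def q_score(n_align, rmsd, n_res1, n_res2, r0=3.0):
    """Q-score under the assumed definition:
    Nalign^2 / ((1 + (RMSD/R0)^2) * N1 * N2), with R0 = 3 Angstrom.
    """
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_res1 * n_res2)


def rank_hits(hits):
    """Order (name, q) pairs by decreasing Q-score, as the hit list is."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Under this definition Q equals 1 only when all residues of both structures are aligned with zero RMSD, and decreases as the alignment shortens or the RMSD grows.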

COMMAND LINE OPTIONS

The optional selection strings [{-s|-d} SEL1/2] are in the format described above.

Keys "-high" and "-normal" specify "High" and "Normal" mode, respectively. In "Normal" mode (default), Gesamt balances the quality of alignment against computation speed. This is the recommended mode for most applications. In "High" mode, Gesamt attempts to reach maximal quality without regard to speed. In "High" mode, Gesamt is about 10 times slower and improves on the "Normal" mode result in only a few percent of cases.
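When Gesamt is driven from a script, the pairwise form of the synopsis can be assembled as an argument list. The sketch below uses only the keys shown in the synopsis; the function name build_pairwise_command and its parameters are hypothetical, and it only builds the list without running anything.

```python
def build_pairwise_command(query, target, sel1=None, sel2=None,
                           sel_key="-s", high=False, out=None):
    """Build an argument list following the pairwise synopsis:
    gesamt foo_1st.pdb [{-s|-d} SEL1] foo_2nd.pdb [{-s|-d} SEL2]
           [-high|-normal] [-o foo_out.pdb]
    """
    if sel_key not in ("-s", "-d"):
        raise ValueError("sel_key must be '-s' or '-d'")
    cmd = ["gesamt", query]
    if sel1:
        cmd += [sel_key, sel1]   # selection applies to the preceding file
    cmd.append(target)
    if sel2:
        cmd += [sel_key, sel2]
    cmd.append("-high" if high else "-normal")
    if out:
        cmd += ["-o", out]
    return cmd
```

The resulting list can be passed to subprocess.run on a system where gesamt is on the PATH; passing a list rather than a single string avoids shell quoting problems with selection strings such as "(all)" or "*".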

PROGRAM OUTPUT

In the case of pairwise alignment, Gesamt reports the transformation matrix calculated for superimposing foo_1st.pdb onto foo_2nd.pdb, the Q-score and RMSD of the superposition, as well as the polar and Euler rotation angles and the orthogonal translation vector.

The program then gives a residue-by-residue listing of the alignment. Strands and helices in the two structures are identified and given in the output. The output also lists distances between all matched residues at the best structure superposition.

In the case of multiple alignment, Gesamt outputs the same data as for pairwise alignment, calculated for both the consensus structure and all cross-structure alignments.
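The transformation reported for a pairwise alignment can be applied to coordinates by hand. The sketch below assumes the usual rigid-body convention x' = R x + t, with R a 3x3 rotation matrix and t the orthogonal translation vector. The exact layout of the matrix in Gesamt's printout is not described here, so the function name apply_transform and its argument layout are hypothetical.

```python
def apply_transform(rotation, translation, coords):
    """Apply x' = R x + t to each (x, y, z) point in coords.

    rotation is a 3x3 nested sequence, translation a length-3 sequence.
    """
    moved = []
    for xyz in coords:
        moved.append(tuple(
            sum(rotation[r][c] * xyz[c] for c in range(3)) + translation[r]
            for r in range(3)
        ))
    return moved


# A 90 degree rotation about z followed by a shift of (1, 2, 3).
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = (1, 2, 3)
print(apply_transform(R, t, [(1, 0, 0)]))  # [(1, 3, 3)]
```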

AUTHOR

Eugene Krissinel, CCP4, Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, OX11 0FA, UK.

REFERENCE

E.Krissinel (2012), Enhanced Fold Recognition using Efficient Short Fragment Clustering, J. Mol. Biochem., 1(2) 76-85.