SEQUINS (CCP4: Supported Program)

NAME

sequins - Statistical protein chain tracing

SYNOPSIS

csequins -mtzin-ref filename -pdbin-ref filename -mtzin-wrk filename -pdbin-wrk filename -colin-ref-fo colpath -colin-ref-hl colpath -colin-wrk-fo colpath -colin-wrk-hl colpath -resolution resolution -side-chain-omit -correlation-mode -verbose verbosity -stdin
[Keyworded input]

DESCRIPTION

'sequins' performs sequence validation by comparing the model side chains against the electron density. It may be run with phases from experimental phasing, or it can calculate its own phases using a side-chain-omit process. In the latter case it can be used after molecular replacement, or to validate structures in the PDB.

HOW TO RUN SEQUINS

A set of reference structures is provided with the program. The structure 1TQW is suitable for typical protein problems at resolutions up to 1.25A, although in practice including data much beyond 2.0A makes little difference. For exotic cases you may want to provide your own reference structures.

INPUT/OUTPUT FILES

-pdbin-ref
Input PDB file containing the final model for the reference structure.
-mtzin-ref
Input 'reference' MTZ file. This contains the data for a known, reference structure. The required columns are F, sigF, and a set of Hendrickson-Lattman (HL) coefficients describing the calculated phases from the final model. Suitable reference structures can be constructed from the PDB using the 'Make Pirate reference' task.
-mtzin-wrk
Input 'work' MTZ file. This contains the data for the unknown, work structure. The required columns are F and sigF. A set of HL coefficients from a phasing program or phase improvement may optionally be supplied.
-pdbin-wrk
Input PDB file containing an initial model.
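
The four input files and their column options can be assembled into a command line programmatically. The following is a minimal sketch, not part of the program: the helper name 'sequins_command', the file names, and the HL column labels are hypothetical placeholders, and only options listed in the SYNOPSIS are used. The function builds the argument list and does not run anything.

```python
import shlex

def sequins_command(mtz_ref, pdb_ref, mtz_wrk, pdb_wrk,
                    fo_ref, hl_ref, fo_wrk, hl_wrk=None,
                    resolution=None, side_chain_omit=False):
    """Return the argument list for a csequins run (nothing is executed)."""
    args = ["csequins",
            "-mtzin-ref", mtz_ref, "-pdbin-ref", pdb_ref,
            "-mtzin-wrk", mtz_wrk, "-pdbin-wrk", pdb_wrk,
            "-colin-ref-fo", fo_ref, "-colin-ref-hl", hl_ref,
            "-colin-wrk-fo", fo_wrk]
    # Work-structure HL coefficients are optional.
    if hl_wrk is not None:
        args += ["-colin-wrk-hl", hl_wrk]
    if resolution is not None:
        args += ["-resolution", str(resolution)]
    if side_chain_omit:
        args.append("-side-chain-omit")
    return args

# Hypothetical file names and HL labels, for illustration only.
cmd = sequins_command("reference-1tqw.mtz", "reference-1tqw.pdb",
                      "work.mtz", "work.pdb",
                      fo_ref="/*/*/[FP,SIGFP]",
                      hl_ref="/*/*/[HLA,HLB,HLC,HLD]",
                      fo_wrk="/*/*/[FP,SIGFP]",
                      resolution=2.0, side_chain_omit=True)
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

The list form can be passed directly to 'subprocess.run', which avoids shell quoting problems with the wildcards and square brackets in the column paths.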

KEYWORDED INPUT

See Note on keyword input.

-colin-ref-fo colpath

Observed F and sigma for reference structure. See Note on column paths.

-colin-ref-hl colpath

Hendrickson-Lattman coefficients for reference structure. If you do not have these, they can be generated using the accompanying chltofom program. See Note on column paths.

-colin-wrk-fo colpath

Observed F and sigma for work structure. See Note on column paths.

-colin-wrk-hl colpath

[Optional] Hendrickson-Lattman coefficients for work structure. See Note on column paths.

-resolution resolution/A

[Optional] Resolution limit for the calculation, in Angstroms. All input data are truncated to this limit.

-side-chain-omit

[Optional] Calculate a side-chain omit map, removing the side chains from the map calculation and thus reducing bias towards the input model. For good data this shouldn't matter. For poor data, the phases may be too poor without the side chain atoms. If in doubt, try both.

-correlation-mode

[Optional] Use the correlation target function for growing new chains and for sequencing. This is less effective for initial building, but better for model completion, especially after molecular replacement. If in doubt, try both.

-verbose verbosity

Note on column paths:

When using the command line, MTZ columns are described as groups using a slash-separated format that includes the crystal and dataset name. If your data was generated by another program that uses column groups, you can just specify the name of the group, for example '/native/peak/Fobs'. You can wildcard the crystal and dataset if the file does not contain any duplicate labels, e.g. '/*/*/Fobs'. You can also access individual non-grouped columns from existing files by giving a comma-separated list of names in square brackets, e.g. '/*/*/[FP,SIGFP]'.
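
The three forms of column path described above can be generated with a small helper. This is an illustrative sketch only: the function name 'column_path' is hypothetical and is not part of the program.

```python
def column_path(labels, crystal="*", dataset="*"):
    """Build an MTZ column path of the form /crystal/dataset/columns.

    labels  -- a group name (str), or a list of individual column names
    crystal -- crystal name, wildcarded by default
    dataset -- dataset name, wildcarded by default
    """
    if isinstance(labels, str):
        # A named column group is given as-is.
        return f"/{crystal}/{dataset}/{labels}"
    # Individual columns are comma-separated inside square brackets.
    return f"/{crystal}/{dataset}/[{','.join(labels)}]"

print(column_path("Fobs", crystal="native", dataset="peak"))  # /native/peak/Fobs
print(column_path("Fobs"))                                    # /*/*/Fobs
print(column_path(["FP", "SIGFP"]))                           # /*/*/[FP,SIGFP]
```

When such paths are typed directly into a shell, they generally need quoting so that the '*' and '[...]' characters are not expanded by the shell.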

Note on keyword input:

Keywords may be given on the command line or, if the '-stdin' flag is specified, on standard input. On standard input, one keyword is given per line and the leading '-' is optional. The rest of the line is taken as the argument of that keyword, if one is required, so quoting is not used.
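
The standard input form can be produced from a list of options as follows. This is a sketch under the rules stated above (one keyword per line, leading '-' dropped, remainder of the line used as the argument without quoting). The function name 'stdin_keywords' and the file names are hypothetical.

```python
def stdin_keywords(options):
    """Format (keyword, value) pairs as one keyword per line.

    A value of None marks a keyword that takes no argument.
    """
    lines = []
    for keyword, value in options:
        name = keyword.lstrip("-")  # the leading '-' is optional on stdin
        lines.append(name if value is None else f"{name} {value}")
    return "\n".join(lines) + "\n"

text = stdin_keywords([
    ("-mtzin-wrk", "my work data.mtz"),   # spaces need no quoting here
    ("-colin-wrk-fo", "/*/*/[FP,SIGFP]"),
    ("-side-chain-omit", None),
])
print(text, end="")
```

The resulting text could then be supplied on standard input to a run started with the '-stdin' flag, for example via the 'input' argument of 'subprocess.run'.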

Reading the Output:

The program outputs a list of protein chains. For each chain, the original sequence is printed along with a suggested sequence, modified to give the best match to the electron density. The modified sequence may contain '?' for residues which cannot be reliably sequenced, and '+' and '-' to indicate possible insertions and deletions.

It is normal for the endmost residues of a chain to be converted to '?'s. Loops with poor density sometimes give rise to short register shifts with low confidence. If you get a longer region (10 residues or more) bracketed by one or more '+' and '-' symbols, and the confidence score is greater than 0.95, then there may be a problem.
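
The length criterion above can be checked mechanically on a modified sequence string. The sketch below is illustrative only: the function name 'bracketed_regions' is hypothetical, it operates on a bare sequence string rather than on the program's actual output layout (which is not specified here), and it treats any pair of adjacent '+'/'-' marker runs as brackets without requiring one of each kind. The confidence score is not examined and must still be checked separately against the 0.95 threshold.

```python
import re

def bracketed_regions(modified, min_length=10):
    """Find stretches of residues lying between two runs of '+'/'-' markers.

    Returns a list of (start, end, residues) tuples, where start and end
    are string indices into the modified sequence.
    """
    regions = []
    markers = list(re.finditer(r"[+-]+", modified))
    # Examine each pair of neighbouring marker runs.
    for left, right in zip(markers, markers[1:]):
        segment = modified[left.end():right.start()]
        if len(segment) >= min_length:
            regions.append((left.end(), right.start(), segment))
    return regions

# Made-up sequences for illustration, not real program output.
long_shift = "MKT+" + "A" * 12 + "-GLS"
short_shift = "MKT+AAA-GLS"
print(bracketed_regions(long_shift))
print(bracketed_regions(short_shift))
```

A short bracketed stretch, as in the second example, is not reported, in line with the remark that short register shifts in poor loops are expected.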

Problems:

AUTHOR

Kevin Cowtan, York.

SEE ALSO