sequins - Statistical protein chain tracing
'sequins' performs sequence validation by comparing the model side chains against the electron density. It may be run with phases from experimental phasing, or it can calculate its own phases using a side-chain-omit process. In the latter case it can be used after molecular replacement, or to validate structures in the PDB.
A reference structure is provided with the program. The structure 1TQW is suitable for typical protein problems at resolutions up to 1.25A, although in practice including data much beyond 2.0A makes little difference. For exotic cases you may want to provide your own reference structures.
Observed F and sigma for reference structure. See Note on column paths.
Hendrickson-Lattman coefficients for reference structure. If you do not have these, they can be generated using the accompanying chltofom program. See Note on column paths.
Observed F and sigma for work structure. See Note on column paths.
[Optional] Hendrickson-Lattman coefficients for work structure. See Note on column paths.
[Optional] Resolution limit for the calculation. All data are truncated to this limit.
[Optional] Calculate a side-chain omit map, removing the side chains from the map calculation, and thus reducing bias towards the input model. For good data this shouldn't matter. For poor data, the phasing may be too poor without the side chain atoms. If in doubt, try both.
[Optional] Use the correlation target function for growing new chains and for sequencing. This is less effective for initial building, but better for model completion, especially after molecular replacement. If in doubt, try both.
When using the command line, MTZ columns are described as groups using a slash-separated format which includes the crystal and dataset names. If your data were generated by another program that uses column groups, you can just specify the name of the group, for example '/native/peak/Fobs'. You can wildcard the crystal and dataset if the file does not contain any duplicate labels, e.g. '/*/*/Fobs'. You can also access individual non-grouped columns from existing files by giving a comma-separated list of names in square brackets, e.g. '/*/*/[FP,SIGFP]'.
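The column path format described above can be illustrated with a short Python sketch. This is not the program's own parser; the function name parse_column_path and the returned dictionary layout are assumptions made for illustration only. It simply splits a path into crystal, dataset and column labels, and distinguishes a named group from a bracketed list of individual columns.

```python
def parse_column_path(path):
    """Split a '/crystal/dataset/columns' path into its parts (illustrative only)."""
    parts = path.strip().split("/")
    # A full path starts with '/', so the first element of the split is empty.
    if len(parts) != 4 or parts[0] != "":
        raise ValueError(f"expected '/crystal/dataset/columns', got {path!r}")
    crystal, dataset, cols = parts[1:]
    if cols.startswith("[") and cols.endswith("]"):
        # Individual non-grouped columns: comma-separated names in square brackets.
        labels = [c.strip() for c in cols[1:-1].split(",")]
        grouped = False
    else:
        # A single name refers to a whole column group.
        labels = [cols]
        grouped = True
    return {"crystal": crystal, "dataset": dataset,
            "labels": labels, "grouped": grouped}


print(parse_column_path("/native/peak/Fobs"))
print(parse_column_path("/*/*/[FP,SIGFP]"))
```

A '*' in the crystal or dataset position is returned unchanged; resolving the wildcard against the labels actually present in an MTZ file is left to the program.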
Keywords may be given on the command line or, if the '-stdin' flag is specified, on standard input. On standard input, one keyword is given per line, the leading '-' is optional, and the rest of the line is taken as the argument of that keyword if one is required, so no quoting is needed.
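The standard input convention can be sketched in Python as follows. This is a minimal illustration of the rules stated above, not the program's actual reader. The keyword names in the example text (example-file, example-columns, example-switch) are hypothetical placeholders and are not real keywords of the program.

```python
def parse_stdin_keywords(text):
    """Return (keyword, argument) pairs from '-stdin' style input (illustrative only)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(None, 1)  # keyword, then the rest of the line
        keyword = fields[0]
        if keyword.startswith("-"):
            keyword = keyword[1:]  # the leading '-' is optional
        # The whole remainder of the line is the argument, so spaces need no quoting.
        argument = fields[1] if len(fields) > 1 else ""
        pairs.append((keyword, argument))
    return pairs


# Hypothetical keywords, for illustration only.
example_text = """\
-example-file my data.mtz
example-columns /*/*/[FP,SIGFP]
example-switch
"""

for keyword, argument in parse_stdin_keywords(example_text):
    print(keyword, "=>", repr(argument))
```

Note that the file name containing a space is read as a single argument, which is why quoting is unnecessary in this mode.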
The program outputs a list of protein chains, along with the original and suggested sequences. For each chain, the original sequence is printed together with a modified sequence, altered to give the best match to the electron density. In the modified sequence, '?' marks residues which cannot be reliably sequenced, and '+' and '-' indicate possible insertions and deletions.
It is normal for the endmost residues of a chain to be converted to '?'s. Loops with poor density sometimes give rise to short register shifts with low confidence. If you get a longer region (10 residues or more) bracketed by one or more '+' and '-' symbols, and the confidence score is greater than 0.95, then there may be a problem.
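The rule of thumb above can be sketched as a small Python check. The exact layout of the program's output is not described here, so this sketch assumes the caller has already extracted the modified sequence as a single string and has a single confidence value for it; both the function name suspect_shifts and these inputs are assumptions. It also treats any run of '+' or '-' symbols as a bracket, without requiring one of each kind.

```python
import re

# A run of indel markers, then residues, followed (not consumed) by another marker,
# so that one marker run can close one region and open the next.
BRACKETED = re.compile(r"[+-]+([A-Z?]+)(?=[+-])")


def suspect_shifts(modified, confidence, min_len=10, min_conf=0.95):
    """Return (start, end) index spans of long regions bracketed by '+'/'-' markers."""
    if confidence <= min_conf:
        return []  # low-confidence shifts are expected in poor density
    spans = []
    for match in BRACKETED.finditer(modified):
        if len(match.group(1)) >= min_len:
            spans.append((match.start(1), match.end(1)))
    return spans


# Hypothetical modified sequence with a 12-residue bracketed region.
print(suspect_shifts("MKT+AAAAAAAAAAAA-GHI", confidence=0.97))
```

An empty list means nothing met both thresholds; a non-empty list gives the string positions of regions worth inspecting in the model and map.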
Kevin Cowtan, York.