SEQWT (CCP4: Supported Program)

NAME

seqwt - Calculate molecular weight of protein/DNA/RNA from sequence

SYNOPSIS

seqwt [ SEQUENCE foo_prot.seq | DNASEQUENCE foo_dna.seq | RNASEQUENCE foo_rna.seq ]

DESCRIPTION

Given a protein sequence in a file foo_prot.seq, SEQWT calculates an estimated molecular weight (in Daltons) for each chain and for the protein as a whole. Note that "residues" with ids such as W, WAT or HOH are excluded from the calculation.
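
As an illustration of this kind of estimate, the following is a minimal Python sketch that sums average residue masses over a one-letter sequence and adds one water for the chain termini. It is not SEQWT's code: the mass table holds commonly quoted average residue masses, and the names AVERAGE_RESIDUE_MASS, WATER_MASS, estimate_chain_weight and estimate_total_weight are hypothetical. Whether SEQWT uses the same masses, or adds a terminal water, is not stated in this document.

```python
# Average residue masses in Daltons (free amino acid minus one water).
# Assumption: commonly quoted values; SEQWT's internal table may differ.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain for the free N- and C-termini


def estimate_chain_weight(sequence):
    """Estimate the weight of one chain given as one-letter codes.

    Whitespace is ignored; an unknown code raises KeyError.
    """
    residues = [c for c in sequence.upper() if not c.isspace()]
    if not residues:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[r] for r in residues) + WATER_MASS


def estimate_total_weight(chains):
    """Sum the per-chain estimates for a list of chain sequences."""
    return sum(estimate_chain_weight(chain) for chain in chains)
```

Calling estimate_total_weight with one string per chain gives a whole-protein estimate in the same spirit as the total that SEQWT reports.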

If a DNASEQUENCE or RNASEQUENCE file is provided instead, SEQWT calculates an estimated molecular weight (in Daltons) for the corresponding nucleic acid. It is currently not possible to input a protein/nucleic acid complex.

The program takes no keyworded input. The mode (protein, DNA or RNA) is determined solely by which input file is given on the command line.
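
For scripted use, the command line can be assembled as in the following Python sketch. The helper build_seqwt_command and the MODES tuple are hypothetical; only the three logical file names and the "seqwt NAME file" layout come from the SYNOPSIS above.

```python
# Logical file names accepted on the seqwt command line (from the SYNOPSIS).
MODES = ("SEQUENCE", "DNASEQUENCE", "RNASEQUENCE")


def build_seqwt_command(mode, path):
    """Return the argument list for one seqwt run.

    mode selects protein, DNA or RNA by its logical file name;
    path is the sequence file to read.
    """
    mode = mode.upper()
    if mode not in MODES:
        raise ValueError("mode must be one of: " + ", ".join(MODES))
    return ["seqwt", mode, path]
```

The returned list can be passed to subprocess.run, assuming the seqwt executable is on the PATH of the calling environment.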

INPUT AND OUTPUT FILES

Input

SEQUENCE
A file containing the protein sequence in one of the accepted sequence formats described below.
DNASEQUENCE
A file containing the DNA sequence in one of the accepted sequence formats described below. 1-letter codes are assumed to be 'G','C','A','T', while 3-letter codes are ' DG',' DC',' DA',' DT'.
RNASEQUENCE
A file containing the RNA sequence in one of the accepted sequence formats described below. 1-letter codes are assumed to be 'G','C','A','U', while 3-letter codes are '  G','  C','  A','  U'.
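
The correspondence between the 1-letter and 3-letter nucleotide codes listed above can be written down as a small Python sketch. The names DNA_NAMES, RNA_NAMES and to_residue_names are hypothetical, and the residue names are stored without the leading padding spaces shown above.

```python
# 1-letter code -> residue name, as listed for DNASEQUENCE and RNASEQUENCE.
# Assumption: names are kept here without their leading padding spaces.
DNA_NAMES = {"G": "DG", "C": "DC", "A": "DA", "T": "DT"}
RNA_NAMES = {"G": "G", "C": "C", "A": "A", "U": "U"}


def to_residue_names(sequence, kind):
    """Convert a 1-letter nucleotide sequence to a list of residue names.

    kind is "dna" or "rna"; whitespace in the sequence is ignored.
    """
    if kind not in ("dna", "rna"):
        raise ValueError("kind must be 'dna' or 'rna'")
    table = DNA_NAMES if kind == "dna" else RNA_NAMES
    names = []
    for code in sequence.upper():
        if code.isspace():
            continue
        if code not in table:
            raise ValueError("unexpected %s code: %r" % (kind, code))
        names.append(table[code])
    return names
```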

Output

No output files are produced.

Sequence file formats

The SEQUENCE file can be in these formats:

The *.pir type, e.g.:

 x.seq
 > chain 1
  MTSKIEQPRW ASKDSAAGAA STPDEKIVLE FMDALTSNDA AKLIEYFAED TMYQNMPLPP
  AYGRDAVEQT LAGLFTVMSI DAVETFHIGS SNGLVYTERV DVLRALPTGK SYNLSILGVF
  QLTEGKITGW RDYFDLREFE EAVDLPLRG
 
 > chain 2
 
  MTSKIEQPRW ASKDSAAGAA STPDEKIVLE FMDALTSNDA AKLIEYFAED TMYQNMPLPP
  AYGRDAVEQT LAGLFTVMSI DAVETFHIGS SNGLVYTERV DVLRALPTGK SYNLSILGVF
  QLTEGKITGW RDYFDLREFE EAVDLPLRG

Or just the sequence list:

x.seq
  MTSKIEQPRW ASKDSAAGAA STPDEKIVLE FMDALTSNDA AKLIEYFAED TMYQNMPLPP
  AYGRDAVEQT LAGLFTVMSI DAVETFHIGS SNGLVYTERV DVLRALPTGK SYNLSILGVF
  QLTEGKITGW RDYFDLREFE EAVDLPLRG
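
The two layouts above can be read with a few lines of Python, sketched here under stated assumptions: a line starting with ">" begins a new chain, blank lines are skipped, spaces inside sequence lines are discarded, and the file name label (x.seq) shown above the examples is not part of the file. The function parse_chains is hypothetical and is not how SEQWT itself reads the file.

```python
def parse_chains(text):
    """Split sequence-file text into one string of 1-letter codes per chain.

    Lines starting with '>' open a new chain. Text with no '>' line at all
    is treated as a single chain.
    """
    chains = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no sequence
        if stripped.startswith(">"):
            current = []
            chains.append(current)
            continue
        if current is None:
            # Sequence data before any header: plain sequence list.
            current = []
            chains.append(current)
        # Drop the spaces that separate the blocks of ten residues.
        current.append("".join(stripped.split()))
    return ["".join(parts) for parts in chains]
```

For the two-chain example above this returns a list of two strings, one per "> chain" header; for the plain sequence list it returns a single string.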

Or the SEQRES records in PDB format:

 seqres.seq
SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU  4INS 170
SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN                      4INS 171
SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU  4INS 172
SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR  4INS 173
SEQRES   3 B   30  THR PRO LYS ALA                                      4INS 174
SEQRES   1 C   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU  4INS 175
SEQRES   2 C   21  TYR GLN LEU GLU ASN TYR CYS ASN                      4INS 176
SEQRES   1 D   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU  4INS 177
SEQRES   2 D   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR  4INS 178
SEQRES   3 D   30  THR PRO LYS ALA                                      4INS 179
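
SEQRES records are fixed-column: the chain identifier is in column 12 and the residue names occupy columns 20 to 70, so trailing text such as "4INS 170" is not part of the sequence. The Python sketch below groups residue names by chain on that basis. The function parse_seqres and the WATER_NAMES set are hypothetical, and dropping the water ids here only mirrors the behaviour noted in the DESCRIPTION; it is not taken from SEQWT's source.

```python
# Ids that the DESCRIPTION says are excluded from the calculation.
WATER_NAMES = {"W", "WAT", "HOH"}


def parse_seqres(text):
    """Collect residue names per chain from PDB SEQRES records.

    Returns a dict mapping chain id to a list of residue names, in file order.
    """
    chains = {}
    for line in text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11:12]        # column 12
        names = line[19:70].split()   # columns 20-70 hold the residue names
        kept = [name for name in names if name not in WATER_NAMES]
        chains.setdefault(chain_id, []).extend(kept)
    return chains
```

Applied to the example above, this gives four entries (A, B, C and D), with the continuation records of each chain concatenated in order.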

PRINTER OUTPUT

The program produces a list of residues found in each chain in the sequence and the estimated molecular weight for that chain. At the end of the run it writes out the total estimated molecular weight.

EXAMPLES

SEQWT can be run on the distributed rnase.pir file:

seqwt SEQUENCE $CEXAM/rnase/rnase.pir

The program estimates the molecular weight to be 10540 Daltons, compared with the official value of 10576 Daltons (a difference of 36 Daltons, or about 0.3%).

AUTHOR

Eleanor Dodson
DNA/RNA additions: Martyn Winn

SEE ALSO