MOLREP (CCP4: Supported Program)

NAME

molrep - automated program for molecular replacement

SYNOPSIS

molrep [HKLIN in.mtz] [MAPIN EM_map.ccp4] [MODEL in.pdb (or EM_mod_map.ccp4)] [MODEL2 in2.pdb] [PATH_OUT path_out] [PATH_SCR path_scr]
[Keyworded input]

DESCRIPTION

Version 9.2 /19.06.2005/


REFERENCES

      
   Author:  A.A. Vagin
            email: alexei@ysbl.york.ac.uk

   References:     A.A. Vagin (1989). New translation and packing functions.
                   Newsletter on Protein Crystallography, Daresbury
                   Laboratory, 24, pp. 117-121.

                   A. Vagin and A. Teplyakov (2000). An approach to
                   multi-copy search in molecular replacement.
                   Acta Cryst. D56, pp. 1622-1624.

                   A.A. Vagin and M.N. Isupov (2001). Spherically averaged
                   phased translation function and its application to the
                   search for molecules and fragments in electron-density maps.
                   Acta Cryst. D57, pp. 1451-1456.

            main:  A. Vagin and A. Teplyakov (1997). MOLREP: an automated
                   program for molecular replacement.
                   J. Appl. Cryst. 30, pp. 1022-1025.

INSTALLATION

Copy the file molrep.tar.gz

and uncompress it (`gunzip molrep.tar.gz').

After untarring `molrep.tar' (command: tar xvf molrep.tar) you will get a molrep directory containing the src, doc, data and bin subdirectories and a README file. To build the executable, go to src.

1. Define the compiler and its options with setenv BLANC_FORT, for example:
for Linux and Mac:
setenv BLANC_FORT "f77 -fno-globals -fno-automatic -O1 -w"
for the Linux Intel compiler:
setenv BLANC_FORT "ifort -static -O1"
for the Mac IBM compiler:
setenv BLANC_FORT "xlf -qextname -qarch=auto -qtune=auto -qstrict -O3"
otherwise:
setenv BLANC_FORT "f77 -O1"
2. Run molrep.setup.
The executable (molrep) will end up in the bin directory; by giving its full pathname (.../molrep/bin/molrep) you can run it from anywhere without defining an environment variable. The CCP4 version (which can read MTZ files) will be built automatically if CCP4 is installed.
Alternatively, you can build it by hand:
1. set MR_LIBRARY = '/ccp4-5.2/lib/libccp4f.a /ccp4-5.2/lib/libccp4c.a'
Define the libraries (without CCP4: set MR_LIBRARY = 'molrep_dummy.o').
2. set MR_FORT = ( f77 -O2 )
Define the compiler and its options.
3. make all MR_LIBRARY="$MR_LIBRARY" MR_FORT="$MR_FORT"
To build molrep for Fortran 90 with memory allocation, edit main_molrep_mtz.f:
comment out the line:
REAL POOL(MEMORY)
and uncomment the lines:
C REAL, ALLOCATABLE :: POOL(:)
C VERS = 'M'
C IF(.NOT.ALLOCATED(POOL)) ALLOCATE(POOL(MEMORY_IN),STAT=ISTAT)
C DEALLOCATE(POOL)
For the CCP4 version, CCP4 must be built with the same compiler.

Binaries (executable files) can also be downloaded:

(all built with the memory allocation option)

molrep_sgi.gz
molrep_alpha.gz
molrep_linux.gz
molrep_mac.gz

Four files for CCP4i:

molrep.com.gz
molrep.def.gz
molrep.script.gz
molrep.tcl.gz

New usage style and new keywords


This version can be used in the same ways as the previous one:

1. with a command (batch) file
2. interactively
3. through CCP4i

New usage style:

     The program can also be run from the command line with options:
  
   molrep -f file_sf_or_map -m  model_crd_or_map
          -mx fixed model   -m2 model_2
          -po path_out      -ps path_scratch
          -s file_sequence  -s2  file_seq_for_m2
          -k file_keywords  -doc y_or_a_or_n
          -mem memory       -na natom_max
          -h   -i  -r
  
           h = only keyword and mtz label information, clean
           i = interactive mode
           r = rest some special files
           file_keywords = simple text file with keywords
                           (one line - one keyword)
           memory    = memory request in MB (Fortran 90 version only)
           natom_max = maximum number of atoms in the model
  
       Examples:

       Without any keywords:
  
       Usual MR: RF + TF
   molrep -f file.mtz -m model.pdb
  
       Usual MR with fixed model
   molrep -f file.mtz -m model.pdb -mx mfix.pdb
  
       Usual MR with sequence
   molrep -f file.mtz -m model.pdb -s file_seq
  
       Self rotation function
   molrep -f file.mtz
  
       multi-copy search, one model
   molrep -f file.mtz -m model.pdb -m2 model.pdb
  
       multi-copy search, two different models
   molrep -f file.mtz -m model1.pdb -m2 model2.pdb
  
       Fitting two atomic models
   molrep -m model1.pdb -mx model2.pdb
  
       Rigid body refinement
   molrep -f file.mtz  -mx model.pdb

       Get information about keywords and mtz labels, and clean
   molrep -h -f file.mtz

       If you want to experiment with keywords:
  
       Using keywords from file
   molrep -f fobs.pdb -m model.pdb -k file_keywords
  
       Using keywords interactively
   molrep -f file.mtz  -m model.pdb -i
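As a rough illustration of the option list above, the Python sketch below assembles a molrep argument list and writes a keyword file with one keyword per line. The flag letters come from the option list; the Python names (`molrep_argv`, `write_keyword_file`, the option mapping) are hypothetical, and the `NAME VALUE` layout of each keyword line is an assumption, so check it against the output of `molrep -h`.

```python
from pathlib import Path

# Command-line flags from the option list above; the Python-side names are hypothetical.
FLAGS = {
    "sf": "-f",            # file_sf_or_map
    "model": "-m",         # model_crd_or_map
    "fixed": "-mx",        # fixed model
    "model2": "-m2",       # model_2
    "path_out": "-po",
    "path_scratch": "-ps",
    "seq": "-s",
    "seq2": "-s2",
    "keywords": "-k",      # file_keywords
    "doc": "-doc",         # y, a or n
    "mem": "-mem",
    "natom_max": "-na",
}


def molrep_argv(**options):
    """Return an argument list for molrep, skipping options set to None."""
    argv = ["molrep"]
    for name, value in options.items():
        if value is not None:
            argv += [FLAGS[name], str(value)]
    return argv


def write_keyword_file(path, keywords):
    """Write one keyword per line; the 'NAME VALUE' layout is an assumption."""
    lines = [f"{name} {value}" for name, value in keywords.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Usual MR with sequence: molrep -f file.mtz -m model.pdb -s file_seq
    print(" ".join(molrep_argv(sf="file.mtz", model="model.pdb", seq="file_seq")))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; building it as a list rather than one string avoids shell quoting problems with paths that contain spaces.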
  
  

New keywords:

Use "molrep -h" to get a short manual of MOLREP.

SCORE N means: do not stop when the contrast is good, and do not assess the solution. The scoring system works well if the expected number of models and the proper symmetry of the model are correct. When the Cross Rotation Function is computed, the program determines the proper symmetry of the model from the model's Self Rotation Function.
If you prefer, or if you use keyword FUN = T, you can define the proper symmetry of the model with keyword NCSM.
SCORE C means: use the Correlation Coefficient instead of the Score, and do not stop.
NCSM number of subunits in the model (only for scoring)
PST new value A means: check all pseudo-translation vectors automatically (now the default).
DIFF new value M means: remove the fixed model from the map (Fobs,PHobs) with a mask for RF or PRF.
SG space group name. You can use this keyword instead of NOSG.
ALL means: check all possible space groups automatically.
RESMIN soft resolution cut-off; use it instead of RES_T and RES_R.
SEQ D, Y or N (D is the default); N means: use the sequence identity only to set SIM (without model correction).
SURF new value C: as Y, but the new B values are used only for the Packing function; the original B values are not changed.

INPUT/OUTPUT FILES

Input file formats

  1. Format of the input model file
    1. for an atomic model: PDB, CIF or BLANC.
    2. for an EM or electron density map: CCP4 or BLANC.
      A CCP4 file must have the extension "ccp4".
  2. Input file of structure factors and phases.
    1. Formatted file of structure factors (CIF).
      This must be a CIF file containing indices and structure factors (and phases if you need them): h,k,l,|F|,sig(F),Phi or h,k,l,|F|,sig(F),Phi,Fom
    2. PDB file of structure factors.
      This file contains indices and structure factors or intensities. A simple formatted file with h,k,l,|F|,sig(F) or h,k,l,|F| and no titles is also acceptable.
    3. Alternatively, structure factors or phases can be given as an unformatted file of the BLANC suite.
    4. MTZ file, which must have the extension "mtz".
    5. EM or electron density map (a CCP4 file must have the extension "ccp4").
      This file will be converted to files of |F| and phases.

Some examples can be found in Input file examples.

The space group and unit cell parameters of the unknown structure are taken from the file of structure factors; if this file does not contain them, they are taken from the coordinate file. In this case the input PDB coordinate file must contain correct CRYST1 cards with the unit cell parameters and space group name of the unknown structure. You can change the space group of the structure factor file with keyword NOSG.

Output files

molrep.crd
New coordinates of the model (plus model_2) corresponding to the best solution of the Cross Rotation and Translation Functions.
molrep.pdb
New PDB coordinate file with the MOLREP solution (created if you start from a PDB file).
molrep_fcalc.dat
molrep_phcalc.dat
|F| and phases of the MOLREP solution (plus model_2), in BLANC format. These files are created if your model is an EM or density map.
molrep_fcalc.cif
Formatted CIF file of the MOLREP solution (plus model_2) with Fobs, Fcalc, Phcalc (created if your model is an EM or density map in CCP4 format).
molrep.doc
Protocol (like a log file in CCP4). Created if keyword DOC is 'Y' or 'A'.
molrep.bat
Command (batch) file. Created if keyword DOC is 'Y' or 'A'. You can repeat the calculation with this file: cp molrep.bat bat, then sh bat.
molrep_rf.tab
List of peaks of the rotation function, created as soon as the program calculates a Rotation Function.
molrep_rf.tab is the default name; another name can be set with keyword FILE_T.
You can edit this file and use it for subsequent calculations without computing the rotation function again (keyword FUN=T). The program then reads (in free format!): "Sol_", peak number and Polar angles (theta,phi,chi), e.g.:
"Sol_  23  10.0    22.2 40.0"
In Phased Rotation Function calculations, the program reads "Sol_", peak number, Polar angles and shift (sx,sy,sz), e.g.:
"Sol_  23  10.0    22.2 40.0  .564 .443 .032"
When just rotating and positioning the model (keyword FUN=S), the program reads "Sol_", peak number, Polar angles and shift (sx,sy,sz), e.g.:
"Sol_  23  10.0    22.2 40.0  .564 .443 .032"
To use Euler angles, write "Sol_A" instead of "Sol_".
In 'Search model orientation by PRF' (keyword PRF=P), the program reads "Sol_", peak number, Polar angles and shift (sx,sy,sz), e.g.:
"Sol_  23  10.0    22.2 40.0  .564 .443 .032"
but the program uses only the shift (sx,sy,sz).
molrep_srf.tab
List of peaks of Self Rotation Function.
molrep_srf.tab is the default name; another name can be set with keyword FILE_TSR.
molrep_rf.ps
PostScript file of Self Rotation Function
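The free-format "Sol_" lines described for molrep_rf.tab can be read with a few lines of Python. This is a minimal sketch that accepts only the two layouts quoted above (peak number plus three angles, optionally followed by three fractional shifts) and flags "Sol_A" lines as Euler angles. Other lines are skipped; a "Sol_" line with a different number of columns (real .tab files written by MOLREP may carry extra ones) is rejected with ValueError. `Solution`, `parse_sol_line` and `read_solutions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Solution:
    peak: int
    angles: Tuple[float, float, float]           # polar (theta, phi, chi), or Euler if euler
    shift: Optional[Tuple[float, float, float]]  # fractional (sx, sy, sz), if present
    euler: bool                                  # True for "Sol_A" lines


def parse_sol_line(line: str) -> Optional[Solution]:
    """Parse one free-format "Sol_" or "Sol_A" line; return None for any other line."""
    fields = line.replace('"', " ").split()
    if not fields or fields[0] not in ("Sol_", "Sol_A"):
        return None
    numbers = [float(x) for x in fields[2:]]
    if len(numbers) not in (3, 6):
        raise ValueError(f"expected 3 angles and optionally 3 shifts: {line!r}")
    shift = tuple(numbers[3:]) if len(numbers) == 6 else None
    return Solution(int(fields[1]), tuple(numbers[:3]), shift, fields[0] == "Sol_A")


def read_solutions(path):
    """Collect all parseable solutions from a .tab file such as molrep_rf.tab."""
    with open(path) as handle:
        return [s for s in map(parse_sol_line, handle) if s is not None]
```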

Some output files are not deleted if keyword LIST=L. They have the internal format of the BLANC program suite and can be used by programs of this suite (see the README file on the ftp site ftp.ysbl.york.ac.uk; user: anonymous, then: cd pub/alexei).

crossrot_alo.dat
spherical coefficients of F_observed
crossrot_alm.dat
spherical coefficients of F_model
crossrot_dns.dat
rotation function map
molrep_fob.dat
F_observed

also (if you started from an MTZ file):

molrep_mtz.cif
formatted CIF file of F_observed

also (if you used keyword FILE_S):

align.pdb
input model corrected by sequence alignment

See also How to redirect output and scratch files

WHAT MOLREP CAN DO


                          +-- Self RF  (FUN=R, without any model)
                          !
         +-- Standard MR -+-- Cross RF (FUN=A or FUN=R )
         !                !
         !                +-- Locked Cross RF ( FUN=A or R and LOCK=Y )
         !                !
         !                +-- TF       (FUN=A or FUN=T )
         !
         !
         !                                       +- two identical models
         !                                       !
         !                      +-- Dyad search -+
         !                      !   (DYAD=D)     !
         +-- Multi-copy search -+                +- two different
         !      for MR          !                   models 
         !                      !
MOLREP --+                      +-- Multi-copy for one model
         !                          (DYAD=Y)
         !                        
         !                        
         !                       +-- RF and PTF
         !                       !   (PRF=N)
         !                       !
         +-- Fitting two models -+-- SAPTF, RF and PTF
         !                       !   (PRF=S)
         !                       !
         !                       +-- SAPTF, PRF and PTF
         !                           (PRF=Y)
         !
         !
         !                        +-- RF and PTF
         !                        !   (PRF=N)
         !                        !
         +-- Searching in ED map -+-- SAPTF, RF and PTF 
         !                        !   (PRF=S)
         !                        !
         !                        +-- SAPTF, PRF and PTF
         !                            (PRF=Y)
         !
         +-- Rotate and position the model (FUN=S FILE_T)
         !
         !
         !
         +-- Search model orientation in electron density map 
         !   for particular position by phased RF (PRF=P FILE_T2)
         !
         !
         !                +-- find HA positions by MR solution 
         !                !   (FUN=D, model_2)
         !                !
         +-- HA search ---+-- HA search for SIR or SAD   
         !                !   (DIFF=H, FUN=T, without any model)
         !                !
         !                +-- Self RF for HA position 
         !                    (DIFF=H, FUN=R, without any model)
         !
         +-- pure RB refinement (phased or unphased, FUN=B, model_2)
             


   where: FUN, DYAD, PRF, LOCK, DIFF - keywords
          MR    - Molecular Replacement
          RF    - Rotation function
          TF    - Translation function     
          PRF   - Phased Rotation function
          PTF   - Phased Translation function
          SAPTF - Spherically Averaged Phased Translation function
          ED    - Electron density
          HA    - heavy atom
          RB    - rigid body

* Standard molecular replacement method

The program performs molecular replacement in two steps:

  1. Rotation function (RF)
    searches for the orientation of the model
  2. Cross Translation function (TF) and Packing function (PF)
    search for the position of the oriented model

The result of the Rotation function depends on the radius of a spherical domain in the centre of the Patterson function (the so-called cut-off radius). This radius must be chosen to maximize the proportion of intramolecular vectors relative to intermolecular ones. By default the program sets this radius to twice the radius of gyration of the model; another value can be given with keyword RAD.
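The default cut-off radius can be estimated from the model coordinates. The sketch below computes an unweighted radius of gyration (every atom counts equally; this is an assumption, since the weighting MOLREP uses is not stated here) and doubles it. The function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) coordinates (Angstrom)."""
    n = len(coords)
    centre = [sum(xyz[k] for xyz in coords) / n for k in range(3)]
    mean_sq = sum(sum((xyz[k] - centre[k]) ** 2 for k in range(3)) for xyz in coords) / n
    return math.sqrt(mean_sq)


def default_rf_radius(coords):
    """Patterson cut-off radius chosen as twice the radius of gyration."""
    return 2.0 * radius_of_gyration(coords)
```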

Instead of computing the RF, the program can use a previously prepared list of orientations from a Rotation function (keyword FILE_T). Anisotropic correction of the data before computing the RF can be useful for highly anisotropic data (keyword ANISO).

With a second, fixed model (MODEL_2), using modified structure factors instead of |Fobs| for the RF (keywords DIFF, P2) may give a clearer RF. The modified structure factor is:

||Fobs|-|Fmod2|*(P2/100)|

where P2 is the percentage of model_2 in the whole structure.
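For matched reflections the formula is a one-liner. This sketch assumes |Fobs| and the model_2 structure factors are already on a common scale and listed in the same reflection order; `modified_amplitudes` is a hypothetical helper, and complex Fmod2 values are accepted because only their magnitude enters the formula.

```python
def modified_amplitudes(f_obs, f_mod2, p2):
    """Return ||Fobs| - |Fmod2|*(P2/100)| for each matched reflection.

    f_obs, f_mod2: sequences in the same reflection order (real or complex values).
    p2: percentage of model_2 in the whole structure (0-100).
    """
    if len(f_obs) != len(f_mod2):
        raise ValueError("f_obs and f_mod2 must have the same length")
    if not 0.0 <= p2 <= 100.0:
        raise ValueError("P2 is a percentage between 0 and 100")
    weight = p2 / 100.0
    return [abs(abs(fo) - abs(fm) * weight) for fo, fm in zip(f_obs, f_mod2)]
```

Where the scaled model_2 term exceeds |Fobs|, the outer absolute value keeps the modified amplitude non-negative.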

The Translation function can check several peaks of the rotation function (keyword NP) by computing a correlation coefficient for each peak and sorting the results. To scale observed and calculated structure factors, the program uses the origin peak of the Patterson function, but for highly anisotropic data it can use anisotropic scaling (ANISO). The Translation function can take the second, fixed model (MODEL_2) into account, and if the number of monomers is known, MOLREP can position that number of monomers in a single run (keyword NMON). In this case it is also useful to choose, among the symmetry-related copies, the one closest to the models already found (keyword STICK).
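To make the peak-ranking step concrete, the sketch below computes a Pearson correlation coefficient between observed and calculated amplitudes and sorts RF peaks by it, best first. It is an illustration under stated assumptions: MOLREP's exact correlation coefficient (whether on |F| or |F|^2, and any resolution weighting) is not given here, and `correlation` and `rank_peaks` are hypothetical names.

```python
import math


def correlation(f_obs, f_calc):
    """Pearson correlation coefficient between two equal-length amplitude lists."""
    n = len(f_obs)
    if n != len(f_calc) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(f_obs, f_calc))
    var_o = sum((o - mean_o) ** 2 for o in f_obs)
    var_c = sum((c - mean_c) ** 2 for c in f_calc)
    return cov / math.sqrt(var_o * var_c)


def rank_peaks(f_obs, f_calc_by_peak):
    """Return (peak, CC) pairs sorted by CC, highest first.

    f_calc_by_peak: {RF peak number: |Fcalc| list of the best TF solution for that peak}
    """
    scored = [(peak, correlation(f_obs, f_calc)) for peak, f_calc in f_calc_by_peak.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the Pearson coefficient is unchanged by a positive scale factor, this ranking step on its own does not depend on how Fobs and Fcalc were scaled.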

The program can detect and use pseudo-translation vectors. In this case the pseudo-translation related copy will be added to the final model (keyword PST).

The Packing function is very important for removing wrong solutions that correspond to overlapping symmetry-related or different models (keyword PACK).

Be careful: with keyword SURF='Y' the program uses only atoms lying inside the molecule to calculate the Packing Function. So, for a non-globular model (for example, CA atoms only) you cannot use keyword SURF='Y' with PACK='Y'.

Use keywords:

COMPL, DIFF, FUN, MODEL_2, NMON, NP, NPT, P2, PACK, PST, RAD, RESMAX, SIM, STICK, SURF, VPST, NREF, NREFP, FILE_T, FILE_TSR, NSRF

* Self Rotation Function only

If you define only a file of structure factors (Fobs), the program will compute a Self Rotation function with a default cut-off radius RAD = 30. Use keyword RAD if you want another value. Other useful keywords: RESMAX, RES_R, COMPL, SIM.

Resulting output:

molrep_srf.tab
List of peaks of Self Rotation Function.
molrep_srf.tab is the default name; another name can be set with keyword FILE_TSR.
molrep_rf.ps
PostScript file of the Self Rotation Function, containing four plots of the RF (theta,phi,chi) for
chi = 180, 90, 120, 60.
You can change the fourth value of chi (60) with keyword CHI.
You can change the scale of these plots with keyword SCALE.

* Search model in electron density or EM map

In some cases it is difficult to solve an X-ray structure by molecular replacement even when the structure of a homologous molecule is known. If prior phase information, either from SIR/MAD or from a partial structure, is available, it can be used in a six-dimensional search. The program divides the six-dimensional search with phases into three steps:

  1. A spherically averaged phased translation function (SAPTF) is used to locate the position of the molecule or its fragment. It compares the locally spherically averaged experimental electron density with that calculated from the model and tabulates the highest-scoring positions.
  2. Then, for each such position, a local phased rotation function (PRF) is used to find the orientation of the molecule.
    Alternatively, the usual rotation function (RF) can be used with a modified map: the program sets to 0 the density outside a sphere whose radius is twice the radius of the model and whose centre is at the current point.
  3. Finally, the phased translation function (PTF) for the orientation found checks and refines the position.

You need phases in a CIF file of structure factors, or the corresponding keywords for an MTZ file, or an EM map as input instead of the Fobs file. In the EM case the map is converted into |F| and phases.

For an input map use keywords:

DSCALE, INVER, DLIM

If your structure contains several molecules that form a point group, you can use this NCS. The program will generate the complete model and use it at the PTF stage; the result (molrep.pdb) will also be the complete model.

First you must define the NCS parameters in the PDB file or with keywords NCS, ANGLES, CENTRE. See How to define NCS.

Other useful keywords:

COMPL, NMON, NP, NPT, RAD, RESMAX, SIM, SURF, INVER, NCS, ANGLES, CENTRE

You can also refine the solution by Pure Rigid Body Refinement.

* Search model orientation in electron density map for particular position by PRF

You can use this option (keywords PRF=P and FUN=R or A) if you want to find the model orientation in an ED map by rotating the model around a defined point in the map. The program puts the origin of the model coordinate system at the defined point and performs a phased rotation function (PRF). Use keyword RAD to define the radius of the sphere for the PRF.

You must define the list of points in the ED map in file FILE_T2, which must contain lines with "Sol_", peak number, Polar angles and shift (sx,sy,sz), e.g.:

"Sol_  23  10.0    22.2 40.0  .564 .443 .032"
The program uses only the shift (sx,sy,sz).

The model is rotated around the origin of the model coordinate system. If keyword SURF is Y, A, 2 or O, the program automatically moves the centre of the model to the origin of the model coordinate system. If you want, for example, to rotate the model around a particular atom, shift the origin to this atom and use SURF=N.

Other useful keywords:

COMPL, NMON, NP, NPT, RESMAX, SIM, INVER

* Fitting two models (FM)

The idea is to fit the electron densities instead of the atomic models, trying to find the best overlap. Advantages are:

If you define only two model files (the search model and model_2), without a file of structure factors (Fobs), the program will fit the search model (keyword FILE_M) to the second model (keyword MODEL_2). The search model must be smaller than, or the same size as, the second model.

Other useful keywords:

COMPL, NP, NPT, RAD, RESMAX, SIM, SURF

The result is the file molrep.crd (or molrep.pdb): the model fitted to the second model.

* Just rotate and position the model

This may be useful if you want to place the model in a particular orientation and position, or to compare several solutions.

Use keyword FUN=S and define three files: a model (keyword FILE_M), a file of structure factors (keyword FILE_F) and a file with polar angles and shifts (keyword FILE_T). The program will shift the model to the origin, rotate it (by the polar angles) and then position it (shift in fractional units). The new model is written to an output coordinate file. The program also computes an R-factor and a Correlation Coefficient.
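The shift-rotate-translate sequence can be sketched as follows. It assumes the usual CCP4 polar-angle convention (rotation axis at angle theta from z, phi measured from x in the xy-plane, right-handed rotation by chi about that axis) and, to keep the fractional-to-Angstrom conversion short, an orthogonal cell with edges a, b, c; a general cell needs the full orthogonalisation matrix. The function names are hypothetical, and results should be checked against molrep.pdb before being relied on.

```python
import math


def polar_rotation_matrix(theta, phi, chi):
    """3x3 matrix for a right-handed rotation by chi about the axis (theta, phi); degrees."""
    t, p, c = (math.radians(v) for v in (theta, phi, chi))
    axis = (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))
    l, m, n = axis
    cos_c, sin_c = math.cos(c), math.sin(c)
    # Rodrigues' formula: R = cos(chi) I + (1 - cos(chi)) u u^T + sin(chi) [u]x
    skew = ((0.0, -n, m), (n, 0.0, -l), (-m, l, 0.0))
    return [[cos_c * (i == j) + (1.0 - cos_c) * axis[i] * axis[j] + sin_c * skew[i][j]
             for j in range(3)] for i in range(3)]


def rotate_and_position(coords, theta, phi, chi, shift, cell):
    """Centre the model at the origin, rotate it, then move it by a fractional shift.

    coords: list of (x, y, z) in Angstrom; shift: (sx, sy, sz) in fractions of the cell;
    cell: (a, b, c) of an orthogonal cell (a simplifying assumption).
    """
    n_atoms = len(coords)
    centre = [sum(xyz[k] for xyz in coords) / n_atoms for k in range(3)]
    rot = polar_rotation_matrix(theta, phi, chi)
    offset = [shift[k] * cell[k] for k in range(3)]
    placed = []
    for xyz in coords:
        d = [xyz[k] - centre[k] for k in range(3)]
        placed.append(tuple(sum(rot[i][j] * d[j] for j in range(3)) + offset[i]
                            for i in range(3)))
    return placed
```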

Other useful keywords:

COMPL, RESMAX, RES_T, SIM

* Multi-copy search

There are two modes: "dyad_search" and "Multi-copy search".

Dyad_search - Search two copies of a model simultaneously (keyword DYAD=D).

Sometimes a solution cannot be found by starting with one molecule when there are several copies of the molecule in the asymmetric unit. In this case a search with two independent molecules may give a solution. The central point of the method is the construction of a multi-copy search model from properly oriented monomers using a special TF (STF), which gives the intermolecular vector between properly oriented monomers (the dyad). This dyad can then be used for a positional search with a conventional TF.

  1. The program checks all pairs of NP peaks of the Rotation Function (RF). For each pair it uses the first rotation to prepare model-1. Model-2 is prepared using the second rotation and one rotation from the crystallographic symmetry operators. The total number of pairs to be checked is ((NP+1)*NP*Nsym)/2.
  2. Next, for model-1 and model-2 the program computes the Special Translation Function (STF) to find the intermolecular vector of the dyad.
  3. For NPT peaks (i.e. intermolecular vectors) of the STF, the program computes the standard Translation Function (TF) using the current dyad as a model, and it calculates a Correlation Coefficient for the first NPTD peaks of the TF.

Solution and output file: molrep.pdb will be the dyad with the best Correlation Coefficient (or several dyads if keyword NMON > 1).

WARNING: the procedure takes quite some time, because the total number of Translation Functions to be calculated is NMON*NPT*((NP+1)*NP*Nsym)/2.
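The cost estimate above is easy to evaluate before starting a run. The helpers below (hypothetical names) implement the two formulas from this section; (NP+1)*NP is always even, so the integer division is exact.

```python
def dyad_pairs(np_peaks, nsym):
    """Pairs of RF peaks checked in the dyad search: ((NP+1)*NP*Nsym)/2."""
    return (np_peaks + 1) * np_peaks * nsym // 2


def dyad_tf_count(nmon, npt, np_peaks, nsym):
    """Translation Functions to compute: NMON*NPT*((NP+1)*NP*Nsym)/2."""
    return nmon * npt * dyad_pairs(np_peaks, nsym)


if __name__ == "__main__":
    # Hypothetical run: NP=10 RF peaks, 4 symmetry operators, NPT=5, NMON=1.
    print(dyad_pairs(10, 4))            # 220
    print(dyad_tf_count(1, 5, 10, 4))   # 1100
```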

In the output .log (.doc) file you can find the following information:


Sol_      R1  R2  Rs Rslf STF TF        Shift_1     PFmax PFmin    Rfac   Corr
Sol_       1   1   1   0   2   1  0.059 0.000 0.201  1.01  0.99   0.569  0.379

and

Sol_best   1   1   1   0   2   1  0.059 0.000 0.201  1.01  0.99   0.569  0.379
Sol_best         Rot1-->2               Dyad_vector       dist d_ort d_par
Sol_best    0.0    0.0    0.0    -0.210  0.000 -0.487    39.2  19.6  33.9

These columns mean:

R1
peak number of the rotation for model-1
R2
peak number of the rotation for model-2
Rs
number of the crystallographic symmetry (CS) operator applied before the rotation for model-2
Rslf
peak number of the self rotation function
STF
peak number of the special translation function
TF
peak number of the translation function
Shift_1
position of model-1
PFmax PFmin
max and min values of the Packing function
Rfac Corr
R-factor and Correlation Coefficient
Rot1-->2
polar angles of the rotation from model-1 to model-2
Dyad_vector
vector (in fractional coordinates) from model-1 to model-2
dist d_ort d_par
first number - distance between the models (in Angstrom)
second number - component of this distance orthogonal to the rotation axis 1-->2
third number - component parallel to the rotation axis; for a pure dimer this is 0.
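Reading dist, d_ort and d_par as the length of the dyad vector and its components perpendicular and parallel to the rotation axis 1-->2 (our interpretation; it matches the example line above, since 19.6^2 + 33.9^2 is about 39.2^2), the decomposition can be sketched as below. The vector must first be converted from fractional coordinates to Angstrom, which this hypothetical helper does not do.

```python
import math


def dyad_distances(vector, axis):
    """Return (dist, d_ort, d_par) for a dyad vector and a rotation axis.

    vector: dyad vector in orthogonal coordinates (Angstrom), not fractional.
    axis: direction of the rotation 1-->2 (any non-zero length).
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    dist = math.sqrt(sum(v * v for v in vector))
    d_par = abs(sum(v * u for v, u in zip(vector, unit)))
    d_ort = math.sqrt(max(dist * dist - d_par * d_par, 0.0))  # guard against rounding
    return dist, d_ort, d_par
```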

With keyword LIST=L you can find additional information:

Sol_              angles_1             angles_2        shift_2
Sol_      90.63   98.70  118.12   90.63   98.70  118.12  0.189  0.256 -0.415
 
       +---------------------------------------------------------+ 
       !                                                         !
       !                                                         !
       !                                                         !
       !                                                         !
       !                                                         !
       !      -----------------            -----------------     !
       !     /                 \          /                 \    !
       !    /                   \        /                   \   !
       !    ! rotated (angles_1) !       ! rotated (angles_2) !  !
       !    !     monomer_1      !       !     monomer_2      !  !
       !    !                    ! dyad  !                    !  !
       !    !         +----------!-----------------+          !  !
       !    !        /           ! vector!       '            !  !
       !    !       /           /         \  '               /   !
       !    \      /           /           \                /    ! 
       !     \    /           /          '  \              /     !
       !      ---/------------        '      --------------      !
       !        /shift_1           ' shift_2                     !
       !       /                '                                !
       !      /              '                                   !
       !     /            '                                      !
       !    /          '                                         !
       !   /        '                                            !
       !  /      '                                               !
       ! /    '                                                  !
       !/  '                                                     !
       +---------------------------------------------------------+
         origin

If you trust the Self-RF, you can try to find a dyad whose rotation between monomers corresponds to a rotation of the Self-RF (use keywords NSRF, FILE_TSR).

Model-2 can be different from model-1. Use keyword FILE_M2 to define the file of search model-2, FILE_T2 for the list of rotation function peaks for this model (this RF must be computed beforehand) and NP2 for the number of peaks to be used.

Multi-copy search - search for many copies of a model, not only a dyad (keyword DYAD=Y). The program first searches for a single monomer, then performs the dyad search, repeats the dyad search for the next dyad with the first one fixed and, finally, tries to add a single monomer.

Use keywords:

DYAD, DIST, NP, NSRF, NPT, NPTD, NP2, AXIS, FILE_M2, FILE_T2, FILE_T, FILE_TSR, NMON, ALL, PACK

and also:

COMPL, SIM, RESMAX, SURF, STICK

* Model correction

You can improve your model beforehand by using keyword SURF.

N do not perform any model correction. For FUN=S (just rotate and position), the program changes N to O
O only shift to the origin
A convert the protein into a polyalanine model, remove water molecules, H atoms, atoms with alternative conformations (except the first) and atoms with occupancy = 0, set all B = 20, and shift to the origin
Y remove water molecules, H atoms, atoms with alternative conformations (except the first) and atoms with occupancy = 0, shift to the origin, compute the atomic accessible surface area and replace each atomic B with B = 15.0 + SURFACE_AREA*10.0
2 set all B = 20 and shift to the origin
C as Y, but the new B values are used only for the Packing function (the original B values are kept); shift to the origin
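A rough Python sketch of the SURF options N, O, A, 2 and Y follows. Several details are assumptions rather than documented MOLREP behaviour: alternative conformations are taken to be labelled A, B, ... with A as the first, polyalanine is approximated by keeping only N, CA, C, O and CB atoms, and the per-atom accessible surface area (in whatever units MOLREP uses) comes from a caller-supplied function instead of being computed. Option C only changes the B values used by the Packing function, so it is not modelled. `Atom` and `surf_correct` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    name: str         # atom name, e.g. "CA"
    resname: str      # residue name, e.g. "HOH" for water
    element: str      # element symbol, e.g. "H"
    altloc: str       # "" or an alternative-conformation label such as "A", "B"
    occupancy: float
    b: float
    xyz: tuple


POLYALA_ATOMS = {"N", "CA", "C", "O", "CB"}


def remove_unwanted(atoms):
    """Drop waters, H atoms, zero-occupancy atoms and alternative conformations other than A."""
    return [a for a in atoms
            if a.resname != "HOH" and a.element != "H"
            and a.occupancy != 0.0 and a.altloc in ("", "A")]


def shift_to_origin(atoms):
    """Translate the atoms so that their centroid is at the origin."""
    n = len(atoms)
    centre = [sum(a.xyz[k] for a in atoms) / n for k in range(3)]
    return [replace(a, xyz=tuple(a.xyz[k] - centre[k] for k in range(3))) for a in atoms]


def surf_correct(atoms, mode, area_of=None):
    """Apply SURF = N, O, A, 2 or Y as described above (sketch)."""
    if mode == "N":
        return list(atoms)
    if mode not in ("O", "A", "2", "Y"):
        raise ValueError(f"SURF mode {mode!r} is not covered by this sketch")
    if mode in ("A", "Y"):
        atoms = remove_unwanted(atoms)
    if mode == "A":
        atoms = [a for a in atoms if a.name in POLYALA_ATOMS]
    if mode in ("A", "2"):
        atoms = [replace(a, b=20.0) for a in atoms]
    if mode == "Y":
        atoms = [replace(a, b=15.0 + area_of(a) * 10.0) for a in atoms]
    return shift_to_origin(atoms)
```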

* Using sequence alignment

Another way to improve your model is to use the sequence of the unknown structure.

Use keyword FILE_S to define a file containing a sequence. This sequence file must be ASCII:

 
!> title
!# sequence 
!SVIGSDDRTRVTNTTAYPYRAIVHISSSIGSCTGWMIGPKTVATAGHCIY
!# this is comment
!    DTSSG--SFAGTATVSP   GRNGTSYPYG
!NRGTRITKEVFDNLTNWKNSAQ
!

A line whose first character is "#" is a comment. Blanks are ignored.
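A minimal reader for such a file, written from the rules above, might look like this. Treating lines that start with ">" as titles follows the example rather than a stated rule, and keeping only letters (so the "-" characters in the example are dropped) is an assumption about how MOLREP handles them; `read_sequence` is a hypothetical name.

```python
def read_sequence(text):
    """Return the one-letter sequence from a MOLREP-style sequence file (sketch)."""
    residues = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("#", ">")):
            continue  # comment line, or title line as in the example
        # Blanks are ignored; this sketch also drops any non-letter such as '-'.
        residues.extend(ch for ch in stripped if ch.isalpha())
    return "".join(residues)
```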

The program will perform sequence alignment and create a new corrected model with the atoms corresponding to the alignment. The output file with the corrected model is align.pdb. The results of the alignment are written to the DOC-file, if this was defined. Without an Fobs file, the program only performs model correction.

* NMR Model

You can use a PDB file with NMR models, or a pseudo-NMR file with several homologous structures that have been superimposed beforehand. The algorithm is equivalent to summing the RF and/or TF over the individual structures. The program can find the best model in the NMR file or use all models (see keyword NMR).

In the PDB file, the models must be separated by MODEL and ENDMDL records. For example:

HEADER    HYDROLASE (ENDORIBONUCLEASE)         
CRYST1   64.900   78.320   38.790  90.00  90.00 ...
MODEL        1 
ATOM      1  N   ASP A   1      45.161  12.836 ... 
ATOM      2  CA  ASP A   1      45.220  12.435 ...   
 ... 
ATOM    745  SG  CYS A  96      58.398   6.673 ... 
ATOM    746  OXT CYS A  96      62.238   7.178 ...  
ENDMDL                                        
MODEL        2   
ATOM      1  N   ASP B   1      44.487  11.386 ...  
ATOM      2  CA  ASP B   1      44.559  11.129 ... 
 ...

Use keyword NMR
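The MODEL ... ENDMDL layout above can be read with a short routine. The sketch below (hypothetical names) groups ATOM/HETATM records by model, treating a file without MODEL records as one model, and adds per-model scores element-wise, which is the kind of summation over individual structures that the text describes; MOLREP's own implementation is not shown here.

```python
def split_models(pdb_lines):
    """Return a list of models, each a list of ATOM/HETATM lines.

    Models are delimited by MODEL ... ENDMDL records; a file without MODEL
    records is returned as a single model.
    """
    models, current, loose = [], None, []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL" and current is not None:
            models.append(current)
            current = None
        elif record in ("ATOM", "HETATM"):
            (loose if current is None else current).append(line)
    if current:  # last model written without ENDMDL
        models.append(current)
    return models if models else [loose]


def sum_scores(per_model_scores):
    """Sum score lists (e.g. RF values on a common angular grid) over all models."""
    return [sum(values) for values in zip(*per_model_scores)]
```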

* EM or electron density model

The search model can be an electron microscopy (EM) model or an electron density map. Only values higher than the limit (if keyword ROLIM is defined) are used. The map must have space group P1 and contain the whole model. Vector ORIGIN defines the centre of the model, and the rotation is performed around this point. If parameter DRAD (radius of the model) is defined, the program uses only the density inside the sphere of radius DRAD centred at ORIGIN.


        +--------------------------------+ nz
   !    !                                !
   !    .                                .
   !
   !    !                                !
   !    .                                .
   !
   !    !                                !
   !    +--------------------------------+ izmax
   !    !                                !
   !    !                                !
   !    !                                !
   !    !      ----------------          !
   !    !     /                \         !
   !    !    /                  \        !
   !    !   /                    \       !
 C_cell !  /                      \      !
   !    !  !                       !     !
   !    !  !   DRAD                !     !
   !    !  !---------- +           !     !
   !    !  !          / centre     !     !
   !    !  !         /             /     !
   !    !  \        /             /      !
   !    !   \      /             /       !
   !    !    \    /             /        !
   !    !     \  /             /         !
   !    !      -/--------------          !
   !    !      /                         !
   !    !     /                          !
   !    !    / ORIGIN                    !
   !    !   /                            !
   !    !  /                             !
   !    ! /                              !
   !    !/                               !
   !    +--------------------------------+
        0                                nx
        ----------- A_cell --------------
  

The program takes vector ORIGIN from the map file automatically. If it cannot obtain a correct vector, it uses ORIGIN = (0.5, 0.5, izmax/(2*nz)). You can also define ORIGIN yourself.
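The fallback ORIGIN and the two density filters (ROLIM and DRAD) described in this section can be sketched as follows. The point-list representation with orthogonal coordinates in Angstrom, the orthogonal P1 cell used for the fractional conversion, and the helper names are assumptions for illustration; periodicity of the map is ignored.

```python
import math


def default_origin(izmax, nz):
    """Fallback ORIGIN in fractional coordinates: (0.5, 0.5, izmax/(2*nz))."""
    return (0.5, 0.5, izmax / (2 * nz))


def fractional_to_orthogonal(frac, cell):
    """Convert fractional coordinates to Angstrom for an orthogonal P1 cell (a, b, c)."""
    return tuple(f * length for f, length in zip(frac, cell))


def usable_density(points, origin_xyz, rolim=None, drad=None):
    """Keep map points above ROLIM (if set) and within DRAD of ORIGIN (if set).

    points: list of ((x, y, z), rho) with orthogonal coordinates in Angstrom.
    """
    kept = []
    for xyz, rho in points:
        if rolim is not None and rho <= rolim:
            continue  # only values higher than the limit are used
        if drad is not None and math.dist(xyz, origin_xyz) > drad:
            continue  # outside the sphere of radius DRAD around ORIGIN
        kept.append((xyz, rho))
    return kept
```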

Use keywords:

DSCALEM, INVERM, ROLIM, DRAD, ORIGIN
The search for this model in an electron density map is then performed as usual.

* Locked Cross Rotation Function

The Locked Cross Rotation Function (LRF) averages the Cross Rotation Function over the NCS, which can be determined from the Self Rotation Function. The LRF is especially useful when the NCS forms a group.

Use keywords:

LOCK, NSRF, FILE_TSR

* Rigid body refinement

1) Rigid body refinement with TF

If keyword MODE = S, the program performs Rigid Body refinement for each peak of the TF. The number of cycles is controlled by keyword NREF (default 10). The program can also refine the orientation given by the RF before the TF; in this case it performs Rigid Body refinement (in space group P1) for each peak of the RF. The number of cycles is controlled by keyword NREFP; the default value is 0, i.e. no such refinement.

Use keywords:

MODE, NREF, NREFP

If your model contains several domains, you can use multi-domain Rigid Body refinement. For this, put an additional line into the PDB file before each domain. This line contains the word '#DOMAIN' and the domain number (free format).

For example:

HEADER    HYDROLASE (ENDORIBONUCLEASE)         
CRYST1   64.900   78.320   38.790  90.00  90.00 ...
#DOMAIN     1 
ATOM      1  N   ASP A   1      45.161  12.836 ... 
ATOM      2  CA  ASP A   1      45.220  12.435 ...   
 ... 
ATOM    745  SG  CYS A  96      58.398   6.673 ... 
ATOM    746  O   CYS A  96      62.238   7.178 ...  
#DOMAIN     2 
ATOM    747  N   PHE A  97      44.487  11.386 ...  
ATOM    748  CA  PHE A  97      44.559  11.129 ... 
 ...
ATOM    945  C   VAL A 196      58.398   6.673 ... 
ATOM    946  O   VAL A 196      62.238   7.178 ...  
#DOMAIN     1 
ATOM    947  N   ASP A 197      44.487  11.386 ...  
ATOM    948  CA  ASP A 197      44.559  11.129 ... 
 ...

2) Pure rigid body refinement

You can also use pure rigid body refinement in Patterson or real space (keyword FUN = 'B'). This option is useful at the last stage of MR, for example after fitting the model into an EM map. You must define Fobs, or Fobs and phases (or a map). Use keyword MODEL_2 for the model to be refined. If you define phases or use a map, the program performs real-space refinement (more precisely, refinement in reciprocal space using the phase information).

If keyword DOM = 'N' (default), the program refines MODEL_2 as a single molecule.

If keyword DOM = 'Y', the program performs multi-domain refinement. The domain structure must be defined in the PDB file (see above).

If your structure contains several molecules that form a point group, you can use this NCS as constraints. First, define the NCS parameters in the PDB file or with keywords NCS, ANGLES and CENTRE. See How to define NCS.

If you have trouble with the NCS parameters (values: theta, phi, chi, cx, cy, cz), use keyword DOM = 'I'; the actual values are then written to the output PDB file molrep.pdb. An alternative way to create an initial PDB file with the complete model is to describe only the first (reference) molecule and use keyword DOM = 'C'; the complete model is written to the output PDB file molrep.pdb. Finally, use keyword DOM = 'S' for refinement with constraints.

Use keywords:

FUN = B, DOM, COMPL, SIM, RESMAX, NREF, NCS, ANGLES, CENTRE

* Find HA positions using an MR solution

To define the derivative, use the corresponding label in the MTZ file or a derivative file (FILE_DER).

Use keywords:

FUN = D, MODEL_2 (as MR solution)

* Heavy atom search

To define the derivative, use the corresponding label in the MTZ file or a derivative file (FILE_DER).

In this case no model is needed.

Use keywords:

DIFF = H, FUN = T or R

'FUN = T' means heavy atom search (experimental version)
'FUN = R' means Self RF for the heavy atom structure.

HOW TO USE MOLREP

A simple way to use MOLREP is to define the files for Fobs and the model and the number of copies of the model to search for, and to use default values for all other parameters (i.e. without any keywords). There is always a chance that the structure will be solved automatically. If this does not work, follow the general molecular replacement strategy.

Planning ahead

Success of the molecular replacement method depends on:

Things to look out for:

data
Check the quality of your data. Completeness is very important. The absence of low resolution reflections may cause problems, especially if the model is only part of the whole structure. Check for anisotropy and twinning.
Think carefully: can you 'safely' use the high resolution reflections? If not, use keyword SIM to reduce the potentially bad effect of this part of the data. It may be a good idea to check the data with a program such as SFCHECK.
model
Consider the shape of the model. The automatic choice of the cut-off radius for the RF is twice the radius of gyration. This is good enough if the shape is approximately spherical. If the model is very asymmetric, it is better to choose the radius yourself.
Remove heavy atoms from your model, and remove terminal residues if they lie 'outside' the model.
Choose values for SIM and COMPL. If you have no idea about the similarity, SIM=0.5 is a good approximation.
If you have a dimer, use it, but set RAD to the value corresponding to a monomer.
It is very useful to shift the model to the origin of coordinates. Use keyword SURF = O or Y (Y is the default).
Self-RF
Compute the Self-RF. It may give you some idea about the NCS or about the number of copies in the asymmetric unit.
Choose the radius of integration carefully. The program cannot make an informed choice without a model (default is 30Å).
Cross-RF only
Compute the Cross-RF with LIST=L and DOC = Y. The DOC-file lists the expected orientations of the model and the rotations between them. Compare these rotations with the Self-RF; this is an additional check that the expected orientation is correct. Note, however, that the Self-RF sometimes shows no corresponding peak even for the correct orientation.
If the data are strongly anisotropic, use anisotropic correction.
Translation function
If there are several copies of the model in the asymmetric unit, use keyword NMON, multi-copy search or dyad search. The pseudo-translation option cannot be used with a dyad search, since the dyad search can recognize pseudo-translation itself.
If the data are strongly anisotropic, use anisotropic scaling.

Dialogue or batch

You can run MOLREP either by dialogue or with a command (batch) file. Modern computers can carry out most of the calculations for small and medium-sized proteins in real time, so dialogue is the preferred way of running MOLREP. Keywords with short explanations are printed by the program at the beginning of execution. In addition, the program automatically produces a batch command file during the dialogue, which is useful for repeated calculations.

Pseudo-translation

MOLREP can detect pseudo-translation, and define a pseudo-translation vector.

If keyword PST = Y, the program applies pseudo-translation with a pseudo-translation vector defined either by the program or by the user. When calculating the Translation Function, the program uses this vector to modify the structure factors. The pseudo-translation copy is added to the final model at the end of the run.
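Adding a copy of the model shifted by the pseudo-translation vector multiplies each calculated structure factor by a simple phase term. The sketch below shows this for one reflection, using the sign convention F(h)*exp(-2*pi*i*h*S) for a shift S that the Special Translation Function section uses; the function name is hypothetical and this is not MOLREP code.

```python
import cmath


def add_pseudo_translation_copy(f_calc, hkl, vpst):
    """Structure factor of the model plus a copy shifted by vpst (fractional).

    F_new(h) = F(h) * (1 + exp(-2*pi*i*h.vpst))
    """
    phase = -2.0 * cmath.pi * sum(h * t for h, t in zip(hkl, vpst))
    return f_calc * (1.0 + cmath.exp(1j * phase))


vpst = (0.5, 0.0, 0.0)
print(abs(add_pseudo_translation_copy(1.0 + 0j, (2, 0, 0), vpst)))  # doubled
print(abs(add_pseudo_translation_copy(1.0 + 0j, (1, 0, 0), vpst)))  # cancelled
```

With a pseudo-translation of (0.5, 0, 0), reflections with even h are reinforced and those with odd h cancel, which is why such data show a systematic weak/strong pattern.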

If FUN=R and LIST=L MOLREP computes a list of Patterson peaks and writes these to molrep.doc. This may be helpful in the detection of pseudo-translation.

Use keywords:

PST, VPST

Flexible model

If your model is flexible, for example if it consists of two domains, you can try to solve the problem in two ways:

1. Create a separate file for each domain and use dyad search (DYAD = D).

2. Combine the two domain files into a single file with a "MODEL" line between the domains (as in an NMR file). Then use the usual molecular replacement procedure with keyword NREFP, or with MODE = S and NREFP.

Using many homologous models

If you have several homologous models, you can create a pseudo-NMR file containing these models and use them together (see NMR model). The models must first be superimposed, for example with MOLREP (see fitting two models).
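A pseudo-NMR file, whether built from two domains or from several superimposed homologues, can be assembled with a few lines of code. This sketch wraps each model in the standard PDB MODEL/ENDMDL records; that MOLREP reads exactly this layout is an assumption, so compare with the distributed nmr.pdb test file before relying on it. The helper name is hypothetical.

```python
def build_pseudo_nmr(models):
    """Concatenate superimposed models into one NMR-style PDB text.

    models: list of lists of ATOM record strings, one list per model.
    Each model is wrapped in MODEL/ENDMDL records (standard PDB layout,
    with the model serial number in columns 11-14).
    """
    out = []
    for serial, atoms in enumerate(models, start=1):
        out.append(f"MODEL     {serial:4d}")
        out.extend(atoms)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

For example, `build_pseudo_nmr([atoms_a, atoms_b])` returns text that can be written to a file and given as the model.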

Keep in mind

KEYWORDED INPUT

MOLREP is easy to use interactively, but it can also be run in batch mode. All keywords must be preceded by an underscore (e.g. _DOC). The available keywords are:

These three keywords must always be defined, in this order:

DOC
FILE_F
FILE_M

If you use an MTZ file, you must put the MTZ keywords after FILE_F and close them with the line "_END".
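The ordering rules can be made concrete with a small helper that assembles a keyword block as text. It assumes one `_KEYWORD value` pair per line and that MTZ labels use the same underscore form (for example `_F FP`); the line layout and the label names `FP`/`SIGFP` are assumptions, so check the result against a molrep.bat produced by a dialogue run.

```python
def build_keywords(doc, file_f, file_m, mtz_labels=None, extra=None):
    """Assemble a MOLREP keyword block (hypothetical helper).

    DOC, FILE_F and FILE_M come first, in that order. For an MTZ file the
    label keywords follow FILE_F and are closed by _END.
    """
    lines = [f"_DOC {doc}", f"_FILE_F {file_f}"]
    if mtz_labels:
        lines += [f"_{key} {label}" for key, label in mtz_labels.items()]
        lines.append("_END")
    lines.append(f"_FILE_M {file_m}")
    for key, value in (extra or {}).items():
        lines.append(f"_{key} {value}")
    return "\n".join(lines) + "\n"


print(build_keywords("Y", "test.mtz", "test_mod.pdb",
                     mtz_labels={"F": "FP", "SIGF": "SIGFP"},
                     extra={"NMON": 2}))
```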

General keywords

Common:

FILE_T, FUN, NMON, NP, NPT, RAD, PATH_SCR

And for structure factors control:

COMPL, RESMAX, SIM

And for model control:

MODEL_2, SURF

And for multi-copy search:

DYAD, FILE_M2, FILE_T2, NP2, NPTD, NSRF

And for search in ED:

PHASE, PRF, INVER

And for fitting two models:

PRF

And for EM or electron density model:

DSCALEM, INVERM, ROLIM, DRAD, ORIGIN

And for EM or electron density instead of Fobs:

DSCALE, INVER, DLIM

Keywords for special cases

Common:

ANISO, BADD, LIST, LMAX, LMIN, MODE, PACK, RES_R, RES_T

And for standard MR:

DIFF, FILE_S, NMR, NOSG, P2, PST, STICK, VPST, LOCK, NREF, NREFP

And for Self RF:

CHI, PST, SCALE, FILE_TSR

And for multi-copy search:

AXIS, DIFF, DIST, P2, ALL, STICK

And for search in ED:

DIFF, P2, NPTD

And for fitting two models:

NPTD

And for search in ED:

NCS, ANGLES, CENTRE

And for Pure Rigid Body Refinement:

DOM, NCS, ANGLES, CENTRE

To get started with MOLREP, you first have to answer three questions:

  1. Do you want to have FILE-DOCUMENT molrep.doc? < N | Y | A >

    _DOC:

    Default: <N>

    N do not produce DOC-file
    Y produce DOC-file with new contents
    A keep old contents and add new information, i.e. if a file molrep.doc already exists, and the answer to question 1 is "A", the program will add any new information to the end of this file

    The DOC-file contains the log of the program run. Together with the DOC-file, the program creates a command (batch) file, molrep.bat.

    You can also use the DOC keyword to redirect the output files:

        molrep.doc
        molrep.bat
        molrep.pdb
        align.pdb
        molrep_rf.ps
        molrep_mtz.cif
      

    to a specific directory (_DOC Y>path or _DOC >path). Examples:

    
      _DOC  Y>/y/people/alexei/
        or
      _DOC   >/y/people/alexei/
    
    
  2. What is the name of the input file with Fobs (allowed formats: ASCII (CIF or PDB) or BLANC or MTZ or CCP4 map), i.e. FILE_F?

    Give the name of a formatted structure-factor file, an MTZ file or a map file; when fitting two atomic models, leave this option out. When using an MTZ file, the MTZ keywords must be given here.

  3. What is the name of the input file with the model coordinates or file of map (allowed formats: PDB, CIF or BLANC or CCP4 map), i.e. FILE_M?

    The unit cell and space group of this file must be the same as those for the file with Fobs (of the unknown structure). If they are not the same, the program will take these values from the Fobs file.

    Without this file the program computes a Self Rotation Function and plots RF(theta,phi,chi) for

    chi = 180, 90, 120, 60
    You can change the fourth value of chi (60) by keyword CHI
    You can change the scale of these plots of RF by keyword SCALE

General keywords

NP <np>

Default: <10>

<np> is the number of peaks from the rotation function to be used/checked (maximum: 50).

In special cases (e.g. for a dyad search), the use of keywords FUN (with option 'T') and FILE_T is closely linked to NP.

NPT <npt>

Default: <20>

<npt> is the number of peaks from the translation function to be used/checked (maximum: 50).

For use in dyad search, see NPT for dyad search.

NMON <nmon>

Default: <1>

<nmon> is the number of monomers. The program will try to create a full model, which will consist of NMON initial models plus model_2.

COMPL <compl>

Default: automatic choice

<compl> is the completeness of the model: from 0.1 to 1.0. It corresponds to Boff: from RESMAX*2 to RESMAX*6. If COMPL is used, keywords RES_R and RES_T are ignored.

For example, if there is a dimer in the asymmetric unit and the model is one monomer, COMPL=0.5.

SIM <sim>

Default: automatic choice

Similarity of the model: from 0.1 to 1.0. It corresponds to Badd: from Boverall to -Boverall. SIM=1 means normalized F will be used. When no knowledge of similarity is available, the use of SIM=0.5 as a starting value is recommended. If SIM is used, the keyword BADD is ignored.

SIM (Badd)
controls high resolution data
COMPL (Boff)
controls low resolution data

Boff and Badd modify Fobs and Fmodel as follows:

|F|_new = |F|_input * exp(-Badd*s^2) * (1 - exp(-Boff*s^2))
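The formula can be applied to single reflections with the sketch below. The function names are hypothetical, and the definition of s (1/d or sin(theta)/lambda) is left to the caller because it is not stated here. Boff is derived from a low resolution limit with Boff = 2*resmin^2, as described for RES_R and RES_T.

```python
import math


def weight_amplitude(f_input, s, badd=0.0, boff=None):
    """|F|_new = |F|_input * exp(-Badd*s^2) * (1 - exp(-Boff*s^2)).

    boff=None means that no soft low-resolution cut-off is applied.
    """
    f = f_input * math.exp(-badd * s * s)
    if boff is not None:
        f *= 1.0 - math.exp(-boff * s * s)
    return f


def boff_from_resmin(resmin):
    """Soft low-resolution cut-off Boff = 2*resmin^2."""
    return 2.0 * resmin ** 2


boff = boff_from_resmin(10.0)  # 200.0
for s in (0.02, 0.05, 0.1, 0.2, 0.3):
    print(s, round(weight_amplitude(1.0, s, badd=10.0, boff=boff), 3))
```

The printed weights rise from near zero at low s (soft low resolution cut-off) and fall again at high s (the Badd damping that SIM controls).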

FUN < A | R | T | S | B | D >

Default: <A>

R calculate only Rotation Function
T calculate only Translation Function, reading list of peaks of RF from file (molrep_rf.tab) or from TAB_file
A calculate both: RF and TF
S rotate and position the model and compute R-factor and Correlation Coefficient
B pure Rigid Body refinement
D find HA positions by MR solution

FILE_T <filename>

Default: <molrep_rf.tab>

Input or output TAB_file (see also molrep_rf.tab)

MODEL_2 <filename>

Default: no model_2

Input file with the second (fixed) model in correct position and orientation, in PDB or BLANC format. This model will be fixed during the search. When fitting two models to each other, the second model is the target model.

SURF < N | Y | A | O | 2 | C >

Default: <Y>

Perform model correction.

N do not perform any model correction. For FUN=S (rotate and position only) the program changes N to O
O only shift to the origin
A make the protein into a polyalanine model (i.e. remove from the model: water molecules, H atoms, atoms with alternative conformation (except the first), atoms with occupancy = 0), make all B = 20, and shift to the origin
Y remove various atoms from the model (water molecules, H atoms, atoms with alternative conformation (except the first), atoms with occupancy = 0), shift to the origin, compute atomic accessible surface area and replace atomic B with B = 15.0 + SURFACE_AREA*10.0
2 set all B = 20 and shift to the origin
C as Y but new B only for Packing function (not change original B) and shift to the origin
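The atom removal of SURF = Y (and A) can be sketched for standard PDB ATOM records as follows. This is an illustration, not MOLREP's code: the residue names treated as water, the element fallback and the rule "first alternative conformation seen for each residue" are assumptions. The surface area calculation itself is not reproduced; only the quoted B formula is, with SURFACE_AREA in whatever units MOLREP uses.

```python
def surf_y_filter(pdb_lines):
    """Drop waters, hydrogens, zero-occupancy atoms and later altlocs.

    Column positions follow the standard PDB ATOM/HETATM format.
    """
    kept, first_alt = [], {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resname = line[17:20].strip()
        # element from columns 77-78, else first letter of the atom name
        element = line[76:78].strip() or line[12:16].strip()[0]
        altloc = line[16]
        occupancy = float(line[54:60])
        if resname in ("HOH", "WAT") or element == "H" or occupancy == 0.0:
            continue
        if altloc != " ":
            key = (line[21], line[22:27])  # chain ID, residue number + iCode
            first_alt.setdefault(key, altloc)
            if altloc != first_alt[key]:
                continue  # keep only the first alternative conformation
        kept.append(line)
    return kept


def surf_b_factor(surface_area):
    """New atomic B for SURF = Y: B = 15.0 + SURFACE_AREA*10.0."""
    return 15.0 + surface_area * 10.0
```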

RAD <rad>

Default: automatically calculated from the model, unless:

Cut-off radius for Patterson search or for electron density search.
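The Planning ahead section states that the automatic cut-off radius is twice the radius of gyration. A minimal sketch of that estimate from model coordinates follows; it gives every atom equal weight, which is an assumption, since the weighting MOLREP uses is not stated here.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of (x, y, z) coordinates in Å."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
              for x, y, z in coords) / n
    return math.sqrt(msd)


def estimated_rad(coords):
    """Cut-off radius estimated as twice the radius of gyration."""
    return 2.0 * radius_of_gyration(coords)
```

For a very elongated model this value is only a starting point; the text above recommends choosing RAD yourself in that case.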

RESMAX <resmax>

Default: <3>

High resolution limit.

PATH_SCR <path>

Default: <not used>

Path to directory for scratch files. For example: /y/people/alexei/

Keywords for special cases

PST < N | C | Y >

Default: <N>

How to deal with pseudo-translation.

N ignore pseudo-translation altogether
C check for pseudo-translation, but do not use it. If FUN=R and LIST=L, the program computes a list of Patterson peaks and writes it to 'molrep.doc', which may be useful for detecting pseudo-translation.
Y use pseudo-translation. For the Translation Function, the program will add to the model a copy of the model which is translated by the pseudo-translation vector.

VPST <vpst1,vpst2,vpst3>

Default: automatically from Patterson

Pseudo-translation vector (in fractional units), used when PST = Y.

MODE <F | S | M>

Default: <F>

F standard rotation and translation functions are used without rigid body refinement
S advanced rotation and translation functions and rigid body refinement are used
M standard rotation and translation functions are used, and rigid body refinement is possible. Slower than MODE=F, but the correlation coefficient is calculated more accurately.

RES_R <res_r>

Default: automatic choice

Low resolution limit for the Rotation Function. Instead of applying RES_R directly, the program uses all data and applies Boff = 2*RES_R^2.

RES_T <res_t>

Default: automatic choice

Low resolution limit for the Translation Function. Instead of applying RES_T directly, the program uses all data and applies Boff = 2*RES_T^2.

BADD <badd>

Default: <0>

BOFF and BADD mean:

|F|_new = |F|_input * exp(-Badd*s^2) * (1 - exp(-Boff*s^2))

ANISO < N | Y | C | S | K >

Default: <N>

N do not use anisotropic correction and/or scaling
Y use anisotropic correction and scaling
C use anisotropic correction of Fobs for RF only
S use anisotropic scaling for TF only
K use scaling without B-factor

PACK < Y | N >

Default: <Y>

Y use Packing Function with Translation Function
N do not use Packing Function with Translation Function

LMIN <lmin>

Default: <4>

Minimum L-index of the spherical coefficients. The program never uses coefficients with L=0. Possible values are 2, 4, 6, ...; L = 2 means that all coefficients (except L=0) up to LMAX are used.

LMAX <lmax>

Default: automatic choice

Maximum L-index of spherical coefficients. Possible values are 2,4,6,8,...,58,60.

PRF < N | Y | S | P >

Default: <N>

N standard RF and Phased Translation Function is calculated
Y SAPTF (Spherically averaged phased translation function), Phased Rotation Function (PRF) and Phased Translation Function will be used.
S SAPTF (Spherically averaged phased translation function), Usual Rotation Function (RF) for modified map and Phased Translation Function will be used.
P search for the model orientation in the ED map by rotating the model around given points in the map. The list of points must be in the file FILE_T2.

The program uses phases from a BLANC file (keyword PHASE), from the MTZ file or from the EM map.

If keyword FUN=T, rather than computing the Rotation Function, the program reads the rotation function results from file FILE_T (or "molrep_rf.tab"): "Sol_ peak number, polar angles (theta,phi,chi) and shift (sx,sy,sz)"

PHASE < name >

Default: none

BLANC file of phases. If the input Fobs file is a CIF file, use 'PHASE +' to take the phases from that CIF file.

NOSG <nosg>

Default: <0>

Number of the new space group, if you want to change the space group of the structure-factor file. The program changes only the space group name, space group number and crystallographic symmetry operators, not the cell or the data.

LIST < S | L >

Default: <S>

S short DOC-file
L long DOC-file

DIFF < N | P | F | H >

Default: <N>

N use unmodified structure factors
P use modified structure factors instead of Fobs for RF, as follows: ||Fobs|-|Fmod2|*(P2/100)|
F use modified structure factors instead of Fobs for RF, as follows: vector difference (Fobs - Fmod2*(P2/100))
H for heavy atom search

P2 <p2>

Default: <0>

Percentage of model_2 in the structure.

NREF <ncycle>

Default: <10>

number of cycles of rigid body refinement for each TF solution.

see keyword:MODE

NREFP <ncycle>

Default: <0>

number of cycles of rigid body refinement before the TF for each RF peak.

Default is without this refinement

STICK < N | Y >

Default: <N>

From the symmetry-related models, choose the one closest to the model found before (this option does not work with the pseudo-translation option).

FILE_S <filename>

File with sequence for model correction by sequence alignment.

NMR < 0 | 1 | 2 | 3 >

Default: <0>

0 use PDB file with NMR structures as single model
1 use NMR possibility only for RF
2 use NMR possibility for RF and TF. Best NMR model will be found and used as solution.
3 use NMR possibility for RF and TF. Averaged TF will be used. All NMR models will be used as solution.

LOCK < Y | N >

Default: <N>

If Y, the Locked Cross Rotation Function is computed. Also use keywords FILE_TSR and NSRF.

Keywords specific for multi-copy search

DYAD < N | Y | D >

Default: <N>

Y multi-copy search
D dyad search

DIST <Dmin,Dmax,Dpar>

Three distances for dyad search.

Dmin minimal distance between molecules. Default: radius of gyration.
Dmax maximal distance between molecules. Default: 1000Å.
Dpar maximal shift along the rotation axis. Default: 1000Å.

AXIS <Chi,Delta>

Default: <0,0>

Chi
check only rotation by Chi (in degrees). 0 means to check all orientations.
Delta
delta for Chi (in degrees)

NSRF <nsrf>

Default: <0>

Number of peaks of Self-RF which will be used. 0 means not to use Self-RF. A list of Self-RF peaks will be taken from file defined by keyword FILE_TSR which must be prepared in advance (see Self Rotation Function).

NPT <npt>

This meaning applies only in conjunction with keyword DYAD: number of peaks in the STF (Special Translation Function) to be checked through Translation Function calculations, for inter-molecular vector search. If keyword DYAD is not given, the standard meaning of keyword NPT is used.

NPTD <nptd>

Number of peaks in TF to be checked through Correlation Coefficient calculations, for dyad search.

NP2 <np2>

Number of peaks in RF for second searching model to be checked for dyad search.

FILE_M2 <filename>

file of second searching model

FILE_T2 <filename>

file with list of peaks of RF for second searching model

ALL < N | Y >

Default: <N>

If ALL = Y, the program uses all crystallographic symmetry operators.

Keywords for Self Rotation Function

Without a file of the model, the program computes a Self Rotation Function.

CHI <chi>

Default: <60>

Angle chi of additional fourth section of RF(theta,phi,chi).

SCALE <scale>

Default: <6>

Maximum value of RF is SCALE * SIGMA(RF).

FILE_TSR <filename>

Default: <molrep_srf.tab>

Input or output TAB_file with peaks of Self_RF.

Keywords for EM or electron density model:

DSCALEM <scale>

Default: <1>

scale factor for correcting the cell of the density map

INVERM < N | Y >

Default: <N>

If Y, inverted phases will be used

ROLIM <limit>

Default: <not used>

minimum density value to be used

DRAD <radius>

Default: <0>

radius of the model (in Å). If DRAD is defined, the program uses only the density inside a sphere of radius DRAD centred at the vector ORIGIN.

ORIGIN <vector>

Default: <0,0,0>

centre of the model in the cell (in fractional units)
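The region selected by DRAD and ORIGIN can be pictured as a boolean mask over the map grid. The sketch assumes an orthogonal cell (all angles 90 degrees) and takes the nearest periodic image of ORIGIN; both are simplifications, and the helper name is hypothetical.

```python
import numpy as np


def drad_mask(grid_shape, cell_lengths, origin_frac, drad):
    """Mask of grid points within DRAD (Å) of ORIGIN (fractional units)."""
    axes = [np.arange(n) / n for n in grid_shape]  # fractional grid coordinates
    fx, fy, fz = np.meshgrid(*axes, indexing="ij")
    d2 = np.zeros(grid_shape)
    for frac, o, length in zip((fx, fy, fz), origin_frac, cell_lengths):
        df = frac - o
        df -= np.round(df)  # nearest periodic image
        d2 += (df * length) ** 2
    return d2 <= drad ** 2
```

Density values outside the mask would simply be ignored when the model map is used.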

Keywords for EM or electron density instead of Fobs:

DSCALE <scale>

Default: <1>

scale factor for correcting the cell of the density map

INVER < N | Y >

Default: <N>

If Y, inverted phases will be used

DLIM <limit>

Default: <not used>

minimum density value to be used

Keywords for pure Rigid Body Refinement:

DOM < N | Y | I | S | C >

Default: <N>

N RB refinement as single body.
Y Multi-domain refinement.
I Give only information about molecule-domain structure. Useful for RB refinement with constraints.
S Multi-domain refinement with constraints.
C only create complete model using NCS parameters. See How to define NCS

Keywords for NCS parameters:

See How to define NCS

NCS < 0 | 1 | ncs_id >

Default: <0>

NCS identifier, or "1", which means that the NCS parameters are taken from the file.

ANGLES <theta,phi,chi>

Default: <0,0,0>

Polar angles of NCS which define the standard system orientation in the cell.

CENTRE <vector>

Default: <0,0,0>

position of the NCS centre in the cell (in fractional units)

Keywords for Finding or Searching Heavy atoms:

FILE_DER <filename>

Input file with the derivative data; alternatively, use the MTZ labels.

COMMAND (BATCH) FILE

The best and easiest way to prepare a command file is to run MOLREP once by dialogue. If a molrep.doc file was assigned, the program creates a command (batch) file (molrep.bat) automatically.

See some command (batch) file examples.

MOLREP VERSION TO READ MTZ file

Keywords for reading MTZ file

The following keywords are necessary only for MTZ files.

F        label of F or F(+)
SIGF     label of sigma F or sigma F(+)
F(-)     label of F(-)
SIGF(-)  label of sigma F(-)
I        label of intensity of hkl
SIGI     standard deviation of the above
I(-)     label of intensity of -h -k -l
SIGI(-)  standard deviation of the above
PH       label of phases
FOM      label of figure of merit
FD       label of F-derivative
SIGFD    label of sigma F-derivative
DP       label of |F(+)|-|F(-)|
SIGDP    label of sigma DP
END      end of block of MTZ keywords

TESTING THE MOLREP PACKAGE

In directory "../molrep/doc/" there are two files:

  1. .../molrep/doc/test.bat - command (batch) file of the test
  2. .../molrep/doc/test.doc - protocol of this test

In directory "../molrep/data/" there are several files for test:

  1. .../molrep/data/test.cif - formatted CIFile of Structure Factors.
  2. .../molrep/data/test.mtz - MTZ file of Structure Factors (for SG computer).
  3. .../molrep/data/test_mod.pdb - PDB coordinate file of model
  4. .../molrep/data/test_mod2.pdb - PDB coordinate file of second model
  5. .../molrep/data/nmr.pdb - PDB coordinate file with NMR models


The test structure contains two molecules in the asymmetric unit. The starting model is the first molecule with only backbone and CB atoms, with a random coordinate error of rms=0.25. The molecule was rotated and shifted from the correct position. The second model is the solution for one molecule.

"test.bat" contains 12 tests:

  1. Example with input MTZ file of structure factors
    Fobs : test.mtz
    Model: test_mod.pdb
  2. Example with input CIF file of structure factors
    Fobs : test.cif
    Model: test_mod.pdb
  3. Example with input CIF file of structure factors and second model which is the solution for the first peak of the Rotation Function.
    Fobs   : test.cif
    Model  : test_mod.pdb
    Model_2: test_mod2.pdb is solution for first peak of Rotation Function.
    
  4. Example to use NMON keyword with input CIF file of structure factors. In this case MOLREP will find two molecules related by non-crystallographic symmetry.
    Fobs   : test.cif
    Model  : test_mod.pdb
  5. Example for Dyad search.
    Fobs : test.cif
    Model: test_mod.pdb
  6. Example for NMR model.
    Fobs : test.cif
    Model: nmr.pdb
  7. Example for model fitting.
    Model :  test_mod.pdb - searching model
    Model_2: 2rnt.pdb - target model
  8. Example for searching a model in an electron density map.
    Fobs    : mtz.mtz   - Phases for 1 derivative (FOM = 0.31)
    Model: model.pdb - searching model
  9. Example for finding HA position by MR solution.
    Fobs    : test_ha.mtz   - Fobs and derivative
    Model2  : model.pdb    - MR solution
    
  10. Example for HA search by multi-copy search
    Fobs    : test_ha.mtz   - Fobs and derivative
    
  11. Example for HA search by translation function
    Fobs    : test_ha.mtz   - Fobs and derivative
    
  12. Example for Self RF for Heavy Atom structure in derivative.
    Fobs    : test_ha.mtz   - Fobs and derivative
    

Test protocol

  1. copy all files from /molrep/data/ and /molrep/doc/ to some directory
  2. check and, if necessary, adapt path to program "molrep" in the file "test.bat"
  3. run test: sh test.bat
  4. compare result with ".../molrep/doc/test.doc"

MOLECULAR REPLACEMENT METHOD - THEORY

Molecular replacement method (MR)

There are two major steps in the molecular replacement method: the orientation search and the translation search, performed by the Rotation and Translation Functions respectively. Both are correlation (overlap) functions between the observed Patterson and the Patterson calculated from the model.

Rotation function (RF):

              ROT(R) = I Pobs(r) * Pcalc(R,r) dr
                      rad

where

R
operator of rotation
 I
rad
integral inside a sphere centred at the Patterson origin with radius=rad (i.e. the cut-off radius)
Pobs
observed Patterson
Pcalc
calculated Patterson for rotated (R) model

Translation function (TF):

            TR(s)  = I Pobs(r) * Pcalc(s,r) dr  =
                    cell

                   = Sum ( I Pobs(r) * Pcalc_ij(s,r) dr) = Sum TRij(s)  
                     i#j                                    i#j

where

s
vector of translation
I
integral
i,j
cryst. symmetry operator numbers
Pcalc_ij(s,r)
calculated Patterson for model corresponding to ith operator and model corresponding to jth operator
TRij(s)
translation function of Pattersons Pobs(r) and Pcalc_ij(s,r).
The Translation Function is the sum of translation functions for each pair of different cryst. symmetry operators.

MOLREP uses the Crowther Fast Rotation Function, which is based on FFT. MOLREP can compute the Rotation Function for three different orientations of the model and average them, which reduces the noise in the Rotation Function.

The translation function algorithm was developed by the author; it performs the calculations in reciprocal space using FFT.

There are two major differences from other translation functions.

  1. Instead of summing the translation functions TRij for the pairs of operators, we multiply them, which gives a map with much higher contrast.
  2. Finally we can multiply the translation function with the Packing Function to remove peaks corresponding to incorrect solutions with bad packing.
    The Packing Function (PF) is an overlap function:
        
                  P(s) = Sum ( I  Ro_i(r) * Ro_j(r) dr )
                         i#j  cell

    where Ro_i(r) is the electron density of the model which corresponds to the ith cryst. symmetry operator.

    The algorithm of calculation of the Packing Function is similar to the one for the Translation Function and performed by the same program.

    Finally the 'advanced' Translation function is:

                TR(s)  = [  M  TRij(s) ] * P(s)
                           i#j

    where M means multiplication of different TRij.
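The combination step can be sketched on a common grid with NumPy. Here `packing` is treated as a packing weight, near 1 where symmetry copies do not overlap and small where they clash; how MOLREP turns the overlap integral P(s) into such a weight, and how it handles negative TRij values, is not described here, so the arrays are simply multiplied as given.

```python
import numpy as np


def advanced_tf(tr_pair_maps, packing):
    """TR(s) = [product over pairs i != j of TRij(s)] * packing weight(s)."""
    product = np.ones_like(packing, dtype=float)
    for tr in tr_pair_maps:
        product *= tr
    return product * packing


tr1 = np.array([0.2, 1.0, 0.5])
tr2 = np.array([0.5, 1.0, 0.2])
pack = np.array([1.0, 1.0, 0.0])  # third position clashes
print(advanced_tf([tr1, tr2], pack))
```

Summing tr1 and tr2 gives peak-to-background ratios of about 3, whereas their product gives 10, which illustrates why multiplication yields a higher-contrast map.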

Scaling by Patterson

For scaling we use a completely new strategy based on the Patterson origin peak, which is approximated by a Gaussian. This peak is computed for both the observed and the calculated amplitudes, and in each case B_overall is computed. The difference

B_diff_overall = B_obs_overall - B_calc_overall

is then added to calculated B_overall so as to make the width of the calculated Patterson origin peak equal to the observed peak. This method makes it possible to have a good approximation for the scaling problem even if only low resolution data is available where other methods do not work. Scaling by Patterson is also useful for the Cross Rotation Function where we have different cells for the model and the unknown structure.
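The effect of adding B_diff can be shown with a simpler estimator. The sketch below fits ln|F| = ln(scale) - B*s^2, the isotropic model quoted in the anisotropic-correction section, by least squares; this is a swapped-in technique, NOT the Gaussian fit to the Patterson origin peak that MOLREP performs. The function names are hypothetical.

```python
import numpy as np


def fit_b_overall(s, f):
    """Least-squares fit of ln|F| = ln(scale) - B*s^2; returns (scale, B)."""
    s = np.asarray(s, dtype=float)
    slope, intercept = np.polyfit(s ** 2, np.log(f), 1)
    return float(np.exp(intercept)), float(-slope)


def match_b(s, f_obs, f_calc):
    """Apply B_diff = B_obs - B_calc to the calculated amplitudes."""
    _, b_obs = fit_b_overall(s, f_obs)
    _, b_calc = fit_b_overall(s, f_calc)
    b_diff = b_obs - b_calc
    s = np.asarray(s, dtype=float)
    return np.asarray(f_calc) * np.exp(-b_diff * s ** 2), b_diff
```

After `match_b`, the calculated amplitudes fall off with resolution at the same rate as the observed ones, so only an overall scale factor remains between them.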

Low resolution cut-off (Boff)

Low resolution cut-off introduces systematic errors in the electron density, especially near the surface of the model. This is known as the series termination effect. Instead of using the usual low resolution cut-off, MOLREP multiplies the structure factor amplitudes by a special coefficient:

Fnew = Fold * (1 - exp(-Boff*s^2)), where Boff = 2*resmin^2

Boff is called the "soft low resolution cut-off", which allows removal of structure factors in this resolution range without introducing the series termination effect.

The use of a priori knowledge of similarity and completeness of the model

For low similarity the high resolution reflections are weighted down. For this, MOLREP uses an additional overall factor Badd:

Fnew = Fold * exp(-Badd*s^2)

The similarity 'SIM' can range from 0.1 to 1.0, corresponding to Badd from (B_limit - Boverall) to -Boverall, where B_limit = 80.

SIM=1 means normalized F will be used.

For low completeness, e.g. when there are several molecules in the asymmetric unit, the contribution of low resolution reflections is weighted down. To manage the completeness of the model, MOLREP uses a low resolution cut-off (Boff). The completeness of the model 'COMPL' can range from 0.2 to 1.0, corresponding to Boff from 400 to 1600.

Functions of electron density searching (SM)

We suggest a new approach to divide a phased six-dimensional search into three steps:

  1. A spherically averaged translation function is used to locate the position of a molecule or its fragment. It compares locally spherically averaged experimental electron density with that calculated from the model and tabulates highly probable positions accordingly.
  2. Then for each position a local phased rotation function is used to find the orientation of the molecule.
  3. The third step is the phased translation function, used to check and refine the found position.

Spherically averaged phased translation function (SAPTF)

SAPTF gives the expected position of a model in an electron density map by the comparison of spherically averaged density of the model with locally spherically averaged observed density.

SAPTF(s) = I Robs(r,s) * Rcalc(r) dr
         rad(s)

where

  I
rad(s)
integral inside a sphere of radius=rad (i.e. the cut-off radius) centred at point s of the electron density
Robs
observed electron density, spherically averaged around point s
Rcalc
electron density calculated from the model, spherically averaged around the origin of the coordinate system
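A discrete version of the spherical averaging can be sketched on a grid: the density around a point is averaged in concentric shells, and the SAPTF integral is approximated by summing the product of the two shell averages over the voxels of the sphere. Cubic voxels, no periodic wrap-around and equal-width shells are simplifying assumptions; the helper names are hypothetical.

```python
import numpy as np


def spherical_profile(rho, centre, voxel, rad, nbins):
    """Spherically averaged density around grid point `centre`.

    rho: 3-D density on cubic voxels of edge `voxel` (Å); the sphere of
    radius rad must lie inside the grid. Returns (mean density per shell,
    number of voxels per shell) for nbins equal-width shells.
    """
    idx = np.indices(rho.shape)
    r = np.sqrt(sum(((i - c) * voxel) ** 2 for i, c in zip(idx, centre)))
    inside = r < rad
    shell = np.minimum((r[inside] / rad * nbins).astype(int), nbins - 1)
    sums = np.bincount(shell, weights=rho[inside], minlength=nbins)
    counts = np.bincount(shell, minlength=nbins)
    return sums / np.maximum(counts, 1), counts


def saptf_value(profile_obs, profile_calc, counts, voxel):
    """Voxel-sum approximation of the SAPTF integral over the sphere."""
    return float(np.sum(counts * profile_obs * profile_calc)) * voxel ** 3
```

Evaluating `saptf_value` with the observed profile around each trial point s and the model profile around its centre gives a map whose peaks are candidate positions for the next step.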

Phased Rotation function (PRF)

PRF gives the orientation of a model placed at a given point of the electron density.

PROT(O) = I Robs(r) * Rcalc(O,r) dr
        rad(s)

where

O
operator of rotation
  I
rad(s)
integral inside a sphere centred at point s of the electron density with radius=rad
Robs
observed electron density
Rcalc
calculated electron density for rotated (O) model

Phased Translation function (PTF)

Translation search in electron density map.

PTR(s)  = I Robs(r) * Rcalc(s,r) dr
        cell

where

s
vector of translation
I
integral
Robs
observed electron density
Rcalc(s,r)
calculated electron density for the model placed at position s
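On a periodic grid the phased translation function is a cross-correlation, so it can be computed with FFTs: PTR = inverse FFT of Fobs(h) * conj(Fcalc(h)). The sketch below uses NumPy's FFT on a one-dimensional toy map; it illustrates the function above and is not MOLREP's implementation.

```python
import numpy as np


def phased_translation_function(rho_obs, rho_model):
    """PTR(s) = sum over r of Robs(r) * Rcalc(r - s), computed via FFT.

    rho_model is the model density placed at the origin of the same grid.
    The maximum of the returned real map marks the best translation.
    """
    f_obs = np.fft.fftn(rho_obs)
    f_calc = np.fft.fftn(rho_model)
    return np.real(np.fft.ifftn(f_obs * np.conj(f_calc)))


model = np.zeros(32)
model[0:3] = [1.0, 2.0, 3.0]
obs = np.roll(model, 11)  # "observed" density: the model shifted by 11 grid points
print(int(np.argmax(phased_translation_function(obs, model))))  # 11
```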

Fitting two models (FM)

Fitting is done through electron density: the second model (MODEL_2) is the target model, which is converted to electron density. Two algorithms are available to find the best overlap of the electron densities of the two models:

  1. Rotation Function (Patterson) and Phased Translation Function (electron density).
  2. All functions for electron density. Spherically Averaged Phased Translation Function gives expected position for model. Phased Rotation Function for expected position gives orientation. Phased Translation Function checks and refines the translation vector.

Special Translation Function (STF) for dyad search

Multi-copy search

MOLREP can search for two copies of a model simultaneously. This is done in three stages:

  1. Rotation function. The program checks all pairs of the first NP peaks of the Rotation Function (RF). For each pair, the first rotation is used to prepare model-1; model-2 is prepared using the second rotation and one rotation from the crystallographic symmetry operators.
  2. Next, for the current pair (model-1 and model-2), MOLREP computes the Special Translation Function (STF) to find the inter-molecular vector of this dyad.
  3. For the first NPT peaks of the STF (i.e. for NPT inter-molecular vectors), the program computes a standard Translation Function (TF) using the current dyad as the model, and calculates a correlation coefficient for the first NPTD peaks of the TF.
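The control flow of the three stages can be sketched as nested loops. This is a skeleton only. `stf`, `tf` and `cc` are hypothetical callables that stand in for the real function evaluations. Two details are assumptions of the sketch: whether a pair may repeat the same RF peak (i == j), and the order in which model-2's rotation is composed with the symmetry operator.

```python
from itertools import combinations_with_replacement

import numpy as np


def multicopy_search(rf_peaks, sym_ops, n_p, n_pt, n_ptd, stf, tf, cc):
    """Skeleton of the three-stage two-copy search; stf, tf and cc are hypothetical.

    rf_peaks : 3x3 rotation matrices sorted by Rotation Function height
    sym_ops  : 3x3 rotation parts of the crystallographic symmetry operators
    stf(rot1, rot2)          -> candidate inter-molecular vectors, best first
    tf(rot1, rot2, vec)      -> candidate dyad positions, best first
    cc(rot1, rot2, vec, pos) -> correlation coefficient of the placed dyad
    """
    best_score, best_solution = -np.inf, None
    # Stage 1: pairs from the first n_p RF peaks (assumed here to include i == j).
    for rot1, rot2 in combinations_with_replacement(rf_peaks[:n_p], 2):
        for op in sym_ops:
            rot2_sym = op @ rot2            # model-2: second rotation plus a symmetry operator
            # Stage 2: STF peaks are candidate inter-molecular vectors of this dyad.
            for vec in stf(rot1, rot2_sym)[:n_pt]:
                # Stage 3: standard TF with the dyad as model; CC for its first n_ptd peaks.
                for pos in tf(rot1, rot2_sym, vec)[:n_ptd]:
                    score = cc(rot1, rot2_sym, vec, pos)
                    if score > best_score:
                        best_score, best_solution = score, (rot1, rot2_sym, vec, pos)
    return best_score, best_solution


calls = []


def toy_cc(rot1, rot2, vec, pos):
    """Dummy scoring function that records each call."""
    calls.append(pos)
    return vec[0] + pos


identity, twofold_z = np.eye(3), np.diag([-1.0, -1.0, 1.0])
best_score, _ = multicopy_search(
    [identity] * 4, [identity, twofold_z], n_p=3, n_pt=2, n_ptd=2,
    stf=lambda r1, r2: [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    tf=lambda r1, r2, v: [0.1, 0.2, 0.3],
    cc=toy_cc,
)
# 6 peak pairs x 2 operators x 2 STF peaks x 2 TF peaks = 48 CC evaluations
```

The nesting shows why NP, NPT and NPTD control the cost: the number of correlation-coefficient evaluations grows with their product.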

Special Translation Function (STF)

Consider two models in the asymmetric unit of the cell:

F1(h)
structure factor of model_1 with its centre of gravity at the origin of the coordinate system
F2(h)
structure factor of model_2, likewise with its centre of gravity at the origin

Let

S1
vector in the unit cell from the origin of the coordinate system to the centre of gravity of model_1
S2
the corresponding vector for model_2

Then the total structure factor F(h) (for the whole crystal structure) is:

F(h) = F1(h)exp(-2pihS1) + F2(h)exp(-2pihS2)

and the Patterson function, with ' denoting the complex conjugate, is:

P(h) = F(h)*F'(h)

       = F1(h)*F1'(h)
        + F1'(h)*F2(h)*exp(-2pih(S2-S1))
        + F2'(h)*F2(h)
        + F1(h)*F2'(h)*exp(-2pih(S1-S2))

       = P0(0) + P1(S2-S1) + P1'(S1-S2)

The Special Translation Function is a phased TF that uses the Patterson function as the electron density and P1 = F1'(h)*F2(h) as the structure factors of the model. The solution of this function is the dyad vector S1-S2.
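A small numerical check of the derivation above. It assumes numpy's FFT sign convention, in which a model shifted by S has its structure factors multiplied by exp(-2πihS), as in the formula for F(h). The sketch also subtracts the self-vector terms |F1|^2 + |F2|^2 before the search. This is possible only because both toy models are known, and the source does not say how MOLREP treats the origin peak. Under this convention the peak lands at S2-S1; the opposite Fourier sign convention gives S1-S2. The names `stf_map` and `small_model` are illustrative.

```python
import numpy as np


def stf_map(patterson_sf, f1, f2):
    """Phased TF with a Patterson as 'density' and P1 = conj(F1) * F2 as model."""
    p1 = np.conj(f1) * f2
    return np.fft.ifftn(patterson_sf * np.conj(p1)).real


def small_model(rng, n):
    """Toy model: random density in a 3x3x3 block centred on the grid origin."""
    m = np.zeros((n, n, n))
    m[:3, :3, :3] = rng.random((3, 3, 3))
    return np.roll(m, (-1, -1, -1), axis=(0, 1, 2))


n = 24
rng = np.random.default_rng(1)
m1, m2 = small_model(rng, n), small_model(rng, n)
s1, s2 = (2, 3, 4), (8, 6, 9)
crystal = np.roll(m1, s1, axis=(0, 1, 2)) + np.roll(m2, s2, axis=(0, 1, 2))
f = np.fft.fftn(crystal)                       # F1 exp(-2pi i h S1) + F2 exp(-2pi i h S2)
f1, f2 = np.fft.fftn(m1), np.fft.fftn(m2)
patterson = f * np.conj(f) - np.abs(f1) ** 2 - np.abs(f2) ** 2   # keep cross terms only
scores = stf_map(patterson, f1, f2)
peak = np.unravel_index(np.argmax(scores), scores.shape)          # S2 - S1 = (6, 3, 5)
```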

Anisotropic correction and scaling

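Aniso correction:

  For structure factors we can estimate:

     1.  an isotropic B_overall:

$$F(s) \sim Scale_{overall} \cdot \exp(-B_{overall}\, s^2)$$

     2.  an anisotropic B_overall (tensor):

$$F(s) \sim Scale_{overall} \cdot \exp\left[-\left(B_{11} a^{*2} h^2 + 2 B_{12} a^* b^* h k + \dots\right)\right]$$

  Aniso correction makes the data isotropic with B_overall:

$$F_{new}(s) = F_{old}(s) \cdot \exp\left[+\left(B_{11} a^{*2} h^2 + 2 B_{12} a^* b^* h k + \dots\right)\right] \cdot \exp(-B_{overall}\, s^2)$$

Aniso scaling:

$$F_{new} = Scale \cdot F_{old} \cdot \exp\left[-\left(B_{11} a^{*2} h^2 + 2 B_{12} a^* b^* h k + \dots\right)\right]$$

  Scale and the anisotropic B tensor are obtained by minimizing $\sum |F_{obs} - F_{new}|$.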
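A sketch of the anisotropic formulas above. It assumes an orthogonal cell, a symmetric 3x3 tensor B, and s = |d*| = 1/d, so that s^2 = (h a*)^2 + (k b*)^2 + (l c*)^2. That last choice is what makes the isotropic case of the tensor reproduce B_overall s^2 in the formulas as written. For brevity the fit uses linear least squares on ln(F_obs/F_old), which is a swapped-in technique: the text above minimizes sum|F_obs - F_new| directly. All function names are illustrative.

```python
import numpy as np


def scaled_indices(hkl, recip):
    """Rows (h a*, k b*, l c*) for reciprocal axis lengths recip = (a*, b*, c*)."""
    return np.asarray(hkl, dtype=float) * np.asarray(recip, dtype=float)


def aniso_exponent(hkl, recip, b_tensor):
    """B11 a*^2 h^2 + B22 b*^2 k^2 + B33 c*^2 l^2 + 2 B12 a* b* hk + 2 B13 a* c* hl + 2 B23 b* c* kl."""
    x = scaled_indices(hkl, recip)
    return np.einsum("ni,ij,nj->n", x, np.asarray(b_tensor, dtype=float), x)


def aniso_correct(f_old, hkl, recip, b_tensor, b_overall):
    """F_new = F_old * exp(+aniso) * exp(-B_overall s^2), orthogonal cell assumed."""
    s2 = (scaled_indices(hkl, recip) ** 2).sum(axis=1)
    return f_old * np.exp(aniso_exponent(hkl, recip, b_tensor)) * np.exp(-b_overall * s2)


def fit_aniso_scale(f_obs, f_old, hkl, recip):
    """Scale and B tensor from ln(F_obs / F_old) = ln(Scale) - aniso (least squares)."""
    x = scaled_indices(hkl, recip)
    design = np.column_stack([
        np.ones(len(x)),
        -x[:, 0] ** 2, -x[:, 1] ** 2, -x[:, 2] ** 2,
        -2 * x[:, 0] * x[:, 1], -2 * x[:, 0] * x[:, 2], -2 * x[:, 1] * x[:, 2],
    ])
    coef, *_ = np.linalg.lstsq(design, np.log(f_obs / f_old), rcond=None)
    b11, b22, b33, b12, b13, b23 = coef[1:]
    b_tensor = np.array([[b11, b12, b13], [b12, b22, b23], [b13, b23, b33]])
    return np.exp(coef[0]), b_tensor


hkl = np.array([(h, k, l) for h in range(-3, 4) for k in range(-3, 4)
                for l in range(-3, 4) if (h, k, l) != (0, 0, 0)])
recip = (1 / 50, 1 / 60, 1 / 70)
b_true = np.array([[30.0, 5.0, 0.0], [5.0, 20.0, -3.0], [0.0, -3.0, 40.0]])
f_old = np.linspace(10.0, 100.0, len(hkl))
f_obs = 2.5 * f_old * np.exp(-aniso_exponent(hkl, recip, b_true))   # noise-free toy data
scale, b_fit = fit_aniso_scale(f_obs, f_old, hkl, recip)
```

With noise-free toy data the fit recovers the scale 2.5 and the tensor used to generate it. With an isotropic tensor equal to B_overall, the correction leaves the data unchanged.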

INPUT FILE EXAMPLES

A. Example of CIF file of amplitudes:

         data_structure_9ins
         _cell.length_a      100.000
         _cell.length_b      100.000
         _cell.length_c      100.000
         _cell.angle_alpha    90.000
         _cell.angle_beta     90.000
         _cell.angle_gamma    90.000
         _symmetry.space_group_name_H-M  'P 1 21 1'
         loop_
         _refln.index_h
         _refln.index_k
         _refln.index_l
         _refln.F_meas_au
         _refln.F_meas_sigma_au
            2  3   4    12.3   1.2
           -2 -3  -4    11.4   1.1
          . . . . . . . . . . . . .

For intensities use:

         _refln.intensity_meas 
         _refln.intensity_sigma 

B. Example of CIF file of amplitudes with phases:

         data_9ins
         _cell.length_a      100.000
         _cell.length_b      100.000
         _cell.length_c      100.000
         _cell.angle_alpha    90.000
         _cell.angle_beta     90.000
         _cell.angle_gamma    90.000
         _symmetry.space_group_name_H-M  'P 1 21 1'
         loop_
         _refln.index_h
         _refln.index_k
         _refln.index_l
         _refln.F_meas_au
         _refln.F_meas_sigma_au
         _refln.phase_calc
         _refln.fom
            1   0   0    3468.4934   138.7397    0.746  1.000 
            2   0   0     618.4012    24.7360   11.948  1.000 
          . . . . . . . . . . . . . . . . . . . . . . . . . . . 

Phases are in degrees.
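To check such a file before running MOLREP, a few lines of Python are enough for files shaped exactly like examples A and B. `read_simple_cif` is a hypothetical helper, not a general CIF parser. It assumes one data block, one `loop_` of `_refln.` items and whitespace-separated values, and it skips the `. . .` placeholder lines.

```python
import shlex


def read_simple_cif(text):
    """Cell, space group and _refln rows from a CIF shaped like examples A and B."""
    cell, space_group, columns, rows = {}, None, [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("data_") or set(line) <= {".", " "}:
            continue                                # blank, block name, ". . ." filler
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_refln."):
            columns.append(line.split()[0])         # one loop item name per line
        elif line.startswith("_"):
            key, value = shlex.split(line)[:2]      # shlex keeps 'P 1 21 1' together
            if key.startswith("_cell."):
                cell[key[len("_cell."):]] = float(value)
            elif key == "_symmetry.space_group_name_H-M":
                space_group = value
        elif columns:
            rows.append(dict(zip(columns, map(float, line.split()))))
    return cell, space_group, rows


example_b = """
data_9ins
_cell.length_a      100.000
_cell.length_b      100.000
_cell.length_c      100.000
_cell.angle_alpha    90.000
_cell.angle_beta     90.000
_cell.angle_gamma    90.000
_symmetry.space_group_name_H-M  'P 1 21 1'
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.phase_calc
_refln.fom
   1   0   0    3468.4934   138.7397    0.746  1.000
   2   0   0     618.4012    24.7360   11.948  1.000
 . . . . . . . . . . . . .
"""
cell, space_group, rows = read_simple_cif(example_b)
```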

C. Example of PDB file of amplitudes:

       HEADER   R2SARSF   15-JAN-91
       COMPND   RIBONUCLEASE SA (E.C.3.1.4.8) COMPLEX WITH 3'-*GUANYLIC ACID 
       SOURCE   (STREPTOMYCES $AUREOFACIENS)
       AUTHOR   J.SEVCIK,E.J.DODSON,G.G.DODSON
       CRYST1  64.900   78.320   38.790  90.00  90.00  90.00 P 21 21 21    8
       CONTNT   H,K,L,S,FOBS,SIGMA(FOBS)
       FORMAT   (2(I3,2I4,2F7.0,F6.0,9X))
       COORDS   2SAR
       REMARK  1 TWO REFLECTIONS PER RECORD.
       REMARK  2 DMIN=1.85, DMAX=16.28
       CHKSUM  1 MIN H=0,MAX H=34,MIN K=0,MAX K=41,MIN L=0,MAX L=20
       CHKSUM  2 TOTAL NUMBER OF REFLECTIONS=17346
       CHKSUM  3 TOTAL NUMBER OF REFLECTION RECORDS=8673
       CHKSUM  4 SUM OF FOBS=0.235499E+07
         0   0   3     60      9    16           0   0   4    106    307    25
         0   0   5    166     23    20           0   0   6    239    657    52
         0   0   7    326      0    38           0   0   8    425    511    40
       . . . . . . . . . . . . . . . . . . . . . .

D. Example of simple formatted file of amplitudes:

In this case the data are assumed to be in the order H, K, L, F, sig(F); sig(F) may be omitted:

                   
            2  3   4    12.3   1.2
           -2 -3  -4    11.4   1.1
           . . . . . . . . . . . . .

               or 

            2  3   4    12.3  
           -2 -3  -4    11.4  
           . . . . . . . . . 

Records must not exceed 80 characters. The format is free, i.e. values must be separated by blanks (be careful: some PDB files do not satisfy this rule).
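A sketch of a reader for this free format, assuming only the rules stated above: blank-separated H K L F [sig(F)] and records of at most 80 characters. The function name is hypothetical. Records that do not fit the pattern are rejected rather than guessed at. This covers fixed-column files with two reflections per record, or with fields that run together, such as some PDB structure-factor files.

```python
def read_free_format(text):
    """Parse H K L F [sigF] records; returns (h, k, l, f, sig_f or None) tuples."""
    reflections = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > 80:
            raise ValueError(f"record {number} exceeds 80 characters")
        fields = line.split()
        if not fields or all(f == "." for f in fields):
            continue  # blank line or ". . ." placeholder as in the examples
        if len(fields) not in (4, 5):
            raise ValueError(f"record {number}: expected H K L F [sigF], got {fields}")
        h, k, l = (int(v) for v in fields[:3])   # fails on fused fields such as "-12-34"
        f = float(fields[3])
        sig_f = float(fields[4]) if len(fields) == 5 else None
        reflections.append((h, k, l, f, sig_f))
    return reflections


records = read_free_format("   2  3   4    12.3   1.2\n  -2 -3  -4    11.4   1.1\n   . . . . .\n")
```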

COMMAND (BATCH) FILE EXAMPLES

BATCH file example of Cross Rotation and Translation functions:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F fobs.dat
_FILE_M mm1.crd
#
#
_NP   8
_RAD 27
_ANISO C
_sim   .1
_compl .5
_END
stop

BATCH file example of Self Rotation function:


# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F fobs.dat
_FILE_M 
#
#
_RAD 27
_END
stop

BATCH example with MTZ file:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F p1.mtz
#
_F  FO
_SIGF  SDFO
_END     <--- end of MTZ block
#
_FILE_M p1_pdb.cds
#
#
_NP   8
_ANISO C
_sim   .1
_compl .5
_END
stop

BATCH example with MTZ file (amplitudes and phases)

For searching the electron density map for a model (the standard Rotation Function will be used):

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F p1.mtz
#
_F  FO
_SIGF  SDFO
_PH    PH_FO
_END   <--- end of MTZ block
#
_FILE_M mod.pdb
#
#
_NP    8
_END
stop

BATCH file example of fitting two models:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F 
_FILE_M mod1.pdb
#
_MODEL_2 mod2.pdb
_PRF Y
_END
stop

BATCH file example of dyad search:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F fobs.dat
_FILE_M mod1.pdb
#
_dyad y
_axis 0,10
_dist 0,300,300
_NPT  3
_NPTD 3
_END
stop

BATCH file example of dimer search:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F fobs.dat
_FILE_M mod1.pdb
#
_dyad y
_axis 180,10
_dist 0,300,1
_NPT  3
_NPTD 3
_END
stop

BATCH file example of dimer search using Self-RF orientations:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F fobs.dat
_FILE_M mod1.pdb
#
_dyad y
_axis 180,10
_dist 0,300,1
_NSRF 20
_NPT  3
_NPTD 3
_END
stop

BATCH file example using a sequence file:

# --------------------------------
molrep <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_F mtz.mtz
#
_F  FP
_SIGF  SIGFP
_END     <--- end of MTZ block
#
_FILE_M 1hpg.pdb
#
#
_NP   8
_NMON 2
_FILE_S new.seq
_sim   .1
_compl .5
_END
stop

How to define NCS

The program supports point-group NCS. Each point group is identified by an NCS_ID:

NCS_ID  Point group                            Standard orientation
N00     N   (e.g. point group 7 gives 700)     N-fold axis along Z
N20     N2  (e.g. point group 72 gives 720)    N-fold axis along Z, twofold axis along X
N22     N22 (e.g. point group 422 gives 422)   N-fold axis along Z, twofold axis along X
230     23                                     twofold axis along Z, another twofold axis along X
432     432                                    fourfold axis along Z, another fourfold axis along X
532     532                                    fivefold axis along Z; the threefold axis closest to Z
                                               projects onto the XY plane along X
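For the cyclic and dihedral codes, the operators of the point group in its standard orientation follow directly from the table above. They are N rotations about Z, plus, for N20 and N22, each of those rotations combined with a twofold about X. A minimal sketch; the function name is hypothetical, and the cubic and icosahedral groups (230, 432, 532) are left out.

```python
import numpy as np


def rot_z(deg):
    """Rotation by `deg` degrees about Z."""
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t), np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])


TWOFOLD_X = np.diag([1.0, -1.0, -1.0])   # 180 degrees about X


def standard_ncs_operators(ncs_id):
    """Rotations of an N00, N20 or N22 point group in its standard orientation."""
    n, kind = divmod(ncs_id, 100)
    if n < 1 or kind not in (0, 20, 22):
        raise ValueError(f"NCS_ID {ncs_id} is not handled by this sketch")
    cyclic = [rot_z(360.0 * k / n) for k in range(n)]   # N-fold axis along Z
    if kind == 0:
        return cyclic
    # N2 / N22: each N-fold rotation combined with the twofold along X.
    return cyclic + [rot @ TWOFOLD_X for rot in cyclic]


ops_422 = standard_ncs_operators(422)
```

A cyclic group N gives N operators and a dihedral group gives 2N. The set is closed under multiplication, which is a quick consistency check.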

The polar angles theta, phi, chi define the orientation of the standard system in the cell: theta and phi are the polar coordinates of the standard Z axis, and chi is the angle of rotation around this theta-phi axis (the standard Z axis) that brings the X axis onto the standard X axis.

cx, cy, cz (fractional units) define the position of the group centre in the cell.

NCS parameters can be defined either by keywords or in the input PDB file.

Definition by keywords

The input PDB file must contain only one molecule. Use the keywords:

NCS, ANGLES, CENTRE

NCS - NCS_ID
ANGLES - theta, phi, chi
CENTRE - cx,cy,cz

Definition in input PDB file

The first (reference) molecule must start with the line (free format):

#MOLECULE NCS_ID theta phi chi cx cy cz

Each of the other molecules must start with the line:

#MOLECULE Nmol theta phi chi

where:

Nmol - molecule number.

theta phi chi - polar angles of the rotation from the first molecule to the current one.

Example for point group 3:

HEADER    HYDROLASE (ENDORIBONUCLEASE)         
CRYST1   64.900   78.320   38.790  90.00  90.00 ...
#MOLECULE  300    0   0  0  .5  .5  .5
#DOMAIN     1 
ATOM      1  N   ASP A   1      45.161  12.836 ... 
ATOM      2  CA  ASP A   1      45.220  12.435 ...   
 ... 
ATOM    745  SG  CYS A  96      58.398   6.673 ... 
ATOM    746  O   CYS A  96      62.238   7.178 ...  
#DOMAIN     2 
ATOM    747  N   PHE A  97      44.487  11.386 ...  
ATOM    748  CA  PHE A  97      44.559  11.129 ... 
 ...
ATOM    945  C   VAL A 196      58.398   6.673 ... 
ATOM    946  O   VAL A 196      62.238   7.178 ...  
#DOMAIN     1 
ATOM    947  N   ASP A 197      44.487  11.386 ...  
ATOM    948  CA  ASP A 197      44.559  11.129 ... 
 ...
#MOLECULE  2    0   0  120  
#DOMAIN     1 
ATOM      1  N   ASP A   1      45.161  12.836 ... 
ATOM      2  CA  ASP A   1      45.220  12.435 ...   
 ... 
ATOM    745  SG  CYS A  96      58.398   6.673 ... 
ATOM    746  O   CYS A  96      62.238   7.178 ...  
#DOMAIN     2 
ATOM    747  N   PHE A  97      44.487  11.386 ...  
ATOM    748  CA  PHE A  97      44.559  11.129 ... 
 ...
ATOM    945  C   VAL A 196      58.398   6.673 ... 
ATOM    946  O   VAL A 196      62.238   7.178 ...  
#DOMAIN     1 
ATOM    947  N   ASP A 197      44.487  11.386 ...  
ATOM    948  CA  ASP A 197      44.559  11.129 ... 
 ...
#MOLECULE  3    0   0  240  
#DOMAIN     1 
ATOM      1  N   ASP A   1      45.161  12.836 ... 
ATOM      2  CA  ASP A   1      45.220  12.435 ...   
 ... 
ATOM    745  SG  CYS A  96      58.398   6.673 ... 
ATOM    746  O   CYS A  96      62.238   7.178 ...  
#DOMAIN     2 
ATOM    747  N   PHE A  97      44.487  11.386 ...  
ATOM    748  CA  PHE A  97      44.559  11.129 ... 
 ...
ATOM    945  C   VAL A 196      58.398   6.673 ... 
ATOM    946  O   VAL A 196      62.238   7.178 ...  
#DOMAIN     1 
ATOM    947  N   ASP A 197      44.487  11.386 ...  
ATOM    948  CA  ASP A 197      44.559  11.129 ... 
 ...
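For a cyclic group (NCS_ID = N00) the header lines follow a simple pattern. Molecule k is related to molecule 1 by a rotation of 360(k-1)/N degrees about the N-fold axis, whose polar coordinates are theta, phi. `molecule_headers` is a hypothetical helper that writes such headers, assuming this reading of the convention. It reproduces the three headers of the point-group-3 example above, in free format.

```python
def molecule_headers(n_fold, theta, phi, chi, centre):
    """#MOLECULE header lines for a complete model with cyclic NCS (NCS_ID = N00)."""
    cx, cy, cz = centre
    lines = [f"#MOLECULE {n_fold * 100} {theta:g} {phi:g} {chi:g} {cx:g} {cy:g} {cz:g}"]
    for k in range(1, n_fold):
        # Rotation from molecule 1 to molecule k + 1: 360*k/N about the axis (theta, phi).
        lines.append(f"#MOLECULE {k + 1} {theta:g} {phi:g} {360.0 * k / n_fold:g}")
    return lines


headers = molecule_headers(3, 0, 0, 0, (0.5, 0.5, 0.5))
```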

Alternatively, provide only the first molecule (with the NCS parameters in the file) and let the program generate the complete model automatically. For pure rigid-body (RB) refinement use the keyword DOM = 'C'; for fitting a model into a map (i.e. SAPTF+PRF+PTF) use the keyword NCS = 1.

How to redirect output and scratch files

Use the keyword PATH_SCR to redirect all scratch files to a specified directory.

Use the keyword DOC to redirect the output files:

  molrep.doc
  molrep.bat
  molrep.pdb
  align.pdb
  molrep_rf.ps
  molrep_mtz.cif

to a specified directory. Examples:


_DOC  Y>path
 or
_DOC   >path

Convention for rotation


  Rotation by Eulerian angles Alpha, Beta, Gamma:

    eulerian angles : 1. A( Z ) - alpha around axis Z
                      2. B( Y') - beta  around new axis Y
                      3. G( Z') - gamma around new axis Z

  Rotation by Polar angles Theta, Phi, Chi:

                    polar coordinates Theta, Phi of the rotation axis:
 
       Theta     -  angle between the rotation axis and Z
       Phi       -  angle in the XY plane between X and the projection of the rotation axis

       Chi       -  rotation angle around the rotation axis
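Both conventions can be turned into rotation matrices. The sketch below assumes active, right-handed rotations acting on column vectors in the orthonormal frame. Under that assumption the Eulerian rotation is Rz(alpha)·Ry(beta)·Rz(gamma), and the polar rotation follows from Rodrigues' formula. The function names are illustrative.

```python
import numpy as np


def rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def ry(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def euler_matrix(alpha, beta, gamma):
    """Alpha about Z, then beta about the new Y, then gamma about the new Z (degrees)."""
    a, b, g = np.radians([alpha, beta, gamma])
    return rz(a) @ ry(b) @ rz(g)


def polar_matrix(theta, phi, chi):
    """Rotation by chi about the axis with polar coordinates (theta, phi), degrees."""
    t, p, c = np.radians([theta, phi, chi])
    u = np.array([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)])  # unit axis
    k = np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])
    return np.eye(3) + np.sin(c) * k + (1.0 - np.cos(c)) * (k @ k)  # Rodrigues' formula


r_euler = euler_matrix(30, 40, 50)
chi = np.degrees(np.arccos((np.trace(r_euler) - 1.0) / 2.0))   # rotation angle from the trace
```

As a check, polar angles (0, 0, chi) give the same matrix as Eulerian angles (chi, 0, 0), and polar angles (90, 90, beta) give the same matrix as (0, beta, 0). The rotation angle recovered from the trace satisfies cos(chi/2) = cos(beta/2) cos((alpha+gamma)/2).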

Convention for Orthonormal coordinate system

       Orthonormal axes are defined to have:
 
       A parallel to X, Cstar parallel to Z

MEMORY CONTROL PARAMETERS

In main_molrep_mtz.f:

 
CC --- MEMORY - common memory for maps and coordinates
       PARAMETER ( MEMORY  =4000000 )
CC --- NCRDMAX - maximal number of coordinates
       PARAMETER ( NCRDMAX = 100000  )
C ----

If the program stops with the message:

               ERROR: not memory enough ...

increase the parameter MEMORY in main_molrep_mtz.f and rebuild the program.