Guided MD tutorial
Alexander Björling, alebjo(at)chem.gu.se, March 2015
This tutorial explains how to set up and run MD simulations guided by solution X-Ray scattering data. The goal of the tutorial is to show the steps involved in setting up these calculations, so that the reader should be able to set up a new system with experimental data. It does not assume knowledge of GROMACS or MD, as the initial setup will be covered.
We will use the artificial example of the Lysine-Arginine-Ornithine-binding (LAO) protein, which is described in the orignal paper (Björling et al., 2015). This protein slightly changes the relative orientation of two subdomains when a ligand is bound or released (Hayward et al., 1999). The test described here is a simulation which starts from the crystal structure of the bound holo form (albeit with the ligand removed), and aims at the unbound apo form, based only on theoretical scattering patterns.
Install the software
Hopefully, the guided MD implementation will be a part of standard GROMACS 5 soon. Until then, the software can be installed from a development repository.
First, get a copy of the GROMACS repository, standing in directory where you want the code.
git clone https://gerrit.gromacs.org/gromacs.git
Then, fetch and check out the WAXS code. Go to the Gerrit page and find the latest patch number. In the below command, it's 106, but do replace it by the proper value.
git fetch https://gerrit.gromacs.org/gromacs refs/changes/59/2659/106 && git checkout FETCH_HEAD
The code now needs to be compiled, which I usually do as follows. Create a
build and an
make. The options will depend on your machine and preferences, but this is what works for me. The guided MD code doesn't support MPI parallelization, so I disable it. It will require things like FFTW, just like a normal GROMACS installation. The
-j option causes the code to compile in parallel, depending on your machine.
cmake ../gromacs -DGMX_MPI=OFF -DGMX_THREAD_MPI=OFF -DGMX_OPENMP=ON -DCMAKE_INSTALL_PREFIX=../install
make install -j 6
To use the newly installed GROMACS version,
source the file
GMXRC, which is located under the
Prepare the system
We first need to get the crystal structure model into a form that GROMACS can work with. This consists of a number of steps, which include adding hydrogen atoms, specifying all interactions in the molecule, choosing the potential and water model, energy-minimizing and equilibrating temperature and pressure. I will not explain each step in detail. For more information on the GROMACS tools, consult these tutorials, read the GROMACS manual, or run the help command for each tool, for example
gmx editconf -h.
The crystal model (code 1LST) is available from the PDB repository. However, as with many other structures, all atoms were not resolved experimentally, and the
pdb file lacks these. There are web utilities to fill in missing atoms, and I've used the WHAT IF server. Download the completed files for the holo and apo forms,
Since the lysine ligand would get in the way of structural change, manually remove it from the end of the
pdb file (get rid of a
TER line as well), and name the new file
Make a topology
Now, GROMACS has a utility for setting adding hydrogens and setting up a topology (which is the GROMACS term for "all system information except coordinates, including identities, charges, bonds, and all other interactions"). Run the
gmx pdb2gmx -f 1LST_ed_whatif_noligand.pdb -o 1LST_ed_whatif_noligand.gro
pdb2gmx program will ask for a force field and a water model, I've used the CHARMM27 with the TIP3P water model, but others should work too. Study the output, and make sure everything seems to have worked. For example,
pdb2gmx should detect a sulfur bridge between residues 38 and 45 as specified in the original PDB entry, the termini should be correctly chosen and the total charge should be -3.
Set up the simulation box
editconf command to set the shape and size of the simulation box. Since Debye scattering terms are encoded as bonded interactions, and since we will be using periodic boundary conditions here, we'll need a big box, bigger than twice the longest inter-protein distance. For this system, 13.5 nm seems a good value.
gmx editconf -f 1LST_ed_whatif_noligand.gro -o 00box -box 13.5 -bt cubic
Next, we will let the protein relax in the potential. Download the run options file
01min.mdp, then issue the preprocessor (
grompp) and simulation (
gmx grompp -f 01min -c 00box -o 01min
gmx mdrun -deffnm 01min -v
Next, we'll solvate the protein, first flooding the simulation box with water, then replacing a number of water molecules by ions. We need three extra positive charges to neutralize the protein, then around 31 pairs to reach roughly 50 mM in this particular box.
gmx solvate -cp 01min -cs spc216 -o 02solv -p topol
gmx grompp -f 01min -c 02solv -o 03ions
gmx genion -s 03ions -o 03ions -p topol -np 34 -nn 31
The next step is to let the system relax again, to relieve any clashes that may have arisen above.
gmx grompp -f 01min -c 03ions -o 04min
gmx mdrun -v -deffnm 04min
It's time to equilibrate the system, first in the microcanonical (NVT), then in the isothermal-isobaric (NPT) ensemble. Download the run options files (
06npt.mdp) and take a look at them. Start the equilibration with the following commands.
gmx grompp -f 05nvt -c 04min -o 05nvt
gmx mdrun -v -deffnm 05nvt
gmx grompp -f 06npt -c 05nvt -t 05nvt -o 06npt
gmx mdrun -v -deffnm 06npt
Both these equilibration runs were 20 ps, which might be on the short side. Use the tool
gmx energy to make sure that temperature and pressure has converged, otherwise extend the runs.
Prepare the scattering input
In the GROMACS implementation, each term of the Debye equation is encoded as a bonded interaction in the topology. Each type of scatterer (each scattering factor) is in turn defined in
xml files shipped with the code. There are scatterers defined for atoms, MARTINI coarse beads and entire amino acid residues. The program
gmx genrestr helps with constructing the scattering topology. The following command creates a half-matrix of Debye terms for the input
gro file, up to and including residue 238 (the last protein residue), uses amino acid scatterers, and writes the result to
waxs.itp. Amino acid scatterers are actually virtual atoms, so an extended coordinate (
gro) file is created as well.
gmx genrestr -matrix residue -resmax 238 -d $GMXDATA/top/sfactor_amino_acid_ds_Fourier.xml -o waxs.itp -f 06npt.gro -oc 07terms.gro
Manually include the Debye terms from
waxs.itp in the protein topology (
topol.top) by adding the line
#include "waxs.itp" directly after the
[ atoms ] section.
We are now ready to generate the initial scattering, which we need since we will be refining against difference data. This requires defining the set of q points to use. Download
waxs_zeros.dat, which contains such a set of q values. Obviously, the range and density of points depends on the system and the data available. Preprocess the system, then run the
gmx waxsdebye tool as follows.
gmx grompp -f 01min.mdp -c 07terms.gro -o 08initial
gmx waxsdebye -s 08initial.tpr -waxs_ref waxs_zeros.dat -sfac $GMXDATA/top/sfactor_amino_acid_ds_Fourier.xml -waxs_out waxs_initial
The output file
waxs_initial.xvg can be used as input if it is cleaned up a little. Remove the lines starting with
&, remove the third column of numbers, and rename the file
The last thing needed is the target difference scattering data. Obviously, this would normally be experimental data. In this example I've used theoretical data generated with the program CRYSOL, and in the paper (Björling et al., 2015) I used data generated by
gmx waxsdebye from the apo crystal structure. Download the difference data (
waxs_diff.dat), which is of course on the same q grid as the initial data.
Run the refinement and evaluate the fit
We're ready to run the guided MD simulation. Download the run options file (
09run.mdp) and the theoretical difference scattering curve (
waxs_diff.dat). The simulation is demanding, and it's probably a good idea to run it on a dedicated machine. The code doesn't benefit from MPI parallelization, but running on all cores of a single compute node over OpenMP is usually enough.
gmx grompp -f 09run.mdp -c 07terms.gro -o 09run
gmx mdrun -v -deffnm 09run -sfac $GMXDATA/top/sfactor_amino_acid_ds_Fourier.xml -waxs_ref waxs_initial.dat -waxs_diff waxs_diff.dat -waxs_out waxs_out -waxs_alpha waxs_alpha
We now want to compare the simulation to the original structure (1LST) and the target structure (2LAO). To avoid any confusion about atom identities (
pdb2gmx adds hydrogens and termini), we will create two
pdb files with just the C-alpha atoms of residues 1-238. Choose C-alpha when prompted.
gmx trjconv -f 2LAO_ed_whatif.pdb -s 2LAO_ed_whatif.pdb -o 2LAO_CA.gro
gmx trjconv -f 1LST_ed_whatif_noligand.pdb -s 1LST_ed_whatif_noligand.pdb -o 1LST_CA.gro
We'll do the same for the trajectory
xtc file, and at the same time center the protein so that it doesn't cross the cell boundaries, which would scew up the RMSD analysis. Again, choose C-alpha when asked.
gmx trjconv -s 09run.tpr -f 09run.xtc -o 09run_CA.xtc -pbc mol -center
Next, do the RMSD analysis, then view the results with
xmgrace or some other plotting tool. If you also want to see the WAXS energy decreasing, use the
gmx energy tool.
gmx rms -s 1LST_CA.gro -f 09run_CA.xtc -o rmsd_vs_holo
gmx rms -s 2LAO_CA.gro -f 09run_CA.xtc -o rmsd_vs_apo
gmx energy -f 09run.edr
This is my output. It compares well with the figure in the original publication (Björling et al., 2015) although the run is shorter. The trajectory approaches the apo structure (black curve) while moving further away from the holo structure (red curve). The WAXS energy follows the target RMSD quite closely.