Guided MD tutorial

Alexander Björling, alebjo(at)chem.gu.se, March 2015

This tutorial explains how to set up and run MD simulations guided by solution X-Ray scattering data. The goal of the tutorial is to show the steps involved in setting up these calculations, so that the reader should be able to set up a new system with experimental data. It does not assume knowledge of GROMACS or MD, as the initial setup will be covered.

We will use the artificial example of the Lysine-Arginine-Ornithine-binding (LAO) protein, which is described in the orignal paper (Björling et al., 2015). This protein slightly changes the relative orientation of two subdomains when a ligand is bound or released (Hayward et al., 1999). The test described here is a simulation which starts from the crystal structure of the bound holo form (albeit with the ligand removed), and aims at the unbound apo form, based only on theoretical scattering patterns.

Install the software

Hopefully, the guided MD implementation will be a part of standard GROMACS 5 soon. Until then, the software can be installed from a development repository.

First, get a copy of the GROMACS repository, standing in directory where you want the code.

   git clone https://gerrit.gromacs.org/gromacs.git

Then, fetch and check out the WAXS code. Go to the Gerrit page and find the latest patch number. In the below command, it's 106, but do replace it by the proper value.

   cd gromacs
   git fetch https://gerrit.gromacs.org/gromacs refs/changes/59/2659/106 && git checkout FETCH_HEAD

The code now needs to be compiled, which I usually do as follows. Create a build and an install directory.

   cd ..
   mkdir build
   mkdir install

Now run cmake and make. The options will depend on your machine and preferences, but this is what works for me. The guided MD code doesn't support MPI parallelization, so I disable it. It will require things like FFTW, just like a normal GROMACS installation. The -j option causes the code to compile in parallel, depending on your machine.

   cd build
   cmake ../gromacs -DGMX_MPI=OFF -DGMX_THREAD_MPI=OFF -DGMX_OPENMP=ON -DCMAKE_INSTALL_PREFIX=../install
   make install -j 6

To use the newly installed GROMACS version, source the file GMXRC, which is located under the install directory.

   source ../install/bin/GMXRC

Prepare the system

We first need to get the crystal structure model into a form that GROMACS can work with. This consists of a number of steps, which include adding hydrogen atoms, specifying all interactions in the molecule, choosing the potential and water model, energy-minimizing and equilibrating temperature and pressure. I will not explain each step in detail. For more information on the GROMACS tools, consult these tutorials, read the GROMACS manual, or run the help command for each tool, for example gmx editconf -h.

The crystal model (code 1LST) is available from the PDB repository. However, as with many other structures, all atoms were not resolved experimentally, and the pdb file lacks these. There are web utilities to fill in missing atoms, and I've used the WHAT IF server. Download the completed files for the holo and apo forms, 1LST_ed_whatif.pdb and 2LAO_ed_whatif.pdb.

Since the lysine ligand would get in the way of structural change, manually remove it from the end of the pdb file (get rid of a TER line as well), and name the new file 1LST_ed_whatif_noligand.pdb.

Make a topology

Now, GROMACS has a utility for setting adding hydrogens and setting up a topology (which is the GROMACS term for "all system information except coordinates, including identities, charges, bonds, and all other interactions"). Run the pdb2gmx command.

   gmx pdb2gmx -f 1LST_ed_whatif_noligand.pdb -o 1LST_ed_whatif_noligand.gro

The pdb2gmx program will ask for a force field and a water model, I've used the CHARMM27 with the TIP3P water model, but others should work too. Study the output, and make sure everything seems to have worked. For example, pdb2gmx should detect a sulfur bridge between residues 38 and 45 as specified in the original PDB entry, the termini should be correctly chosen and the total charge should be -3.

Set up the simulation box

Issue the editconf command to set the shape and size of the simulation box. Since Debye scattering terms are encoded as bonded interactions, and since we will be using periodic boundary conditions here, we'll need a big box, bigger than twice the longest inter-protein distance. For this system, 13.5 nm seems a good value.

   gmx editconf -f 1LST_ed_whatif_noligand.gro -o 00box -box 13.5 -bt cubic

Next, we will let the protein relax in the potential. Download the run options file 01min.mdp, then issue the preprocessor (grompp) and simulation (mdrun) commands.

   gmx grompp -f 01min -c 00box -o 01min
   gmx mdrun -deffnm 01min -v

Next, we'll solvate the protein, first flooding the simulation box with water, then replacing a number of water molecules by ions. We need three extra positive charges to neutralize the protein, then around 31 pairs to reach roughly 50 mM in this particular box.

   gmx solvate -cp 01min -cs spc216 -o 02solv -p topol
   gmx grompp -f 01min -c 02solv -o 03ions
   gmx genion -s 03ions -o 03ions -p topol -np 34 -nn 31

Equilibrate

The next step is to let the system relax again, to relieve any clashes that may have arisen above.

   gmx grompp -f 01min -c 03ions -o 04min
   gmx mdrun -v -deffnm 04min

It's time to equilibrate the system, first in the microcanonical (NVT), then in the isothermal-isobaric (NPT) ensemble. Download the run options files (05nvt.mdp and 06npt.mdp) and take a look at them. Start the equilibration with the following commands.

   gmx grompp -f 05nvt -c 04min -o 05nvt
   gmx mdrun -v -deffnm 05nvt

Then,

   gmx grompp -f 06npt -c 05nvt -t 05nvt -o 06npt
   gmx mdrun -v -deffnm 06npt

Both these equilibration runs were 20 ps, which might be on the short side. Use the tool gmx energy to make sure that temperature and pressure has converged, otherwise extend the runs.

Prepare the scattering input

In the GROMACS implementation, each term of the Debye equation is encoded as a bonded interaction in the topology. Each type of scatterer (each scattering factor) is in turn defined in xml files shipped with the code. There are scatterers defined for atoms, MARTINI coarse beads and entire amino acid residues. The program gmx genrestr helps with constructing the scattering topology. The following command creates a half-matrix of Debye terms for the input gro file, up to and including residue 238 (the last protein residue), uses amino acid scatterers, and writes the result to waxs.itp. Amino acid scatterers are actually virtual atoms, so an extended coordinate (gro) file is created as well.

   gmx genrestr -matrix residue -resmax 238 -d $GMXDATA/top/sfactor_amino_acid_ds_Fourier.xml -o waxs.itp -f 06npt.gro -oc 07terms.gro

Manually include the Debye terms from waxs.itp in the protein topology (topol.top) by adding the line #include "waxs.itp" directly after the [ atoms ] section.

We are now ready to generate the initial scattering, which we need since we will be refining against difference data. This requires defining the set of q points to use. Download waxs_zeros.dat, which contains such a set of q values. Obviously, the range and density of points depends on the system and the data available. Preprocess the system, then run the gmx waxsdebye tool as follows.

   gmx grompp -f 01min.mdp -c 07terms.gro -o 08initial
   gmx waxsdebye -s 08initial.tpr -waxs_ref waxs_zeros.dat -sfac $GMXDATA/top/sfactor_amino_acid_ds_Fourier.xml -waxs_out waxs_initial

The output file waxs_initial.xvg can be used as input if it is cleaned up a little. Remove the lines starting with @ and &, remove the third column of numbers, and rename the file waxs_initial.dat.

The last thing needed is the target difference scattering data. Obviously, this would normally be experimental data. In this example I've used theoretical data generated with the program CRYSOL, and in the paper (Björling et al., 2015) I used data generated by gmx waxsdebye from the apo crystal structure. Download the difference data (waxs_diff.dat), which is of course on the same q grid as the initial data.

Run the refinement and evaluate the fit

We're ready to run the guided MD simulation. Download the run options file (09run.mdp) and the theoretical difference scattering curve (waxs_diff.dat). The simulation is demanding, and it's probably a good idea to run it on a dedicated machine. The code doesn't benefit from MPI parallelization, but running on all cores of a single compute node over OpenMP is usually enough.

   gmx grompp -f 09run.mdp -c 07terms.gro -o 09run
   gmx mdrun -v -deffnm 09run -sfac $GMXDATA/top/sfactor_amino_acid_ds_Fourier.xml -waxs_ref waxs_initial.dat -waxs_diff waxs_diff.dat -waxs_out waxs_out -waxs_alpha waxs_alpha

We now want to compare the simulation to the original structure (1LST) and the target structure (2LAO). To avoid any confusion about atom identities (pdb2gmx adds hydrogens and termini), we will create two pdb files with just the C-alpha atoms of residues 1-238. Choose C-alpha when prompted.

   gmx trjconv -f 2LAO_ed_whatif.pdb -s 2LAO_ed_whatif.pdb -o 2LAO_CA.gro
   gmx trjconv -f 1LST_ed_whatif_noligand.pdb -s 1LST_ed_whatif_noligand.pdb -o 1LST_CA.gro

We'll do the same for the trajectory xtc file, and at the same time center the protein so that it doesn't cross the cell boundaries, which would scew up the RMSD analysis. Again, choose C-alpha when asked.

   gmx trjconv -s 09run.tpr -f 09run.xtc -o 09run_CA.xtc -pbc mol -center

Next, do the RMSD analysis, then view the results with xmgrace or some other plotting tool. If you also want to see the WAXS energy decreasing, use the gmx energy tool.

   gmx rms -s 1LST_CA.gro -f 09run_CA.xtc -o rmsd_vs_holo
   gmx rms -s 2LAO_CA.gro -f 09run_CA.xtc -o rmsd_vs_apo
   gmx energy -f 09run.edr

This is my output. It compares well with the figure in the original publication (Björling et al., 2015) although the run is shorter. The trajectory approaches the apo structure (black curve) while moving further away from the holo structure (red curve). The WAXS energy follows the target RMSD quite closely.