The aim here was getting simulations of the DUDE receptor proteins up and running. The main challenge is a large number of missing sidechains and loops in these structures - this is not a problem for docking codes perhaps, but certainly is for MD simulation codes.

Modeller from the Sali lab seemed like the tool for the job, so let's grab that

sudo env KEY_MODELLER=MODELIRANJE dpkg -i modeller_9.16-1_amd64.deb  

The only useful reference I was able to find on filling loops and missing sidechains was:

However, it assumes the aligment.ali file is constructed by hand. Maybe that's pleasurable to some but when I have ~20 receptor structures to complete, there is no time for this. Let's automate how alignment.ali file is generated:

# file
from MDAnalysis import Universe
from MDAnalysis.lib.util import convert_aa_code
import textwrap, glob

u = Universe("receptor.pdb")
ca = u.select_atoms("protein and name CA")
residues = [(at.resid, for at in ca]
sequence = []
for i, (resid, resname) in enumerate(residues):
    if i:
        prev_resid = residues[i-1][0]
        gaps = ["-"]*(resid - prev_resid - 1)
seq_gap = "".join(sequence)

f = glob.glob("*.fasta.txt")[0]
fasta = "".join([l.strip()  for l in open(f).readlines() if not l.startswith(">")])
seq_filled = fasta[residues[0][0]+2: residues[-1][0]+residues[0][0]+1]

print """
structureX:receptor:   {0} : :+{1} : :::-1.00:-1.00
""".format(residues[0][0], len(residues), "\n".join(textwrap.wrap(seq_gap, 73)), "\n".join(textwrap.wrap(seq_filled, 73)))

from modeller import *
from modeller.automodel import *    # Load the automodel class

env = environ()

# directories for input atom files = ['.', '../atom_files']

a = loopmodel(env, alnfile = 'alignment.ali',
              knowns = 'receptor', sequence = 'receptor_fill')
a.starting_model= 1
a.ending_model  = 1

a.loop.starting_model = 1
a.loop.ending_model   = 2
a.loop.md_level       =

Lastly, that's how it all fits together

python > alignment.ali