Protein Preparation and Modeling

Protein Preparation and Modeling

The quality of results from any biophysical modeling project depends strongly on the ability to first prepare a system from an experimental data file, such as PDB or mmCIF. Unfortunately, these experiments often cannot resolve key pieces of the biological system. The missing details could be as innocent as the location of the protons, or as extreme as the missing entire mobile portions of a protein.

SPRUCE streamlines the preparation process by automatically breaking the system into individual biological components, adding any missing protons or residues, and it finishes by optimizing the hydrogen bond network for the entire system.

SPRUCE’s structure preparation workflow performs tasks including the enumeration of biological units, alternate locations (if present), modeling missing residues and loops, and placing and optimizing hydrogens, accounting for the likely tautomer states of bound heterogens (ligands and cofactors).

The produced output of SPRUCE is an OEDesignUnit. The OEDesignUnit has everything well componentized, making it easy to select which key pieces to include in the subsequent modeling tasks, and which to discard (e.g. excipients).

SPRUCE furthermore leverages the Iridium categorization [1] by providing the user with information about which structure is best to use for modeling. Additionally, SPRUCE highlights the parts of a structure that need special attention, if it is to be used in modeling.

SPRUCE provides an expanding array of modeling tasks, like point-mutations, side-chain re-modeling and loop modeling using a template-based approach.

SPRUCE, furthermore, provides access to several superposition methods, based on sequence, secondary structures, or active site shape, each providing benefits depending on the similarity of the proteins being superposed.

Structure-based drug design requires careful preparation of experimental structures for downstream modeling applications. SPRUCE is a comprehensive, biomodeling preparation tool that reads experimentally solved (or modeled) protein and/or nucleic acid structures in several file formats and makes them modeling ready, for docking or molecule simulations.

To find out how SPRUCE can help with your protein modeling projects, contact us at info@eyesopen.com


Prepared OEDesignUnit from PDB code 3TPP. Modeled sidechains and loops in brown. Loop 1: DVA between GLU 310 A and THR 314 A, Loop 2: GFPLNQSEVLAS between ALA 157 A and VAL 170 A


Loop modeling performance based on dataset from Rossi et al. [2].


  1. Essential considerations for using protein-ligand structures in drug discovery G.L. Warren, T. D. Do, B. P. Kelly, A. Nicholls, S. D. Warren, Drug Discov. Today, 2012, 17, 1270-81

  2. Loopholes and missing links in protein modeling A. Rossi, C. A. Weiglet, A. Nayeem, S. R. Krystek Jr., Prot. Sci., 2007, 1999-2012