The Why of ZAP
(What the world needs now.. is another PB solver.. like I need a
hole in my head)
ZAP happened by surprise. It had always been at the back of my mind that Something Should Be Done About DelPhi, the program written at Columbia University in the lab of Barry Honig by Barry, Kim Sharp, Mike Gilson, Shridhara Shridharan and me. While a very useful program, it was getting harder and harder to develop, partly because it was written in FORTRAN (not that there's anything wrong with that), partly because that's just the way academic code is. We don't know how to program but we want answers and we want them now.
ZAP really came about because of Andrew Grant. Andy was in the Scheraga lab struggling, as most of us do, with the nonlinear PB equation. He left to join AstraZeneca in 1993 but we kept in contact. In '95 he and Barry Pickup at the University of Sheffield published a remarkable paper [J. Comp. Chem. ]. They reported that the hard-sphere volume of a molecule could be calculated to 0.1% accuracy by using atom-centered Gaussians. The correspondence between a discrete function and a smooth, continuous one was striking. Since the dielectric approach of PB is essentially volumetric, because molecules are low-dielectric and solvent high, it occurred to us that this was the bedrock on which to build a new PB approach. That approach became ZAP.
There were several reasons to be excited about the Gaussian-based approach. A smooth-function dielectric mapped onto a grid has few of the numerical problems plaguing DelPhi: instability with respect to grid placement and sensitivity to grid spacing. It is also an interesting alternative physical model to most PB implementations. Typically it is assumed that the dielectric changes from molecular to solvent "discretely," or infinitely fast. Why? As there was no experimental information on the variation of dielectric, it was the simplest assumption and also the simplest to implement. As is often the way with a successful scientific approach, these early decisions become ossified over time.
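The grid-placement point can be seen in a toy one-dimensional sketch. This is not ZAP or DelPhi code: the function names, the radius of 1.5, the spacing of 0.5 and the unit-height Gaussian whose width equals the radius are all assumptions chosen for illustration. It sums a step ("hard") occupancy and a Gaussian occupancy over a grid, slides the grid origin through one spacing, and records how much each total moves.

```python
import math

def hard_occupancy(x, radius):
    # Step function: 1 inside the atom, 0 outside (the "discrete" model).
    return 1.0 if abs(x) <= radius else 0.0

def gaussian_occupancy(x, radius):
    # Illustrative smooth model: unit-height Gaussian, width set by the radius.
    return math.exp(-(x / radius) ** 2)

def grid_sum(func, radius, spacing, offset, half_width=10.0):
    # Riemann sum of func over grid points i*spacing + offset.
    n = int(half_width / spacing)
    return sum(func(i * spacing + offset, radius) for i in range(-n, n + 1)) * spacing

def spread(func, radius=1.5, spacing=0.5, steps=20):
    # Largest change in the grid sum as the grid origin slides across one cell.
    totals = [grid_sum(func, radius, spacing, spacing * s / steps) for s in range(steps)]
    return max(totals) - min(totals)

hard_spread = spread(hard_occupancy)
smooth_spread = spread(gaussian_occupancy)
print(hard_spread, smooth_spread)
```

With these settings the step-function total jumps by a whole grid spacing when a grid point crosses the atom boundary, while the Gaussian total barely moves. A real PB calculation is three-dimensional and far more involved, but the source of the instability is the same.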
Another ossification is the choice of molecular surface as the dielectric threshold, as first proposed by BH [J. Phys. Chem]. It is not a bad choice. If a water molecule is excluded from some crevice of a molecule that ought to be reflected in a lower dielectric in the crevice: a molecular surface-delimited volume reflects this where an atomic surface does not. There was also a numerical issue. When Mike Gilson first applied DelPhi to proteins [PROTEINS: Vol 17..], the atomic approach would place grid points randomly in tiny solvent crevices. Moving the protein relative to the grid shuffled the grid representation of the protein interior leading to large, unphysical changes in energy. The molecular surface removed these crevices unless they were large enough to fit a whole water molecule, in which case a high dielectric assignment was not unreasonable.
In retrospect, however, the molecular surface is a terrible choice! It's difficult to calculate and unstable with respect to small displacements of atoms or to the choice of a radius for water. And there is always the question of what dielectric should be assigned to the volume that lies between the molecular and the atomic surfaces.
The Gaussian approach answers several of the physical objections. First and foremost, the dielectric varies smoothly from interior to exterior. Although we made no attempt to correlate this variation with empirical evidence, choosing instead to match our method to DelPhi energies, the variation of polarizability over a span of about one Angstrom is physically appealing. Secondly, we achieve much of the crevice exclusion that the molecular surface produces. Because the Gaussian functions spread out beyond the hard sphere radius of each atom, a crevice receives density contributions from all neighboring atoms, lowering its dielectric to molecular levels. Thirdly, we do not see water "pops", those large changes in dielectric from small atomic displacements (or, worse, small changes of the protein placement on the grid) that occur in active sites. The Gaussian effectively interpolates between water absence and presence.
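The crevice argument can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not ZAP's actual functional form or parameters: each atom contributes a unit-height Gaussian whose width equals its radius, the contributions are combined as 1 - prod(1 - g_i), and the dielectric is a linear blend between an interior value of 2.0 and a solvent value of 80.0. All names are hypothetical.

```python
import math

EPS_IN = 2.0    # interior dielectric (value used in the text for small molecules)
EPS_OUT = 80.0  # solvent dielectric (assumed value for water)

def atom_density(point, center, radius):
    # Unit-height Gaussian centered on the atom; equals 1 at the nucleus.
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-d2 / radius ** 2)

def molecular_density(point, atoms):
    # Combine atoms so the result stays in [0, 1]: 1 - prod(1 - g_i).
    empty = 1.0
    for center, radius in atoms:
        empty *= 1.0 - atom_density(point, center, radius)
    return 1.0 - empty

def dielectric(point, atoms):
    # Linear blend: EPS_IN where density is 1, EPS_OUT where it is 0.
    rho = molecular_density(point, atoms)
    return EPS_OUT - (EPS_OUT - EPS_IN) * rho

# Two atoms of radius 1.5 whose hard spheres leave a 1.0-wide gap at the origin.
left = ((-2.0, 0.0, 0.0), 1.5)
right = ((2.0, 0.0, 0.0), 1.5)
gap = (0.0, 0.0, 0.0)

eps_one_neighbor = dielectric(gap, [left])
eps_two_neighbors = dielectric(gap, [left, right])
print(eps_one_neighbor, eps_two_neighbors)
```

The gap lies outside both hard spheres, yet its dielectric drops as each neighbor is added, and it changes continuously as the atoms move. More neighbors push the value further toward the interior dielectric, which is the crevice exclusion described above without any explicit surface construction.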
An advantage to this approach that we did not anticipate
was its correspondence to DelPhi when applied to proteins. We parameterized
the dielectric variation in ZAP so that it agrees with DelPhi for small
molecules ("we" meaning Christine Kitchen's heroic work for her Ph.D. thesis
[in preparation: Go Christine!]). Given that the difference between
the molecular surface and Gaussian model will be much larger for proteins
with concavities, crevices and internal water pockets, we wondered how
well the two would agree. The remarkable finding was that ZAP applied
to proteins looks like DelPhi with twice the polarizability: ZAP with internal
dielectric set to 2.0 looks like DelPhi set to 4.0. Why is this useful?
Because there is circumstantial and theoretical evidence that 4.0 is a
good dielectric to use with DelPhi for proteins, but that always left a
loose thread in applying the method to protein-drug interactions:
small molecules should have a dielectric of 2.0, proteins 4.0, but DelPhi
only allows you to choose one internal dielectric. With ZAP it appears
you get the best of both worlds. Protein and drug are set to internal dielectrics
of 2.0 and because the protein is a little more "hydrated" in ZAP, it acts
like the dielectric is set to 4.0.
On the numerical side, every advantage we had hoped for
in ZAP came to pass. As mentioned in the previous section, we saw
faster convergence, remarkable stability with respect to grid placement
and much improved asymptotic behavior with decreasing grid spacing.
We also, for the first time, had gradients that could be applied in dynamics
or minimization.
And why was ZAP written as a toolkit?
Because Dave said so. ZAP was my first attempt at writing functionality as a toolkit rather than as a command-line, file-driven program. Many have said it shows. The idea of toolkits is that you can both make specific executables and retain the flexibility to do new things not imagined by the creators. It differs from the library approach, the common practice of sharing functionality in compiled globs, in the nature of the interface to utility. Traditional libraries require you to include header files that define the structure that captures the essence of a problem, such as a grid structure for grid manipulations. But structures change. One day the grid might be specified by its center and extents, the next by the corner points. Such changes play havoc with compatibility and reuse. In toolkits the underlying structures are hidden beneath the interface. Any change in a structure can be hidden from the user, and ignorance is bliss. There is another, less charitable, reason for hiding structures. A lot of thought (or lack thereof) may have gone into internal program organization and the designer may not feel like sharing this with the outside world. The toolkit interface is thus a professional way to obfuscate.
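The grid example can be made concrete with a small sketch. The class and method names below are hypothetical and are not the ZAP interface; the point is only that callers go through methods, so the stored representation (corner points here) could be swapped for center and extents without breaking anyone's code.

```python
class Grid:
    """Hypothetical grid object; callers are expected to use only its methods."""

    def __init__(self, low, high):
        # Internal representation: two corner points. Treated as private.
        self._low = tuple(float(v) for v in low)
        self._high = tuple(float(v) for v in high)

    @classmethod
    def from_corners(cls, low, high):
        return cls(low, high)

    @classmethod
    def from_center_extents(cls, center, extents):
        # Convert to the internal corner form; the caller never sees this.
        low = [c - e / 2.0 for c, e in zip(center, extents)]
        high = [c + e / 2.0 for c, e in zip(center, extents)]
        return cls(low, high)

    def center(self):
        return tuple((lo + hi) / 2.0 for lo, hi in zip(self._low, self._high))

    def extents(self):
        return tuple(hi - lo for lo, hi in zip(self._low, self._high))

    def contains(self, point):
        return all(lo <= p <= hi for lo, p, hi in zip(self._low, point, self._high))


grid_a = Grid.from_center_extents((0.0, 0.0, 0.0), (4.0, 4.0, 4.0))
grid_b = Grid.from_corners((-2.0, -2.0, -2.0), (2.0, 2.0, 2.0))
print(grid_a.center(), grid_a.extents(), grid_b.contains((1.0, 1.0, 1.0)))
```

Both grids answer the same questions the same way regardless of how they were specified. A library that exposed the raw structure would instead force every caller to know which form was in fashion that day.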