Ant's Rant - The First of Many

There’s been a lot of hot air blowing around lately about Bill Gates’s decision to invest $10M in our friends at Schrodinger. Some have even suggested that this might mean a return to the heady days of the ’80s, when the belief that computation was about to make drug design “rational” was in the air. (I always felt this must have been taken as quite an insult by those who had created a very successful industry without computers: “yeah, nice job, but you didn’t do it rationally.”) Of course, these latest high hopes are as silly as those common in that now 30-year-old “golden age.” This is a reaction that large sums of money seem to inspire: “oh, we are investing billions to cure cancer/end the drug war/bring peace to the Middle East, so surely that’s that, time to move on to the next insurmountable problem.” In many ways this kind of thinking is simply a facet of the endearing optimism of Americans that I like. And sometimes it even works: they did get to the moon, after all.

Much of the starry-eyed enthusiasm over Mr. Gates’s gift concerns the hope that this money will make it possible to finally calculate the interaction energy (the “free energy”) between a ligand and a protein, a triumph that would at least begin to pave the way to the rational selection of ligands. This is, of course, presuming we have a protein structure—quite an assumption, since theorists have still not quite solved that last big challenge, i.e., protein folding. (Anyone who worked on lattice models of protein folding is free to tell me how wrong I am!) The dominant view is that the only way to achieve this calculation is through molecular dynamics—that is, moving atoms about under thermal randomness. The belief here is that with the right force field and a long enough simulation, the right answer must pop out. “Brainlessly,” as my friend Christopher Bayly likes to say. There are several things wrong with this belief, some fundamental, some practical, some sociological.

The fundamental problem is that there’s no guarantee at all that such a force field could ever be constructed. The difficulty of including that most basic component of molecular interaction—polarization—illustrates what I mean. Even the best attempts have flaws, either through their models or through their implementation. For instance, point inducible dipoles don’t polarize when the field at that point is zero, despite the fact that a non-uniform field can quite easily be non-zero (and polarizing) away from that point. Or the implementation can require so many parameters as to be useless except for (useless) retrospective analysis. Or the terms necessary for capturing reality accurately enough simply don’t exist. Force fields are nearly always point-wise additive, but nature isn’t. Kennie Merz makes a nice point when he argues that even a simple interaction is difficult to get right with quantum mechanics (e.g., to ~0.5 kcal/mol); even if the errors in the interactions between parts of a ligand and a protein are statistically uncorrelated, this argues for an expected error of 2.5–4 kcal/mol. And this is using the QM that the force fields are attempting to mimic. So it is not unreasonable to wonder if a force field can be constructed to give accurate predictions, at least for absolute binding energies. Of course, some will train force fields to give reasonable numbers for some systems—but is this a general solution? Unlikely, I think.
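
Merz’s ~0.5 kcal/mol figure turns into the 2.5–4 kcal/mol range by simple error propagation: statistically uncorrelated errors add in quadrature, so the expected total error over N interaction terms is 0.5 × √N. A back-of-the-envelope sketch (the term counts of 25 and 64 are my illustrative assumptions, chosen to bracket the quoted range):

```python
import math

def expected_total_error(per_term_error, n_terms):
    """Uncorrelated errors add in quadrature: total = per_term * sqrt(N)."""
    return per_term_error * math.sqrt(n_terms)

# ~0.5 kcal/mol QM error per interaction term; illustrative term counts
for n in (25, 64):
    print(f"{n} terms -> {expected_total_error(0.5, n):.1f} kcal/mol")
# 25 terms -> 2.5 kcal/mol
# 64 terms -> 4.0 kcal/mol
```

For context, an error of 1.4 kcal/mol already corresponds to a factor of ten in binding affinity at room temperature, so errors in this range swamp the distinctions medicinal chemists care about.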

The practical question is: how long a simulation is long enough? Simulationists are continuing to (re)discover that what they thought of as complete and utter sampling is actually… not. A simple example in ligand binding is whether all conformations in the unbound (or even bound) state have been sampled. There are often high energy barriers to conformer transitions that simulation is ill-suited to crossing. Of course, you could just start from an ensemble of predetermined states and combine the simulations, but that isn’t exactly “brainless,” is it? The essential practical problem of dynamics is that it’s not a very rational way to sample phase space—and for thermodynamic properties, that’s what counts. Sure, for kinetic properties molecular dynamics might be very useful (an unproven conjecture, but let’s imagine), but for the thermodynamic properties we want, like binding energies, it’s just a plain dumb way to sample. Yes, there are more rational ways to sample that do use MD as a component, such as normal mode analysis or simulated annealing, and it is possible that for local mode sampling MD is efficient. I’ve yet to see anyone prove that, but again, it is possible. And efficiency is important here. People like to point out the inexorable march of silicon and how today’s time scales of months will someday be days, then hours, but what I think this misses is that all the other approaches out there—current and yet to be invented—will benefit just as much. In the end, an inefficient but accurate method has only one purpose: calibration for the efficient and almost-as-accurate. At least, if you are in business and not in the splendid ivory towers of academia.
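
The barrier problem isn’t hand-waving; a simple Arrhenius-style estimate shows why brute-force MD sampling stalls. The mean waiting time to cross a barrier grows as exp(ΔE/RT), so every additional ~1.4 kcal/mol of barrier at room temperature costs roughly another order of magnitude of simulation time. A minimal sketch (the 1 ps attempt time is a rough, assumed prefactor, not a measured value):

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)

def mean_crossing_time_ps(barrier_kcal, temp_k=300.0, attempt_time_ps=1.0):
    """Arrhenius estimate: mean waiting time grows as exp(barrier / RT)."""
    return attempt_time_ps * math.exp(barrier_kcal / (R * temp_k))

for barrier in (2.0, 5.0, 10.0):  # kcal/mol
    t = mean_crossing_time_ps(barrier)
    print(f"{barrier:>4} kcal/mol barrier -> ~{t:.3g} ps between crossings")
```

At 300 K, RT is about 0.6 kcal/mol, so a 10 kcal/mol conformational barrier pushes the expected waiting time into the tens of microseconds per transition, which is exactly why “just run it longer” is not a sampling strategy.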

Finally, the sociological problems are considerable. Because the belief is that MD does in some ways “simulate reality,” it is continually proposed as the “physics” solution, usually by people with little or no training in physics. I often think this is meant to be vaguely intimidating because, gosh darn it, physicists are so smart. Well, some are, some aren’t, but that’s not what made and makes physics great. What has characterized the ascendancy of physics as a predictive and powerful science is three things: (1) an insistence on models that are understandable and of a complexity appropriate to the problem, (2) tight collaboration with experimentalists willing to disprove a theory, and (3) a predilection for generality over domain problem solving, leading to consilience (i.e., if this is right, what else do I now know, understand or have the ability to ask questions about?). People might argue with me about this definition; I won’t attempt to defend it, other than to say that it has come about from several years of consideration.

Where, you might ask, is mathematics—the word that most people associate with physics—in all this? Mathematics is often a consequence of this approach, as in the sentiment first voiced by Galileo and later dubbed the “unreasonable effectiveness of mathematics” by Wigner, but to think it is intrinsic to physics is to mistake cause and effect. In fact, some of the worst physics (think string theory) is extremely mathematical, and some of the best (think material science) is often not. And mathematics is too often used to hide the paucity of physics in a model rather than to support it. The reason understandability is important is that it allows for the above-mentioned characteristic (2): the more you understand a model, the simpler it is to think of how it might fail and to test accordingly. Finally, a good idea inevitably explains more than you intended; in fact, that is usually how we know an idea is a good idea. I think MD fails all these criteria. It is more complex than it needs to be—not just because it calculates kinetic data to get to thermodynamics, but because simple models like PB consistently do as well or better. It seldom helps you understand why the result is the way it is—this is the “brainless” part Chris describes—it just gives “an answer.” It does not engender interactions with experimentalists—there is no tradition of the back-and-forth between theory and experiment that drives scientific fields forward. Finally, there is no sense of generalization, although the MD world would say this is nonsense because it is the most general of techniques. Not so. Dynamics is constantly tuned to the purpose at hand, be it protein folding, ligand binding, polymer melts, conformational change, liquid simulation or whatever. It is an intensely non-generalizable method. Yet the hubris that accompanies MD is often stunning.

I think physics does have a role to play in drug discovery. Call me heretical, but I think it is inevitable that we will eventually provide at least hints and clues to protein-ligand interaction from a physics-based approach that does not include MD, other than perhaps as a local sampling technique. OpenEye is very committed to that path and, through the scientific process, will see progress. But it won’t be because of $10M from a hopeful billionaire. As we were reminded at CUP XI, hope is not a strategy, but the scientific process of small steps, imaginative thinking, increased interactions with experimentalists and perseverance is. My prediction, which you may hold me to, is for usefulness within five years. Set your digital watches.

Preserved Comments

I love reading your blog..And I'm amazed with the information that's new for me.
Thanks a lot Anthony.. - Nickets

OK. You're heretical. Fortunately, in this century I think this won't get you burned at the stake!

Here's what Schrodinger is going to do with the money (excerpted from:
"First, additional resources will be allocated to existing projects that we view as critical, such as induced fit docking and scoring function development. Next, we will create novel tools for ADME prediction and modeling GPCRs, both of which will require major new research efforts with a long term commitment. Finally, we intend to develop a database of protein-ligand interactions, which will represent a significant advancement over currently available commercial products."

Little of what's described above can actually be called science unless they are going to do some experiments....

I believe that physics does probably hold the answer. However, in the 'old physics' ways, there was a competition between the experimentalists and the theoreticians. Right now, at least in the "practical" world of drug discovery, the game isn't even close, so the experimentalists generally aren't worried about challenging what the theoreticians say, since A) They lose so infrequently and B) Regardless of the "sophistication" of any model, only the experiment will be used to make a high stake decision. I hope the biotech/drug industry can get better at providing more meaningful data in the future.

Until physics allows us to see 'the answer', demonstrable usefulness is a noble goal. Good Luck! (you have 1770 days left) - CSage

I believe Physics to be the queen of all sciences! - Custom Essay

I couldn't agree more. $10M would buy a lot of useful experiments. SAMPL3 (August 1st 2011) does have some real, live experimental data not yet published- but even this is not the back and forth between experiment and theory that the field really needs. 1768 days to go. - Ant

What's the matter with you people? I *know* you don't (all) agree with me. I really won't hunt you down. Probably. - Ant

I believe a good idea explains what you intended it to.

It is a good idea to get enough rest each night. It explains why we can function well on a daily basis. (of course other factors need be satisfied as well, but getting no sleep for three straight nights will not allow you to function well). Getting enough sleep may *also* explain more beneficial advantages. But even if it doesn't that doesn't mean it's a bad idea.

It's a good idea to eat too. If we don't we will die.

Running MD is often a good idea when I need limited sampling and can trust it to do a good job of, say, allowing sidechain exploration in the local environment of a protein/ligand complex. I can trust the results it generates more than most of the rotamer library databases, which know nothing of the local environment. MD is not necessarily better than MC but it does work well. And so it is a good idea.

Not all good ideas are the best ideas. And not all good ideas have to explain more than they are intended to.

For a given sidechain motion, there can be other better methods for sampling. But as long as MD gives me reasonable results in a matter of 1-2 minutes it is good enough. Limited MD in my project support has often prospectively predicted previously unseen protein loop and sidechain movement, with subsequent corroboration by crystallography. (And even in cases where there has not yet been experimental validation, the ideas are still useful. If you play the lottery and win $1000 for every $100 spent and lose 90% of the time, it's still a good financial decision for you to buy those tickets).

And the same MD has prospectively predicted accurate folding times for small proteins. Is that not generalisable?
What percent of instances does it need to work in for you to call it generalisable?
Generalisability should not mean it has to succeed in every single case.

You've defined your ideology of what a good idea is. And maybe MD does not meet those criteria. I submit that you are too restrictive on what makes an idea (or method) good.