2007-06 | EuroCUP I | Sheffield, UK

Sheffield, UK

June 21-22, 2007

Thursday, June 21

8:00 Registration / Tea, coffee, pastries
8:45 Welcoming remarks, Anthony Nicholls, OpenEye

Morning: Chemical information and modeling

9:00 Virtual Exploration of Chemical Space by Database Generation, Jean-Louis Reymond, University of Berne
9:40 Breaking the Language Barrier: Chemical Nomenclature around the Globe, Roger Sayle, OpenEye

10:10 Morning break

10:30 Using Pareto Methods to Evolve Structure-Activity Relationships, Val Gillet, University of Sheffield
11:00 Communicating Chemical Information in a Research Organization, Joe Corkery, OpenEye
11:30 Further Adventures in Shape Space, Paul Hawkins, OpenEye

12:00 Lunch

13:15 OpenEye Product Review - Ligand-based design, Paul Hawkins, OpenEye

Afternoon: Structural information

14:15 Structure and Property Predictions for Molecular Organic Molecules: Progress and Blind Tests, Graeme Day, University of Cambridge
14:55 Computed Crystal Energy Landscapes for the Prediction and Understanding of Polymorphism, Sally Price, University College London

15:25 Afternoon break

15:45 Validating Crystallographic Ligands, Brian Kelley, OpenEye
16:15 Exploiting Ligand Conformations in Drug Design, Jonas Boström, AstraZeneca
16:45 When Pigs Won't Fly..., Gerard Kleywegt, Uppsala University

17:30 Keynote lecture: Drug Discovery: Research or Process?, Dave Timms, AstraZeneca (retired)

19:00 Conference Dinner

Friday, June 22

8:15 Tea, coffee, pastries

Morning: Electrostatics

9:00 Piling Pelion on Ossa: the Tangled Web of Tautomer Preference, Peter Taylor, AstraZeneca (retired)
9:40 Five Years of Field Comparisons and Other Dangers of Letting Chemists & Physicists Talk, Geoff Skillman, OpenEye

10:10 Morning break

10:30 The Sheffield Solvation Model and Other Stunningly Simple Ideas to Improve Molecular Modeling, Andrew Grant, AstraZeneca
11:00 Protein Electrostatics, Function, and Environment, Jim Warwicker, University of Manchester
11:30 Physically Correct Charges. Think You Use Them? Think Again, Anthony Nicholls, OpenEye

12:00 Concluding remarks, Anthony Nicholls, OpenEye

Close

Abstracts

Virtual Exploration of Chemical Space by Database Generation
Jean-Louis Reymond, University of Berne, Switzerland

Organic chemistry is the study of molecules made by forming covalent bonds between atoms of carbon, hydrogen, oxygen, nitrogen, halogens, and a few other elements (S, P, Si). The ensemble of all possible molecules forms the so-called chemical universe, or chemical space. Our aim is to explore chemical space in depth by exhaustive generation in silico, going beyond what nature and chemists have synthesized or imagined to date. We have generated a database of all molecules of C, N, O, F up to 11 atoms (H's are added to complement valency) possible under consideration of chemical stability and synthetic feasibility rules. The database illuminates the scope of organic chemistry and points to new chemotypes with promising predicted properties. We are also developing a strategy that goes beyond the 11 atom limit of exhaustive listing and explores chemical space systematically up to molecules of ca. 50 atoms.

Breaking the Language Barrier: Chemical Nomenclature around the Globe
Roger Sayle, OpenEye

The use of chemical compound names remains the primary method for conveying molecular structures between chemists and researchers. In research articles, patents, chemical catalogues, government legislation and textbooks, the use of IUPAC and traditional compound names is universal, despite efforts to introduce more machine-friendly representations such as identifiers and line notations. Fortunately, advances in computing power now allow chemical names to be parsed and generated (read and written) with almost the same ease as conventional connection tables. A significant complication, however, is that although the vast majority of chemistry uses English nomenclature, a significant fraction is in other languages. This complicates the task of filing and analyzing chemical patents, purchasing from compound vendors and text mining research articles or web pages. This talk describes the issues with manipulating chemical names in various languages, including British, American, Spanish, Swedish and Japanese, and describes the current state-of-the-art in tools to simplify the process.

Using Pareto Methods to Evolve Structure-Activity Relationships
Val Gillet, University of Sheffield

Classification methods have become popular tools in data mining, for example, in the analysis of high-throughput screening data where the aim is to derive a model that is able to separate active from inactive compounds. Such binary classifiers actually have two implicit objective functions: sensitivity (which describes how well they classify the active molecules) and specificity (which describes how well they classify the inactives). In general, it is not possible to simultaneously improve on both measures and so the usual approach to training classifiers is to combine the two measures into a single objective function such as the F-measure (which represents the harmonic mean of recall (R) and precision (P)). This approach provides a balanced solution; however, the user has limited control over the model that is generated. We have used Pareto methods to optimise recall and precision independently. The effect is to generate solutions on a recall-precision curve (equivalent to a ROC curve) where each solution represents a different model. The user may then select a model appropriate for their particular application. For example, a model with high precision but low recall may be preferred when deriving structure-activity information, whereas a model with high recall but low precision may be more useful for virtual screening. The Pareto approach has been extended to derive multiple models which, when taken together, improve on the classification rate achieved by a single model and which enable a dataset to be described by more than one structure-activity relationship.
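
The trade-off described above can be illustrated with a minimal sketch. It is not the authors' method: the function names and the example (recall, precision) values are hypothetical, and the non-dominated filter shown here is only the simplest form of a Pareto selection. It computes the F-measure as the harmonic mean of precision and recall, then keeps the candidate models that no other model beats on both measures at once.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the single-objective compromise)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (recall, precision) points not dominated by any other point.

    A point is dominated if another point is at least as good on both
    measures and strictly better on at least one.
    """
    front = []
    for i, (r, p) in enumerate(points):
        dominated = any(
            r2 >= r and p2 >= p and (r2 > r or p2 > p)
            for j, (r2, p2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((r, p))
    return sorted(front)


# Hypothetical (recall, precision) scores for five candidate models.
models = [(0.9, 0.3), (0.6, 0.6), (0.3, 0.9), (0.5, 0.5), (0.2, 0.8)]
front = pareto_front(models)
```

In this made-up example, three models survive the filter; a user deriving structure-activity information might pick the high-precision end of the front, while one running a virtual screen might pick the high-recall end.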

Communicating Chemical Information in a Research Organization
Joe Corkery, OpenEye

Pharmaceutical and biotech research organizations are relatively heterogeneous, comprised of many distinct groups of users with differing expectations, experience, methodologies, priorities, and even IT infrastructure. The communication of chemical information between these groups is critical to the discovery process but is often limited by these constraints. Chemical information refers to not only chemical structures, but also to their associated "meta" information such as prioritization, hit lists, user annotations, as well as calculated and measured properties. Addressing this challenge has required extensive research, experimentation and innovation on many fronts including navigating the heterogeneity of both hardware and the user, as well as the many varying methodologies of data communication and presentation. We have attempted to tackle these issues with the development of VIDA and Vivant. VIDA addresses the challenges of heterogeneity by providing the same easy-to-use intuitive interface to a powerful visualization application across multiple platforms (Microsoft Windows, Linux/Unix, and Mac OS X). The interface takes advantage of widely accepted interface paradigms in order to focus on use by many different types of users instead of just a narrowly defined subset of specialists. Vivant addresses the challenges of communication by specifically leveraging the established power of Microsoft PowerPoint and the World Wide Web as communication mediums in order to present not only the chemical structures (as visualized in VIDA) but also the "meta" information generated for them based on user interactions with those compounds. Given the resources now available, do they make a difference? Potential tests will be discussed.

Further Adventures in Shape Space
Paul Hawkins, OpenEye

It has been a long-held assumption in ligand-based virtual screening that the bioactive conformation of a molecule is privileged and superior performance should be obtained when using such a conformation. A parallel assumption has been that extensive sampling of the conformational space of database molecules is necessary to obtain optimal performance. Both of these assumptions will be critically assessed in the case when ligand-based virtual screening is carried out in shape space.

Structure and Property Predictions for Molecular Organic Molecules: Progress and Blind Tests
Graeme Day, University of Cambridge

Computational methods for the prediction of molecular organic crystal structures have been developing rapidly over the past two decades and many promising results have been reported. Evaluations of current methods for structure prediction are presented, including blind tests of crystal structure prediction. Three such blind tests have been run over the past 8 years, as an objective test of current capabilities; the organisation and results of these are presented. As well as the ambitious aim of structure prediction, methods have been improving for the prediction and understanding of physical properties (including mechanical and dynamic properties) of organic crystals with known structure; these methods, and their performance, are also discussed.

Computed Crystal Energy Landscapes for the Prediction and Understanding of Polymorphism
Sally Price, University College London

Contrasting the thermodynamically feasible crystal structures with those that are experimentally known can either confirm that an experimental polymorph screen is effectively complete, or provide structures to target by other crystallisation methods, and can generally help rationalise polymorph and solvate formation. This will be illustrated by a range of examples, including cases of prior prediction of polymorphs, to show how computational modelling is progressing in the range of molecules for which useful crystal energy landscapes can be computed.

Validating Crystallographic Ligands
Brian Kelley, Jim Nettles, and Greg Warren, OpenEye

A daunting task for computational chemists is the validation of crystallographic complexes. The quality of the data used to determine structures can be quite poor, forcing a decision on whether to use the current low-quality model or to invest in attempts to collect higher-quality data. Here, we present a collection of techniques used by crystallographers and modelers to validate structures. Crystallographic techniques include real space correlation [1] and the Difference of Difference (DOD) method [2]. These techniques are juxtaposed with techniques familiar to computational chemists such as contact surfaces, strain energy and clash potentials. Finally we show, using examples from the RCSB [3], that in many cases the initial poor ligand structure can be replaced with a chemically sensible and low-strain conformation that fits the crystallographic data equally well.

[1] Jones, T. A. Acta Cryst. 1991, A47, 110-119.
[2] Nettles, J. H. Science 2004, 305, 866-869.
[3] Perola, E.; Charifson, P. S. J. Med. Chem. 2004, 47, 2499-2510.

Exploiting Ligand Conformations in Drug Design
Jonas Boström, AstraZeneca

Drug discovery requires the analysis of increasing amounts of data. Computational chemistry plays a role in organizing and interpreting such data with the goal of making predictions. More specifically, molecular design involves coupling predictions about how modifications to molecular structure are manifested in terms of changes to experimental properties. The dominant methods of establishing similarities between molecules and their relationships to measured data are based on the local connectivity of atoms, and as such reflect how molecules have been historically drawn in medicinal chemistry. A better physically founded approach to molecular modeling is to consider the three-dimensional (3D) shape of molecules. However, it has proved difficult to fully realize all of the potential advantages of 3D-based techniques, in part due to the difficulty of obtaining good descriptions of the conformations of molecules. A particular challenge is in determining the conformation of a molecule when it is bound to a given biological target, the so-called bioactive conformation. Despite the numerous approximations involved in both modeling the energetics of conformers and the methodologies involved in searching conformational space, it has been shown that the insights from conformational analysis can contribute to drug discovery projects. This will, for example, be illustrated by the design of a novel class of cannabinoid (CB1) receptor antagonists for the treatment of obesity.

When Pigs Won't Fly...
Gerard Kleywegt, Uppsala University

Once upon a time, in the land of the unicorn, where every rainbow had a pot of gold at both ends, where everybody was beautiful, healthy and happy, and where pigs could fly, all protein crystallographers were brilliant scientists who never made any mistakes.

Meanwhile, back on Earth, the situation is slightly less favourable. As the recent debacle with the ABC transporter structures has (once again!) shown, the mere fact that something (say, a hot crystal structure) is published in Nature or Science is uncorrelated to its correctness. A few examples of the mistakes that crystallographers can make will be given, and simple ways in which non-experts can assess the overall reliability of structures will be discussed.

However, even when the overall structure is reliable, some details may not be (much like Orwell's pigs, all residues are equal, but some residues are more equal than others). Of particular interest in this respect are the protein residues that interact with substrates or inhibitors and the interacting molecules themselves. It will be shown (with some amusing examples) that the structures of these "size-challenged" molecules when determined in complex with biomacromolecules are at times rather less reliable than those of their big bio brethren.

Obviously, these observations have important implications for users of protein structures. For one thing, these structures should be treated with healthy scepticism rather than reverence. The best way to assess the reliability of critical aspects of a structure (ligand, active-site residues, metal-binding site, interface residues, etc.) is to inspect the experimental electron density maps. Our efforts to provide such maps, and information derived from them, for all crystal structures for which structure factors have been deposited with the PDB will be discussed. They have resulted in EDS - the Uppsala Electron Density Server (http://eds.bmc.uu.se/).

If time permits, a new tripendicular server called ValLigURL (which is, like, totally pronounced "Valley Girl" - duh!) will also be described. It can be used to answer various totally sluggin' questions about ligand conformations in macromolecular crystal structures, including:

Has this conformation of my favourite ligand, like, been observed before in the PDB?
Which PDB entries contain my favourite ligand but in a, like, way different conformation (somebody - gag me with a spoon!)?
How do the bond lengths and angles of, like, my favourite ligand compare to the ideal values (taken from MSDchem) or to the values found in the same ligand in other gnarly PDB entries?
Which tubular PDB entry contains a copy of my favourite ligand that has the best bond lengths and angles (so that I can, like, totally use that entry as a starting point for my modelling work)?

Piling Pelion on Ossa: the Tangled Web of Tautomer Preference
Peter Taylor, AstraZeneca (retired)

Despite the mass of experimental information that has accumulated on tautomerism over the past half century, there has been no serious attempt so far to systematise the data in the interests of a predictive methodology. Since tautomer preference is phase dependent, the choice of standard phase is crucial, and we opt for water as the universal biological medium. We shall demonstrate that removing the distortions entailed in the inevitable use of model compounds can result in such improved internal consistency that hitherto undetected correlations become visible. We go on to analyse one of many cases in which the use of additive assumptions has allowed the identification and quantification of an important perturbing factor whose use can aid prediction in a wide variety of contexts.

Five Years of Field Comparisons and Other Dangers of Letting Chemists & Physicists Talk
Geoff Skillman, OpenEye

Molecular similarity, whether computational or in the imagination of a chemist, has a long and distinguished history of success in medicinal chemistry. In the 1980s and 1990s, a chemistry perspective, in which atom typing played a prominent role, dominated both 2D and 3D similarity approaches (e.g. MACCS, Catalyst). Coulombic-field similarity was also introduced in the early 1990s (e.g. CoMFA) and Coulombic fields were later reduced to atom-type related field points that allowed easier alignment and comparison (e.g. Cresset). In 2002 OpenEye introduced fast physics-based methods to both align and directly compare electrostatic fields of small molecules in water (Anthony Nicholls, Cup II). Here we will present our experience over the past five years and show that careful consideration of chemistry details such as variable resolution conformer exploration, high quality partial charges, proper ionization and tautomer states, physiologic salt concentrations and a justifiable interaction model are both necessary and sufficient to achieve outstanding performance from solvated electrostatic-field comparisons.

The Sheffield Solvation Model and Other Stunningly Simple Ideas to Improve Molecular Modeling
Andrew Grant, AstraZeneca

There are two ways to model a solute in water. Choose a force-field, apply massive computing power and wait a long time. Or choose an implicit solvent model, enlist any computer made in the last thirty years and get an answer potentially every bit as good in a fraction of the time. How small a fraction? Typically a million-fold or more. Given the enormous difference in speed between explicit and implicit models, is there really any need for yet faster implicit solvent models? This talk will suggest there is. The Sheffield Solvation model, named for its point of origin, is dramatically faster than other implicit solvent models, allowing, for example, on-the-fly calculation of conformational and vibrational entropies, important components often ignored in physical property prediction. In addition it is competitive in accuracy and trivial to implement with simple, analytic gradients. The historical development of the model will be presented, such as aspects of the more substantial model from which it was derived and also the light it sheds on the over-parameterisation of certain other implicit solvent models. Finally, simple ways will be suggested to extend Sheffield to achieve greater accuracy and extensibility.

Protein Electrostatics, Function, and Environment
Jim Warwicker, University of Manchester

We develop methods to calculate pH-dependent properties for biological molecules, finding that a combination of Finite Difference Poisson-Boltzmann and Debye-Hückel provides a good and relatively fast approximation in many cases. We look for correlations between calculations/predictions and environmental properties, such as subcellular location. Revisiting the widely discussed topic of charges and thermostability, although we predict greater stabilisation from ionisable groups for hyperthermophile proteins, this does not correlate with the numbers of such groups. Since these numbers are also larger for hyperthermophile proteins, we infer that at least two effects are in operation (stability and solubility). Shape-based protein electrostatics has seen a minor revival in the era of Structural Genomics, from addressing issues such as predictions of enzyme/non-enzyme or nucleic acid-binding. These are relatively trivial implementations of the complexity implicit in detailed charge/shape analysis. A more lasting contribution would be the development of algorithms to match 3D structures (proteins etc) to detailed functional classes, at least for enzymes.
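
As a rough illustration of the Debye-Hückel part only, the sketch below evaluates the textbook screened Coulomb interaction between two point charges in a uniform dielectric with a 1:1 electrolyte. It is not the speaker's implementation and does not cover the Finite Difference Poisson-Boltzmann step; the function names, the relative permittivity of 78.4 for water and the default temperature are assumptions made for the example.

```python
import math

# Physical constants in SI units.
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length_m(ionic_strength_molar: float,
                   temperature_k: float = 298.15,
                   eps_r: float = 78.4) -> float:
    """Debye screening length in metres for a given ionic strength (mol/L)."""
    ionic_strength = ionic_strength_molar * 1000.0  # convert mol/L to mol/m^3
    return math.sqrt(
        EPS0 * eps_r * KB * temperature_k
        / (2.0 * N_A * E_CHARGE ** 2 * ionic_strength)
    )


def screened_coulomb_kj_mol(q1: float, q2: float, r_m: float,
                            ionic_strength_molar: float,
                            temperature_k: float = 298.15,
                            eps_r: float = 78.4) -> float:
    """Debye-Hückel interaction energy (kJ/mol) between charges q1 and q2.

    q1 and q2 are in units of the elementary charge; r_m is the separation in metres.
    """
    kappa = 1.0 / debye_length_m(ionic_strength_molar, temperature_k, eps_r)
    energy_j = (q1 * q2 * E_CHARGE ** 2 * math.exp(-kappa * r_m)
                / (4.0 * math.pi * EPS0 * eps_r * r_m))
    return energy_j * N_A / 1000.0


# Example: an ion pair 0.5 nm apart at two salt concentrations.
low_salt = screened_coulomb_kj_mol(+1, -1, 0.5e-9, 0.01)
high_salt = screened_coulomb_kj_mol(+1, -1, 0.5e-9, 0.15)
```

Under these assumptions the attraction between the opposite charges weakens as the ionic strength rises, which is the screening effect that makes the approximation cheap to evaluate for many charge pairs.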

Physically Correct Charges. Think You Use Them? Think Again
Anthony Nicholls, OpenEye

The point charges we are familiar with in modeling:
(a) are under-determined and not physically observable,
(b) try to account for both electrostatic potentials and polarization,
the latter through the ubiquitous application of over-polarized basis sets popularized by the likes of Kollman, Karplus and others. I will illustrate why charges cannot do both but that the use of an actual physical observable, the dielectric constant, resolves matters, predicts a new and reasonable constant for solvated systems, unifies a wide range of charged and uncharged systems, solves some old problems, such as halide transfer energies, while posing interesting new conundrums.