The Size of the Problem
The size of chemical space is enormous: estimates vary from hundreds of billions1 to more than the number of atoms in the Universe.2 Until recently, accessible chemical space (molecules that have been or can easily be synthesized) has been much smaller, on the order of a few tens of millions. However, advances in the cheminformatics of molecular design have increased the size of accessible chemical space dramatically; the Enamine REAL collection3 of compounds with a high probability of synthesis contains over 1.4 billion stereo-defined molecules and is growing, while fully enumerated virtual chemical spaces can contain 10^15 molecules or more.4 Searching libraries numbered in the billions or larger is a significant challenge but provides substantial advantages for both structure-based5 and ligand-based4 methods. Accordingly, large-scale virtual screening (LSVS) has become a problem of increasing interest in drug discovery.
Orion: Delivering scale easily and efficiently
Searching billions or tens of billions of molecules in 3D is computationally demanding. OpenEye’s cloud-native computational chemistry platform, Orion, provides the on-demand, fault-tolerant compute resources required for the intense, but irregular, demands of LSVS.
ROCS, OpenEye’s class-leading shape and chemical feature similarity tool,6 is one of the most widely used tools in 3D lead discovery and lead hopping.7 The CPU implementation is fast, exceeding 200 molecules/CPU/second, while the GPU-enabled version (FastROCS) is far faster, at 700,000 molecules/GPU/second (Figure 1), yet has identical overall performance in virtual screening (Figure 2). The high speed and performance of FastROCS make it ideally suited to LSVS at the billion-molecule scale.
Figure 1: Speed of ROCS and FastROCS.
Figure 2: Virtual screening performance for ROCS and FastROCS on the DUD-E dataset.
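The throughput figures quoted above translate directly into hardware requirements. The sketch below is a back-of-the-envelope estimate only: it assumes perfect linear scaling across devices and ignores I/O, scheduling, and start-up overhead, none of which are quantified in the text. The function names are illustrative and are not part of any OpenEye API. The CPU rate is the lower bound quoted above ("exceeding 200"), so the CPU count is an upper estimate.

```python
import math

# Throughput figures quoted in the text (molecules per device per second).
ROCS_RATE_PER_CPU = 200          # quoted as "exceeding 200", so a lower bound
FASTROCS_RATE_PER_GPU = 700_000


def device_seconds(n_molecules: int, rate_per_device: float) -> float:
    """Total device-seconds needed to search n_molecules at a fixed rate."""
    return n_molecules / rate_per_device


def devices_needed(n_molecules: int, rate_per_device: float,
                   wall_seconds: float) -> int:
    """Smallest device count that finishes within wall_seconds.

    Assumes perfect linear scaling (an idealization, not a measured result).
    """
    return math.ceil(device_seconds(n_molecules, rate_per_device) / wall_seconds)


if __name__ == "__main__":
    library = 1_400_000_000  # Enamine REAL size quoted in the text
    one_hour = 3600.0
    print("Per-device speed-up:", FASTROCS_RATE_PER_GPU / ROCS_RATE_PER_CPU)
    print("GPU-seconds for the library:",
          device_seconds(library, FASTROCS_RATE_PER_GPU))
    print("GPUs to finish in 1 h:",
          devices_needed(library, FASTROCS_RATE_PER_GPU, one_hour))
    print("CPUs to finish in 1 h:",
          devices_needed(library, ROCS_RATE_PER_CPU, one_hour))
```

At the quoted rates, a 1.4-billion-molecule search costs about 2,000 GPU-seconds with FastROCS, against roughly 7 million CPU-seconds with ROCS, which is why the GPU implementation is the natural choice at this scale.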
Recent work from AstraZeneca4 illustrates the immense power of coupling FastROCS to the compute resources of the cloud through Orion. Searching more than 12 billion molecules requires only an hour, with automatic scale-up and scale-down. These very large scale searches produced more diverse, better scoring hits than searching smaller libraries (see Figure 3).4
Figure 3: Number of Bemis-Murcko scaffolds in the top 10,000 hits versus size of library searched.
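The diversity metric plotted in Figure 3 can be sketched as follows. This is a minimal illustration, not the procedure used in the cited work: it assumes each hit already carries a precomputed Bemis-Murcko scaffold as a canonical string (scaffold extraction itself requires a cheminformatics toolkit and is not shown), and that higher scores are better. The `Hit` type and the toy data are hypothetical.

```python
import heapq
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    score: float    # similarity score; higher is better (assumption)
    scaffold: str   # precomputed Bemis-Murcko scaffold as a canonical string


def scaffolds_in_top_hits(hits: Iterable[Hit], top_n: int = 10_000) -> int:
    """Count distinct scaffolds among the top_n highest-scoring hits."""
    # nlargest avoids sorting the full hit list, which matters when the
    # library holds billions of molecules and only the top slice is kept.
    best = heapq.nlargest(top_n, hits, key=lambda h: h.score)
    return len({h.scaffold for h in best})


if __name__ == "__main__":
    # Toy data for illustration only; not real screening results.
    toy_hits = [
        Hit(0.95, "c1ccccc1"),
        Hit(0.93, "c1ccncc1"),
        Hit(0.91, "c1ccccc1"),
        Hit(0.80, "c1ccc2ccccc2c1"),
    ]
    print(scaffolds_in_top_hits(toy_hits, top_n=3))
```

Applying the same count to the top 10,000 hits from libraries of increasing size gives the kind of curve summarized in Figure 3.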
FRED, OpenEye’s docking tool, is carefully designed to balance high performance with high speed,8 allowing very large libraries to be handled with ease. Comparison with other tools, including HYBRID,9 the ligand-guided companion tool to FRED, shows that FRED performs very well in virtual screening (see Figure 4).
Figure 4: Retrospective virtual screening performance for FRED and HYBRID, compared on the DUD set. Performance of a null model, 2D fingerprint, is shown for reference.
Harnessing the power of Orion to deliver the compute resources required by both FastROCS and FRED has brought previously technically or financially intractable searches within reach. As a proof of concept, OpenEye has conducted both FastROCS and FRED virtual screens of the Enamine REAL database against heat shock protein 90, HSP-90 (PDB code 1UYG; see Figure 5), and submitted the highest-scoring molecules to biological testing.
Figure 5: Active site and cognate ligand of HSP-90, PDB code 1UYG.
Prospective ultra-large-scale virtual screening: FastROCS
FastROCS was able to search the Enamine REAL collection for molecules similar to the 1UYG ligand in less than 30 minutes. The most active hit (IC50 16 uM) and the query ligand (IC50 53.5 uM) are shown in Figure 6. FastROCS successfully identified a new chemical scaffold with slightly better activity than that of the query ligand.
Figure 6: 2D depictions of the 1UYG ligand (left, 53.5 uM) and the most active hit from the FastROCS virtual screen (right, 16 uM).
Prospective ultra-large-scale virtual screening: FRED
FRED was used to dock the entire Enamine REAL collection to the 1UYG receptor in less than 24 hours on Orion, utilizing around 45,000 CPUs at peak capacity (Figure 7). To the best of our knowledge, this is the largest-scale docking study performed to date, almost an order of magnitude larger than the largest previous docking calculation (170 million compounds, reported by Lyu et al.).5 The top-ranked molecule, also the most active hit (IC50 4 uM), is shown in Figure 8. The pose of this molecule recapitulates the key interactions of the cognate ligand while occupying different regions of the binding site. FRED successfully identified a new chemical scaffold with substantially better activity than that of the cognate ligand.
Figure 7: Compute resources used by Orion to dock the Enamine REAL collection to HSP-90.
Figure 8: The most active hit from the FRED virtual screen, 4 uM.
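The scale and potency figures quoted for the two prospective screens can be checked with simple arithmetic. The sketch below uses only numbers stated in the text. Because 45,000 CPUs was the peak rather than the sustained count, and the run took less than 24 hours, the per-CPU docking rate it computes is a rough lower bound and not a measured throughput. The function names are illustrative.

```python
LIBRARY_SIZE = 1_400_000_000      # Enamine REAL size quoted in the text
PEAK_CPUS = 45_000                # peak CPU count quoted in the text
WALL_SECONDS = 24 * 3600          # upper bound: "less than 24 hours"
PREVIOUS_LARGEST = 170_000_000    # largest earlier docking study cited in the text


def min_rate_per_cpu(n_molecules: int, n_cpus: int, wall_seconds: float) -> float:
    """Lower bound on molecules docked per CPU per second.

    Treats every CPU as busy for the full wall time, which overestimates
    the CPU-seconds actually used and so underestimates the true rate.
    """
    return n_molecules / (n_cpus * wall_seconds)


def fold_improvement(ic50_reference_um: float, ic50_hit_um: float) -> float:
    """Ratio of reference IC50 to hit IC50; values above 1 favor the hit."""
    return ic50_reference_um / ic50_hit_um


if __name__ == "__main__":
    rate = min_rate_per_cpu(LIBRARY_SIZE, PEAK_CPUS, WALL_SECONDS)
    print(f"Docking rate: at least {rate:.2f} molecules/CPU/second")
    print(f"Scale vs. previous study: {LIBRARY_SIZE / PREVIOUS_LARGEST:.1f}x")
    print(f"FastROCS hit vs. 1UYG ligand: {fold_improvement(53.5, 16.0):.1f}-fold")
    print(f"FRED hit vs. 1UYG ligand: {fold_improvement(53.5, 4.0):.1f}-fold")
```

On these numbers the screen is about 8 times larger than the previous largest docking study, and the FastROCS and FRED hits are roughly 3-fold and 13-fold more potent than the 1UYG ligand, respectively.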
Orion is revolutionizing large-scale computing on the cloud, enabling virtual screens of unprecedented scale.
FastROCS can search multi-million molecule databases in seconds and databases of billions of molecules in less than an hour, delivering multiple actionable hits rapidly.
FRED has brought docking to the billion-molecule level, the first tool to do so.