5.2 Ways to deal with OEBestOverlay results

Unlike OEOverlap and OEColorOverlap, OEBestOverlay uses multi-conformer molecules for both the reference and the fit (mostly for efficiency). As a result, a single OEBestOverlay calculation can return many results: one for every combination of reference conformer, fit conformer, and starting position. In other words, each calculation returns N results, where N is the number of ref conformers times the number of fit conformers times the number of starting positions for each pair. Comparing two molecules with 10 conformers each can therefore return 400 or more results.
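As a quick check on that arithmetic, the result count is a plain product of three factors. The sketch below assumes four starting positions per conformer pair, which is what the figure of 400 implies for two 10-conformer molecules; the function name expected_result_count is hypothetical and not part of OEShape.

```python
def expected_result_count(n_ref_confs, n_fit_confs, n_starts):
    """Count under the rule described above: one score per
    (ref conformer, fit conformer, starting position) combination."""
    return n_ref_confs * n_fit_confs * n_starts


# Two molecules with 10 conformers each, ASSUMING 4 starting positions
# per conformer pair (the real number depends on the overlay settings).
print(expected_result_count(10, 10, 4))  # 400
```

Adding conformers to either molecule grows the count multiplicatively, which is why the filtering helpers described next matter.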

Two helper classes exist solely to hold these results and to make it easy to extract either all of them or just the desired subset. OEBestOverlayResults (section ) holds the results for a single pair of conformers. It contains a set of OEBestOverlayScore objects (section ), one for each starting position.
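The two-level layout can be pictured with a small toolkit-free model. This is an illustration only: ScoreSketch and ResultsSketch are hypothetical dataclasses standing in for OEBestOverlayScore and OEBestOverlayResults, exposing just the fields and the GetScores() method that the listings in this section use, and the Tanimoto values are placeholders rather than real overlay results.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScoreSketch:
    """Hypothetical stand-in for OEBestOverlayScore: one starting position."""
    refconfidx: int
    fitconfidx: int
    tanimoto: float


@dataclass
class ResultsSketch:
    """Hypothetical stand-in for OEBestOverlayResults: one ref/fit conformer
    pair, holding one score per starting position."""
    refconfidx: int
    fitconfidx: int
    scores: List[ScoreSketch] = field(default_factory=list)

    def GetScores(self):
        # Same method name as in the listings; returns an iterator.
        return iter(self.scores)


def count_scores(results):
    """Walk both levels (conformer pairs, then starting positions)."""
    total = 0
    for res in results:
        for _score in res.GetScores():
            total += 1
    return total


# Toy data: 2 ref conformers x 3 fit conformers x 4 starting positions.
# The Tanimoto values are placeholders, not real overlay results.
toy = [
    ResultsSketch(r, f, [ScoreSketch(r, f, 0.5 + 0.1 * s) for s in range(4)])
    for r in range(2)
    for f in range(3)
]
print(count_scores(toy))  # 24
```

Walking the outer list and then each inner GetScores() iterator visits 2 x 3 x 4 = 24 scores, the same product rule stated earlier.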

The first example uses two nested iterators to show all the results.

    #!/usr/bin/env python
    # Copyright (C) 2005,2006 OpenEye Scientific Software, Inc.
    from __future__ import print_function
    import sys
    from openeye.oechem import *
    from openeye.oeshape import *

    if len(sys.argv) != 3:
        OEThrow.Usage("bestoverlay1.py <reffile> <rocs_hits_file>")

    reffs = oemolistream(sys.argv[1])
    fitfs = oemolistream(sys.argv[2])

    # The reference is the first (multi-conformer) molecule in reffile.
    refmol = OEMol()
    if not OEReadMolecule(reffs, refmol):
        OEThrow.Fatal("Unable to read a reference molecule from %s" % sys.argv[1])

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    resCount = 0
    fitmol = OEMol()
    while OEReadMolecule(fitfs, fitmol):
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        # Outer loop: one OEBestOverlayResults per ref/fit conformer pair.
        # Inner loop: one OEBestOverlayScore per starting position.
        for res in best.Overlay(fitmol):
            for score in res.GetScores():
                print("FitConfIdx: %-4d RefConfIdx: %-4d Tanimoto: %.2f"
                      % (score.fitconfidx, score.refconfidx, score.tanimoto))
                resCount += 1

    print(resCount, "results returned")

Listing 5.1: Getting all the scores from OEBestOverlay.

In most cases, however, one does not want or need all the results; usually only the single best overlap for each conformer-conformer pair is of interest. The next example shows how the OESortOverlayScores function turns the double iterator from the example above into a single iterator of OEBestOverlayScore objects. The third argument is a functor that determines the sort order. Here it is OEHighestTanimoto, so the example yields one OEBestOverlayScore for each pair of conformers, returned with the highest Tanimoto first.

    #!/usr/bin/env python
    # Copyright (C) 2005,2006 OpenEye Scientific Software, Inc.
    from __future__ import print_function
    import sys
    from openeye.oechem import *
    from openeye.oeshape import *

    if len(sys.argv) != 3:
        OEThrow.Usage("bestoverlay2.py <reffile> <rocs_hits_file>")

    reffs = oemolistream(sys.argv[1])
    fitfs = oemolistream(sys.argv[2])

    # The reference is the first (multi-conformer) molecule in reffile.
    refmol = OEMol()
    if not OEReadMolecule(reffs, refmol):
        OEThrow.Fatal("Unable to read a reference molecule from %s" % sys.argv[1])

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    resCount = 0
    fitmol = OEMol()
    while OEReadMolecule(fitfs, fitmol):
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        # Reduce the results to one OEBestOverlayScore per conformer pair,
        # ordered by the OEHighestTanimoto functor (best first).
        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol), OEHighestTanimoto())
        for score in scoreiter:
            print("FitConfIdx: %-4d RefConfIdx: %-4d Tanimoto: %.2f"
                  % (score.fitconfidx, score.refconfidx, score.tanimoto))
            resCount += 1

    print(resCount, "results returned")

Listing 5.2: Getting all the best scores from OEBestOverlay.
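To make the effect of OESortOverlayScores with OEHighestTanimoto concrete, here is a plain-Python sketch of the reduction described above: keep only the best starting position for each conformer pair, then order the survivors by Tanimoto. It is not the toolkit's implementation. Score is a hypothetical namedtuple standing in for OEBestOverlayScore, the key function plays the role of the sorting functor, and the Tanimoto values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for OEBestOverlayScore; not the OEShape class.
Score = namedtuple("Score", "refconfidx fitconfidx tanimoto")


def best_per_pair_sorted(results):
    """results: iterable of lists, one list per conformer pair
    (one Score per starting position).

    Returns one Score per pair (its highest-Tanimoto start), with the
    pairs ordered from highest to lowest Tanimoto.
    """
    best = [max(scores, key=lambda s: s.tanimoto) for scores in results if scores]
    best.sort(key=lambda s: s.tanimoto, reverse=True)
    return best


# Made-up scores: three conformer pairs, three starting positions each.
pairs = [
    [Score(0, 0, 0.41), Score(0, 0, 0.63), Score(0, 0, 0.58)],
    [Score(0, 1, 0.72), Score(0, 1, 0.35), Score(0, 1, 0.70)],
    [Score(1, 0, 0.50), Score(1, 0, 0.52), Score(1, 0, 0.49)],
]
for s in best_per_pair_sorted(pairs):
    print(s.refconfidx, s.fitconfidx, s.tanimoto)
# 0 1 0.72
# 0 0 0.63
# 1 0 0.52
```

Swapping the key function (for example, sorting on a different score field) changes the ordering without touching the per-pair reduction, which is the same separation the functor argument provides.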

The next example is a slight variation on the previous one. It adds a user parameter, keepsize, for the number of results to keep. Since the scores are sorted by Tanimoto, best first, keeping the first five keeps the best five.

    #!/usr/bin/env python
    # Copyright (C) 2005,2006 OpenEye Scientific Software, Inc.
    from __future__ import print_function
    import sys
    from openeye.oechem import *
    from openeye.oeshape import *

    if len(sys.argv) != 4:
        OEThrow.Usage("bestoverlay3.py <reffile> <rocs_hits_file> <keepsize>")

    reffs = oemolistream(sys.argv[1])
    fitfs = oemolistream(sys.argv[2])
    keepsize = int(sys.argv[3])

    # The reference is the first (multi-conformer) molecule in reffile.
    refmol = OEMol()
    if not OEReadMolecule(reffs, refmol):
        OEThrow.Fatal("Unable to read a reference molecule from %s" % sys.argv[1])

    best = OEBestOverlay()
    best.SetRefMol(refmol)

    print("Ref. Title:", refmol.GetTitle(), "Num Confs:", refmol.NumConfs())

    resCount = 0
    fitmol = OEMol()
    while OEReadMolecule(fitfs, fitmol):
        print("Fit Title:", fitmol.GetTitle(), "Num Confs:", fitmol.NumConfs())

        scoreiter = OEBestOverlayScoreIter()
        OESortOverlayScores(scoreiter, best.Overlay(fitmol), OEHighestTanimoto())

        # Count per fit molecule, so every fit molecule keeps its own top
        # keepsize scores (a running total would only cap the first one).
        keptCount = 0
        for score in scoreiter:
            if keptCount >= keepsize:
                break
            print("FitConfIdx: %-4d RefConfIdx: %-4d Tanimoto: %.2f"
                  % (score.fitconfidx, score.refconfidx, score.tanimoto))
            keptCount += 1
        resCount += keptCount

    print(resCount, "results returned")

Listing 5.3: Keeping a few top scores from OEBestOverlay.
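The cutoff in Listing 5.3 amounts to taking a prefix of an already-sorted sequence. The toolkit-free sketch below does that with itertools.islice, applied separately to each fit molecule's sorted scores. keep_top is a hypothetical helper, not an OEShape function, and the per-molecule Tanimoto lists are made-up data standing in for each fit molecule's sorted OEBestOverlayScoreIter.

```python
from itertools import islice


def keep_top(sorted_scores, keepsize):
    """Return the first keepsize items of a best-first sequence.

    Because the input is already sorted best-first, the prefix is the
    keepsize best; islice stops without consuming the rest.
    """
    return list(islice(sorted_scores, keepsize))


# Made-up Tanimoto values per fit molecule, already sorted best-first.
per_molecule = {
    "fit_A": [0.81, 0.77, 0.64, 0.60, 0.52, 0.47],
    "fit_B": [0.69, 0.55],
}
for title, scores in per_molecule.items():
    print(title, keep_top(scores, 5))
# fit_A [0.81, 0.77, 0.64, 0.6, 0.52]
# fit_B [0.69, 0.55]
```

A molecule with fewer than keepsize scores simply returns all of them. If an overall top N across all fit molecules were wanted instead, the per-molecule lists would need to be merged first (for example with heapq.nlargest), since each list is only sorted within its own molecule.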