The following four examples perform the same task: calculating the similarity between a query molecule and every molecule in a database file.
In Listing 1, after reading the query structure and generating its path fingerprint, the program loops over the database file and creates a path fingerprint for each structure. It then calculates the Tanimoto similarity between the query fingerprint and the fingerprint of each database entry by calling the OETanimoto function.
Listing 1: Similarity calculation from file
#!/usr/bin/env python
import sys
from openeye.oechem import *
from openeye.oegraphsim import *
if len(sys.argv) != 3:
    OEThrow.Usage("%s <queryfile> <targetfile>" % sys.argv[0])
ifs = oemolistream()
if not ifs.open(sys.argv[1]):
    OEThrow.Fatal("Unable to open %s for reading" % sys.argv[1])
# Read the query molecule and generate its path fingerprint
qmol = OEGraphMol()
if not OEReadMolecule(ifs, qmol):
    OEThrow.Fatal("Unable to read query molecule")
qfp = OEFingerPrint()
OEMakeFP(qfp, qmol, OEFPType_Path)
# Reuse the stream for the target file
if not ifs.open(sys.argv[2]):
    OEThrow.Fatal("Unable to open %s for reading" % sys.argv[2])
# Fingerprint each target molecule and print its Tanimoto score against the query
tfp = OEFingerPrint()
for tmol in ifs.GetOEGraphMols():
    OEMakeFP(tfp, tmol, OEFPType_Path)
    print("%.3f" % OETanimoto(qfp, tfp))
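Listing 1 treats OETanimoto as a black box. The following minimal pure-Python sketch shows what the Tanimoto coefficient computes: the number of bits set in both fingerprints divided by the number of bits set in either. It assumes fingerprints are plain Python integers used as bit masks. That is an illustrative assumption, not the OEFingerPrint representation, and the fingerprint values are made up.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer bit mask."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for bit-mask fingerprints."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints: this sketch returns 0.0 by convention.
        return 0.0
    return popcount(fp_a & fp_b) / union


# Hypothetical query and database fingerprints (bit masks, not real data).
query_fp = 0b10110110
database_fps = [0b10110110, 0b10100000, 0b01001001]

for fp in database_fps:
    print("%.3f" % tanimoto(query_fp, fp))
```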
Listing 2 shows only the code block that differs from Listing 1. In this example, the fingerprints are assumed to be pre-calculated and stored in an OEB binary file as generic data attached to the corresponding molecules. The program loops over the file and accesses the pre-generated fingerprints, or calculates them on the fly if they are not available.
The obvious advantage of this approach is that the fingerprints have to be generated only once, when the binary file is created. This can be significantly faster than generating the fingerprints on the fly every time the program is executed.
See also
The Storage and Retrieval section shows an example of how to generate an OEB binary file that stores molecules along with their corresponding fingerprints.
Listing 2: Similarity calculation from OEB file
tfp = OEFingerPrint()
for tmol in ifs.GetOEGraphMols():
    if tmol.HasData("PATH_FP"):
        # Use the fingerprint stored as generic data in the OEB file
        tfp = tmol.GetData("PATH_FP")
    else:
        # Fall back to generating the fingerprint on the fly
        OEThrow.Warning("Unable to access fingerprint for %s" % tmol.GetTitle())
        OEMakeFP(tfp, tmol, OEFPType_Path)
    print("%.3f" % OETanimoto(qfp, tfp))
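The "use the stored fingerprint, otherwise compute it and warn" pattern of Listing 2 can be sketched in plain Python. Here a dictionary keyed by title plays the role of the fingerprints stored as generic data. `make_fp` is a hypothetical stand-in for fingerprint generation that hashes adjacent character pairs of a string into a 64-bit mask, and `get_fp` and `NBITS` are likewise illustrative names, not toolkit API.

```python
import warnings
import zlib

NBITS = 64  # hypothetical fingerprint length for this sketch


def make_fp(structure: str) -> int:
    """Hypothetical fingerprint generator: hash each pair of adjacent
    characters into one of NBITS bit positions (deterministic via crc32)."""
    fp = 0
    for i in range(len(structure) - 1):
        fp |= 1 << (zlib.crc32(structure[i:i + 2].encode()) % NBITS)
    return fp


def get_fp(title: str, structure: str, stored: dict) -> int:
    """Return the pre-calculated fingerprint if present, else compute it."""
    if title in stored:
        return stored[title]
    warnings.warn("Unable to access fingerprint for %s" % title)
    return make_fp(structure)


# Hypothetical database entries; only "ethanol" has a stored fingerprint.
stored_fps = {"ethanol": make_fp("CCO")}
entries = [("ethanol", "CCO"), ("propane", "CCC")]
for title, structure in entries:
    fp = get_fp(title, structure, stored_fps)
    print(title, bin(fp).count("1"), "bits set")
```

The cost of `make_fp` is paid only for entries without a stored fingerprint, which is the saving the OEB approach aims for.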
Listing 3 differs from Listing 1 in that it uses an OEFPDatabase object to store the generated fingerprints. The OEFPDatabase class is designed to perform in-memory fingerprint searches.
Listing 3: Similarity calculation with fingerprint database from file
fpdb = OEFPDatabase(qfp.GetFPTypeBase())
for tmol in ifs.GetOEGraphMols():
    fpdb.AddFP(tmol)
for score in fpdb.GetScores(qfp):
    print("%.3f" % score.GetScore())
After the fingerprint database has been built, the scores can be accessed with the OEFPDatabase.GetScores method, which returns an iterator over the calculated similarity scores.
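The container idea can be sketched in plain Python: fingerprints go into a list, adding one returns its index, and scores are produced lazily by a generator. `ToyFPDatabase`, `add_fp` and `get_scores` are hypothetical names, not the OEFPDatabase API. Fingerprints are assumed to be integer bit masks scored with the Tanimoto coefficient.

```python
from typing import Iterator, List


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


class ToyFPDatabase:
    """Hypothetical in-memory fingerprint container (not the OEFPDatabase API)."""

    def __init__(self) -> None:
        self._fps: List[int] = []

    def add_fp(self, fp: int) -> int:
        """Store a fingerprint and return its index in the container."""
        self._fps.append(fp)
        return len(self._fps) - 1

    def get_scores(self, query: int) -> Iterator[float]:
        """Lazily yield one score per stored fingerprint, in insertion order."""
        for fp in self._fps:
            yield tanimoto(query, fp)


db = ToyFPDatabase()
for fp in [0b1011, 0b0011, 0b0100]:
    db.add_fp(fp)
for score in db.get_scores(0b1011):
    print("%.3f" % score)
```

Using a generator means no score list is materialized unless the caller asks for one.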
Note
The OEFPDatabase only stores fingerprints and not the molecules from which they are generated. A correspondence between a molecule and its fingerprint stored in the database can be established by using the index returned by the OEFPDatabase.AddFP method.
See also
Listing 5 shows how to keep track of the correspondence between a fingerprint added to an OEFPDatabase object and the molecule from which it was calculated.
In the last example, OEFPDatabase is again used to store the fingerprints. If a fingerprint can be read from the OEB input file, it is added directly to the database. Otherwise, the OEMolBase molecule itself is passed to the OEFPDatabase.AddFP method, which generates the fingerprint on the fly.
Listing 4: Similarity calculation with fingerprint database from OEB
fpdb = OEFPDatabase(qfp.GetFPTypeBase())
for tmol in ifs.GetOEGraphMols():
    if tmol.HasData("PATH_FP"):
        # Add the pre-calculated fingerprint directly
        tfp = tmol.GetData("PATH_FP")
        fpdb.AddFP(tfp)
    else:
        # Let AddFP generate the fingerprint from the molecule
        OEThrow.Warning("Unable to access fingerprint for %s" % tmol.GetTitle())
        fpdb.AddFP(tmol)
for score in fpdb.GetScores(qfp):
    print("%.3f" % score.GetScore())
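Listing 4 relies on AddFP accepting either a ready-made fingerprint or a molecule. The sketch below mimics that dual behavior in plain Python with hypothetical types. An `int` stands for a pre-calculated fingerprint, and a `str` stands for a molecule that still has to be fingerprinted by the hypothetical `make_fp`. Neither type nor function is part of the toolkit.

```python
import zlib
from typing import List, Union

NBITS = 64  # hypothetical fingerprint length


def make_fp(structure: str) -> int:
    """Hypothetical fingerprint generator: hash adjacent character pairs."""
    fp = 0
    for i in range(len(structure) - 1):
        fp |= 1 << (zlib.crc32(structure[i:i + 2].encode()) % NBITS)
    return fp


class ToyFPDatabase:
    """Hypothetical container whose add() accepts a fingerprint or a 'molecule'."""

    def __init__(self) -> None:
        self.fps: List[int] = []

    def add(self, item: Union[int, str]) -> int:
        # Pre-calculated fingerprints (ints) are stored directly; anything
        # else is treated as a molecule and fingerprinted on the fly.
        fp = item if isinstance(item, int) else make_fp(item)
        self.fps.append(fp)
        return len(self.fps) - 1


db = ToyFPDatabase()
db.add(make_fp("CCO"))  # fingerprint "read from storage"
db.add("c1ccccc1")      # no stored fingerprint: generate it from the molecule
print(len(db.fps), "fingerprints stored")
```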
Similarity searching based on 2D representations of molecular structure (such as fingerprints) is one of the most common approaches to virtual screening. It rests on the assumption that a molecule that is structurally similar to an active molecule is more likely to be active itself.
A typical virtual screening strategy iterates over a molecule database, calculates the similarity between a reference structure and each molecule, and then ranks the similarity scores in descending order to identify the molecules that are most similar to the reference structure.
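The ranking step can be sketched in plain Python: score every database fingerprint against the reference, then keep the top entries in descending order of score. This illustrates the idea only. It assumes bit-mask fingerprints and the Tanimoto coefficient and says nothing about how OEFPDatabase.GetSortedScores is implemented. `sorted_scores` is a hypothetical name.

```python
import heapq
from typing import List, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def sorted_scores(query: int, fps: List[int], limit: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the `limit` most similar fingerprints,
    highest score first."""
    scored = ((idx, tanimoto(query, fp)) for idx, fp in enumerate(fps))
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


# Hypothetical database of bit-mask fingerprints.
fps = [0b0001, 0b1111, 0b0111, 0b1000]
for idx, score in sorted_scores(0b0111, fps, 2):
    print(idx, "%.3f" % score)
```

`heapq.nlargest` avoids sorting the whole list when only a few hits are needed.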
Listing 5 shows how to search a molecule database with the OEFPDatabase.GetSortedScores method to identify analogs based on their fingerprint similarity. After reading the molecules and inserting their fingerprints into an OEFPDatabase object, the program reads reference molecules as SMILES strings from standard input. After parsing each SMILES string, it searches the fingerprint database for the structures with the highest similarity scores (the five best hits in this example) and reports the SMILES strings of these hits with OEThrow.Info.
Listing 5: Similarity search in memory
#!/usr/bin/env python
import sys
from openeye.oechem import *
from openeye.oegraphsim import *
def LoadDatabase(fname, fpdb, mollist):
    # Add each fingerprint and keep a copy of its molecule at the same index
    ifs = oemolistream(fname)
    for mol in ifs.GetOEGraphMols():
        fpdb.AddFP(mol)
        mollist.append(OEGraphMol(mol))
if len(sys.argv) != 2:
    OEThrow.Usage("%s <database>" % sys.argv[0])
ifs = oemolistream()
if not ifs.open(sys.argv[1]):
    OEThrow.Fatal("Unable to open %s for reading" % sys.argv[1])
mollist = []
fpdb = OEFPDatabase(OEFPType_Path)
LoadDatabase(sys.argv[1], fpdb, mollist)
# Read SMILES from stdin
sw = OEWallTimer()
query = OEGraphMol()
while True:
    sys.stdout.write("Enter SMILES> ")
    sys.stdout.flush()  # make sure the prompt appears before reading input
    line = sys.stdin.readline()
    line = line.rstrip()
    if len(line) == 0:
        sys.exit(0)
    query.Clear()
    if not OEParseSmiles(query, line):
        OEThrow.Warning("Invalid SMILES string")
        continue
    sw.Start()
    scores = fpdb.GetSortedScores(query, 5)
    OEThrow.Info("%f seconds to search %i fingerprints" % (sw.Elapsed(), len(mollist)))
    for si in scores:
        # The score index maps back to the molecule stored at the same position
        hit = mollist[si.GetIdx()]
        smiles = OECreateIsoSmiString(hit)
        OEThrow.Info("Tanimoto score %4.3f %s" % (si.GetScore(), smiles))
As mentioned before, OEFPDatabase is a fingerprint container and does not store the corresponding molecules. Listing 5 therefore keeps the molecules in a separate list. Whenever a fingerprint is added to the OEFPDatabase, a copy of its molecule is appended to this list, so both share the same index. When OEFPDatabase.GetSortedScores returns the iterator over the best similarity scores, the index associated with each score can be used to access the corresponding structure.
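The parallel-container bookkeeping can be sketched in plain Python. Fingerprints and molecules are appended together, so a score's index is also a position in the molecule list. SMILES strings stand in for molecules, and the bit-mask fingerprints are made up for illustration. `add` is a hypothetical helper, not toolkit API.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


fps: List[int] = []        # stands in for the fingerprint database
molecules: List[str] = []  # separate container for the structures


def add(molecule: str, fp: int) -> int:
    """Append fingerprint and molecule together so their indices stay aligned."""
    fps.append(fp)
    molecules.append(molecule)
    return len(fps) - 1


# Hypothetical molecules paired with made-up bit-mask fingerprints.
add("CCO", 0b0111)
add("CCN", 0b0110)
add("c1ccccc1", 0b1000)

query = 0b0111
hits: List[Tuple[float, int]] = sorted(
    ((tanimoto(query, fp), idx) for idx, fp in enumerate(fps)), reverse=True)
for score, idx in hits[:2]:
    print("%.3f %s" % (score, molecules[idx]))  # index maps back to the structure
```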
In the above example, the entire database was searched to identify structurally similar molecules. However, a segment of the database can also be searched by providing a begin and an end index.
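A segment search restricts scoring to a contiguous index range while keeping indices global, so they still map back to the molecule container. The sketch below assumes a half-open [begin, end) range, a Python convention chosen here for illustration. Whether the toolkit treats its end index as inclusive or exclusive should be checked in the OEFPDatabase API reference. `segment_scores` is a hypothetical name.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def segment_scores(query: int, fps: List[int], begin: int, end: int) -> List[Tuple[int, float]]:
    """Score only fps[begin:end] (half-open, clamped to the list), keeping global indices."""
    start, stop = max(begin, 0), min(end, len(fps))
    return [(idx, tanimoto(query, fps[idx])) for idx in range(start, stop)]


# Hypothetical database of bit-mask fingerprints.
fps = [0b0111, 0b0110, 0b1000, 0b0011]
print(segment_scores(0b0111, fps, 1, 3))
```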
See also
Examples of fingerprint searches in the API section:
By default, the Tanimoto similarity is used when calling either the OEFPDatabase.GetScores or the OEFPDatabase.GetSortedScores method. A different similarity measure can be selected by calling the OEFPDatabase.SetSimFunc method with a value from the OESimMeasure namespace, where each constant corresponds to one of the built-in similarity measures.
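Selecting a built-in measure from a namespace of constants can be sketched with a Python `Enum` whose members are mapped to scoring functions. The member names (`TANIMOTO`, `DICE`, `COSINE`) and `ToyFPDatabase` are illustrative and are not the OESimMeasure constants or the OEFPDatabase API. The formulas are the standard bit-count definitions: with c bits set in both fingerprints and a and b bits set in each, Dice is 2c/(a+b) and Cosine is c/sqrt(ab).

```python
import math
from enum import Enum
from typing import Callable, List


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def dice(a: int, b: int) -> float:
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def cosine(a: int, b: int) -> float:
    denom = math.sqrt(popcount(a) * popcount(b))
    return popcount(a & b) / denom if denom else 0.0


class SimMeasure(Enum):
    """Hypothetical stand-in for a namespace of built-in measures."""
    TANIMOTO = "tanimoto"
    DICE = "dice"
    COSINE = "cosine"


# Map each constant to its scoring function (kept outside the Enum so the
# functions are not turned into methods).
SIM_FUNCS = {SimMeasure.TANIMOTO: tanimoto, SimMeasure.DICE: dice, SimMeasure.COSINE: cosine}


class ToyFPDatabase:
    def __init__(self) -> None:
        self.fps: List[int] = []
        self.sim_func: Callable[[int, int], float] = tanimoto  # Tanimoto by default

    def set_sim_func(self, measure: SimMeasure) -> None:
        self.sim_func = SIM_FUNCS[measure]

    def get_scores(self, query: int) -> List[float]:
        return [self.sim_func(query, fp) for fp in self.fps]


db = ToyFPDatabase()
db.fps.append(0b0011)
print(db.get_scores(0b0110))  # default Tanimoto
db.set_sim_func(SimMeasure.DICE)
print(db.get_scores(0b0110))
```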
User-defined similarity measures can also be used when searching a fingerprint database. The following example shows how the Simpson similarity measure can be implemented by deriving from the OESimFuncBase class.
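Formula: $\mathrm{Sim}_{\mathrm{Simpson}}(A,B) = \frac{\mathit{bothAB}}{\min(\mathit{onlyA} + \mathit{bothAB},\; \mathit{onlyB} + \mathit{bothAB})}$, where onlyA and onlyB count the bits set only in fingerprint A or only in fingerprint B, and bothAB counts the bits set in both.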
class SimpsonSimFunc(OESimFuncBase):
    def __call__(self, fpA, fpB):
        onlyA, onlyB, bothAB, neitherAB = OEGetBitCounts(fpA, fpB)
        # Identical fingerprints: no bits unique to either one
        if onlyA + onlyB == 0:
            return 1.0
        # No bits in common
        if bothAB == 0:
            return 0.0
        sim = float(bothAB)
        sim /= min(float(onlyA+bothAB), float(onlyB+bothAB))
        return sim
    def GetSimTypeString(self):
        return "Simpson"
    def CreateCopy(self):
        # Hand ownership of the new copy over to the toolkit
        return SimpsonSimFunc().__disown__()
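To check the Simpson arithmetic outside the toolkit, here is the same calculation in plain Python. Bit-mask integers stand in for fingerprints, which is an assumption for illustration. `bit_counts` is a hypothetical helper that derives the onlyA, onlyB and bothAB counts used in the listing. The neither count is omitted because Simpson does not need it.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def bit_counts(fp_a: int, fp_b: int):
    """Return (onlyA, onlyB, bothAB) for two non-negative bit-mask fingerprints."""
    return (popcount(fp_a & ~fp_b), popcount(fp_b & ~fp_a), popcount(fp_a & fp_b))


def simpson(fp_a: int, fp_b: int) -> float:
    only_a, only_b, both_ab = bit_counts(fp_a, fp_b)
    if only_a + only_b == 0:
        return 1.0  # identical fingerprints, as in the listing
    if both_ab == 0:
        return 0.0  # no bits in common
    return both_ab / min(only_a + both_ab, only_b + both_ab)


# A is a subset of B: Simpson gives 1.0, while Tanimoto would give 2/4 = 0.5.
print(simpson(0b1100, 0b1111))
# bothAB = 1, onlyA + bothAB = 3, onlyB + bothAB = 2, so 1 / min(3, 2) = 0.5.
print(simpson(0b0111, 0b1100))
```

Because the denominator is the smaller of the two bit counts, Simpson rewards one fingerprint being contained in the other, which Tanimoto does not.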
Once implemented, the similarity function can be passed to an OEFPDatabase object with the OEFPDatabase.SetSimFunc method. From then on, the database uses this similarity measure for its searches.
fpdb = OEFPDatabase(OEFPType_Path)
fpdb.SetSimFunc(SimpsonSimFunc())
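The plug-in design can be mimicked in plain Python. An abstract base class declares the call, name and copy methods, and the container stores a copy of whatever function object it is given. All names here (`SimFuncBase`, `TanimotoSimFunc`, `ToyFPDatabase`, `set_sim_func`) are hypothetical. The copy step mirrors the CreateCopy method required in the listing, on the assumption that the container keeps its own instance of the function.

```python
from abc import ABC, abstractmethod
from typing import List


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer bit mask."""
    return bin(x).count("1")


class SimFuncBase(ABC):
    """Hypothetical interface mirroring __call__, GetSimTypeString and CreateCopy."""

    @abstractmethod
    def __call__(self, fp_a: int, fp_b: int) -> float:
        ...

    @abstractmethod
    def get_sim_type_string(self) -> str:
        ...

    @abstractmethod
    def create_copy(self) -> "SimFuncBase":
        ...


class TanimotoSimFunc(SimFuncBase):
    def __call__(self, fp_a: int, fp_b: int) -> float:
        union = popcount(fp_a | fp_b)
        return popcount(fp_a & fp_b) / union if union else 0.0

    def get_sim_type_string(self) -> str:
        return "Tanimoto"

    def create_copy(self) -> "TanimotoSimFunc":
        return TanimotoSimFunc()


class SimpsonSimFunc(SimFuncBase):
    def __call__(self, fp_a: int, fp_b: int) -> float:
        if fp_a == fp_b:
            return 1.0  # no bits unique to either fingerprint
        both = popcount(fp_a & fp_b)
        if both == 0:
            return 0.0
        return both / min(popcount(fp_a), popcount(fp_b))

    def get_sim_type_string(self) -> str:
        return "Simpson"

    def create_copy(self) -> "SimpsonSimFunc":
        return SimpsonSimFunc()


class ToyFPDatabase:
    """Hypothetical container that scores with a pluggable similarity function."""

    def __init__(self) -> None:
        self.fps: List[int] = []
        self.sim_func: SimFuncBase = TanimotoSimFunc()  # default measure

    def set_sim_func(self, func: SimFuncBase) -> None:
        # Keep a private copy rather than the caller's object.
        self.sim_func = func.create_copy()

    def get_scores(self, query: int) -> List[float]:
        return [self.sim_func(query, fp) for fp in self.fps]


db = ToyFPDatabase()
db.fps = [0b1111, 0b0011]
print(db.sim_func.get_sim_type_string(), db.get_scores(0b0110))
db.set_sim_func(SimpsonSimFunc())
print(db.sim_func.get_sim_type_string(), db.get_scores(0b0110))
```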
See also