The basic idea underlying similarity-based measures is that molecules that are structurally similar are likely to have similar properties. In a fingerprint the presence or absence of a structural fragment is represented by the presence or absence of a set bit. This means that two molecules are judged as being similar if they have a large number of bits in common.
Measuring molecular similarity or dissimilarity has two basic components: the representation of molecular characteristics (such as fingerprints) and the similarity coefficient that is used to quantify the degree of resemblance between two such representations.
Since different similarity coefficients quantify different types of structural resemblance, several built-in similarity measures are available in OEGraphSim (see Table: Built-in similarity indices). The table below defines the four basic bit count terms that are used in fingerprint-based similarity calculations:
Table: Basic terms

| Symbol | Description |
|---|---|
| $onlyA$ | number of bits set on in fingerprint A but not in B |
| $onlyB$ | number of bits set on in fingerprint B but not in A |
| $bothAB$ | number of bits set on in both fingerprints |
| $neitherAB$ | number of bits set off in both fingerprints |

Table: Built-in similarity indices

| Similarity measure | Range | OEGraphSim Function |
|---|---|---|
| Cosine | [0.0, 1.0] | OECosine |
| Dice | [0.0, 1.0] | OEDice |
| Euclidean | [0.0, 1.0] | OEEuclid |
| Manhattan | [0.0, 1.0] | OEManhattan |
| Tanimoto | [0.0, 1.0] | OETanimoto |
| Tversky | variable | OETversky |
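As an illustration of the four basic bit count terms, the following minimal sketch counts them for two small bit vectors stored as plain Python integers. It does not use OEGraphSim: the helper name `bit_counts`, the integer representation, and the 8-bit width are assumptions made for this example only. The result is returned in the order onlyA, onlyB, bothAB, neitherAB.

```python
def bit_counts(fp_a, fp_b, nbits):
    """Return (onlyA, onlyB, bothAB, neitherAB) for two bit masks.

    fp_a and fp_b are plain integers used as bit vectors of width nbits
    (a hypothetical stand-in for real fingerprints).
    """
    mask = (1 << nbits) - 1                            # keep only the lowest nbits bits
    only_a = bin(fp_a & ~fp_b & mask).count("1")       # on in A, off in B
    only_b = bin(fp_b & ~fp_a & mask).count("1")       # on in B, off in A
    both_ab = bin(fp_a & fp_b & mask).count("1")       # on in both
    neither_ab = nbits - only_a - only_b - both_ab     # off in both
    return only_a, only_b, both_ab, neither_ab


fp_a = 0b10110100
fp_b = 0b10011100
print(bit_counts(fp_a, fp_b, 8))  # prints (1, 1, 3, 3)
```

The four counts always sum to the fingerprint width, which is a quick sanity check for any hand-written counting routine.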
Formula: $Sim_{Cosine}(A,B) = \frac{bothAB}{\sqrt{(onlyA + bothAB)(onlyB + bothAB)}}$
Calculates the ratio of the bits in common to the geometric mean of the number of on bits in the two fingerprints.
Formula: $Sim_{Dice}(A,B) = \frac{2 \cdot bothAB}{onlyA + onlyB + 2 \cdot bothAB}$
Calculates the ratio of the bits in common to the arithmetic mean of the number of on bits in the two fingerprints.
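Formula: $Sim_{Euclid}(A,B) = \sqrt{\frac{bothAB + neitherAB}{onlyA + onlyB + bothAB + neitherAB}}$
Formula: $Sim_{Manhattan}(A,B) = \frac{onlyA + onlyB}{onlyA + onlyB + bothAB + neitherAB}$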
Formula: $Sim_{Tanimoto}(A,B) = \frac{bothAB}{onlyA + onlyB + bothAB}$
The number of bits set in both molecules divided by the number of bits set in either molecule.
The more sparsely the bits are set on, the smaller the values generally become.
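The verbal definitions of the Cosine, Dice, and Tanimoto measures can be checked by hand on a small set of bit counts. The sketch below is plain Python rather than the OEGraphSim implementation: the function names and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
import math


def cosine(only_a, only_b, both_ab):
    """Bits in common over the geometric mean of the on-bit counts."""
    on_a = only_a + both_ab
    on_b = only_b + both_ab
    if on_a == 0 or on_b == 0:
        return 0.0  # assumption: empty fingerprints score zero
    return both_ab / math.sqrt(on_a * on_b)


def dice(only_a, only_b, both_ab):
    """Bits in common over the arithmetic mean of the on-bit counts."""
    denom = only_a + only_b + 2 * both_ab
    return 2.0 * both_ab / denom if denom else 0.0


def tanimoto(only_a, only_b, both_ab):
    """Bits set in both over bits set in either."""
    denom = only_a + only_b + both_ab
    return float(both_ab) / denom if denom else 0.0


# Hypothetical counts: A has 8 bits on, B has 5 bits on, 4 are shared.
only_a, only_b, both_ab = 4, 1, 4
print("Cosine   = %.3f" % cosine(only_a, only_b, both_ab))    # 0.632
print("Dice     = %.3f" % dice(only_a, only_b, both_ab))      # 0.615
print("Tanimoto = %.3f" % tanimoto(only_a, only_b, both_ab))  # 0.444
```

For the same pair of fingerprints the three measures give different values, which is why scores from different coefficients should not be compared directly.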
The following example demonstrates how to calculate Tanimoto scores from fingerprints.
Example molecules
Listing 1: Calculating Tanimoto index
#!/usr/bin/env python
from openeye.oechem import *
from openeye.oegraphsim import *
# Build a MACCS key fingerprint for molecule A
molA = OEGraphMol()
OEParseSmiles(molA, "c1ccc2c(c1)c(c(oc2=O)OCCSC(=N)N)Cl")
fpA = OEFingerPrint()
OEMakeFP(fpA, molA, OEFPType_MACCS166)
# Build a MACCS key fingerprint for molecule B
molB = OEGraphMol()
OEParseSmiles(molB, "COc1cc2ccc(cc2c(=O)o1)NC(=N)N")
fpB = OEFingerPrint()
OEMakeFP(fpB, molB, OEFPType_MACCS166)
# Build a MACCS key fingerprint for molecule C
molC = OEGraphMol()
OEParseSmiles(molC, "COc1c(c2ccc(cc2c(=O)o1)NC(=N)N)Cl")
fpC = OEFingerPrint()
OEMakeFP(fpC, molC, OEFPType_MACCS166)
# Compare each pair of fingerprints with the Tanimoto coefficient
print("Tanimoto(A,B) = %.3f" % OETanimoto(fpA, fpB))
print("Tanimoto(A,C) = %.3f" % OETanimoto(fpA, fpC))
print("Tanimoto(B,C) = %.3f" % OETanimoto(fpB, fpC))
Molecules B and C (shown in Figure: Example Molecules) have the largest Tanimoto value since they share the largest number of common structural features.
The output of Listing 1 is the following:
Tanimoto(A,B) = 0.618
Tanimoto(A,C) = 0.709
Tanimoto(B,C) = 0.889
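The following code snippet demonstrates how to implement the Yule similarity measure, which is defined by the formula:
$Sim_{Yule}(A,B) = \frac{bothAB \cdot neitherAB - onlyA \cdot onlyB}{bothAB \cdot neitherAB + onlyA \cdot onlyB}$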
def CalculateYule(fpA, fpB):
    # OEGetBitCounts returns the four basic bit counts for the fingerprint pair
    onlyA, onlyB, bothAB, neitherAB = OEGetBitCounts(fpA, fpB)
    yule = float(bothAB * neitherAB - onlyA * onlyB)
    yule /= float(bothAB * neitherAB + onlyA * onlyB)
    return yule
The OEGetBitCounts function returns the four basic values (namely onlyA, onlyB, bothAB, and neitherAB) from which any similarity measure can be calculated. For the definition of these values see Table: Basic terms.
# Regenerate the three fingerprints as path fingerprints
OEMakeFP(fpA, molA, OEFPType_Path)
OEMakeFP(fpB, molB, OEFPType_Path)
OEMakeFP(fpC, molC, OEFPType_Path)
# Score each pair with the user-defined Yule measure
print("Yule(A,B) = %.3f" % CalculateYule(fpA, fpB))
print("Yule(A,C) = %.3f" % CalculateYule(fpA, fpC))
print("Yule(B,C) = %.3f" % CalculateYule(fpB, fpC))
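The Yule denominator, bothAB * neitherAB + onlyA * onlyB, can be zero, for example when one of the fingerprints has no bits set on. In that case CalculateYule as written raises a ZeroDivisionError. The following sketch shows one possible way to guard against this. It works directly on the four bit counts so that it can be tried without the toolkit; the name `yule_from_counts` and the choice to return `None` for an undefined score are assumptions of this example, not part of OEGraphSim.

```python
def yule_from_counts(only_a, only_b, both_ab, neither_ab):
    """Yule similarity from the four basic bit counts.

    Returns None when the denominator is zero (an assumed convention
    for signalling that the score is undefined).
    """
    numerator = both_ab * neither_ab - only_a * only_b
    denominator = both_ab * neither_ab + only_a * only_b
    if denominator == 0:
        return None
    return float(numerator) / float(denominator)


# Hypothetical counts: (9 - 1) / (9 + 1)
print(yule_from_counts(1, 1, 3, 3))   # prints 0.8
# One fingerprint is empty, so the score is undefined
print(yule_from_counts(0, 4, 0, 4))   # prints None
```

With the toolkit available, the same guard could be applied to the values returned by OEGetBitCounts before dividing.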
Warning
User-defined similarity measures can only be used with path (OEFPType_Path) and MACCS key (OEFPType_MACCS166) fingerprints but not with LINGO (OEFPType_Lingo).