Substructure searches are performed in OEChem with the
OESubSearch class. An OESubSearch instance can be
initialized with a SMARTS pattern, an OEQMolBase query molecule,
or a molecule with expression options. The following example
demonstrates how to initialize an OESubSearch instance with a
SMARTS pattern and perform a substructure search.
    #!/usr/bin/env python

    from openeye.oechem import *

    # build the target molecule (toluene)
    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1C")
    # create a substructure search object
    ss = OESubSearch("c1ccccc1")

    # SingleMatch reports whether the pattern occurs in the target
    if ss.SingleMatch(mol):
        print("benzene matches toluene")
    else:
        print("benzene does not match toluene")
In Listing 18.1, the query pattern is benzene and the molecule
being searched is toluene.
Since benzene is a substructure of toluene, the
OESubSearch.SingleMatch method
returns true.
The OESubSearch.SingleMatch method returns true if a single subgraph
isomorphism is detected in the molecule passed as the function argument.
The OESubSearch class is able to identify the atom and bond
correspondences of the pattern and target structures. The program in
Listing 18.2 extends the simple match example to write out
all atom correspondences between benzene and toluene.
    #!/usr/bin/env python

    from openeye.oechem import *
    import sys

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1C")
    # create a substructure search object
    ss = OESubSearch("c1ccccc1")

    count = 1
    # loop over matches
    for match in ss.Match(mol):
        sys.stdout.write("\nMatch %d : " % count)
        # indices of the matched atoms in the query pattern
        sys.stdout.write("pattern atoms: ")
        for ma in match.GetAtoms():
            sys.stdout.write("%d " % ma.pattern.GetIdx())
        # indices of the corresponding atoms in the target molecule
        sys.stdout.write("target atoms: ")
        for ma in match.GetAtoms():
            sys.stdout.write("%d " % ma.target.GetIdx())
        count += 1
The output of Listing 18.2 is the following:
    Match 1 : pattern atoms: 0 1 2 3 4 5 target atoms: 0 1 2 3 4 5
    Match 2 : pattern atoms: 0 1 2 3 4 5 target atoms: 0 5 4 3 2 1
    Match 3 : pattern atoms: 0 1 2 3 4 5 target atoms: 1 2 3 4 5 0
    Match 4 : pattern atoms: 0 1 2 3 4 5 target atoms: 1 0 5 4 3 2
    Match 5 : pattern atoms: 0 1 2 3 4 5 target atoms: 2 3 4 5 0 1
    Match 6 : pattern atoms: 0 1 2 3 4 5 target atoms: 2 1 0 5 4 3
    Match 7 : pattern atoms: 0 1 2 3 4 5 target atoms: 3 4 5 0 1 2
    Match 8 : pattern atoms: 0 1 2 3 4 5 target atoms: 3 2 1 0 5 4
    Match 9 : pattern atoms: 0 1 2 3 4 5 target atoms: 4 5 0 1 2 3
    Match 10 : pattern atoms: 0 1 2 3 4 5 target atoms: 4 3 2 1 0 5
    Match 11 : pattern atoms: 0 1 2 3 4 5 target atoms: 5 0 1 2 3 4
    Match 12 : pattern atoms: 0 1 2 3 4 5 target atoms: 5 4 3 2 1 0
The OESubSearch.Match method performs
subgraph isomorphism determination for instances of OEMolBase or
OEQMolBase and returns an iterator over all detected
subgraphs.
Each of the subgraphs can be queried for its atom and bond correspondences.
In this particular example, the benzene substructure is identified
twelve times in toluene.
There are twelve matches because the benzene ring can be rotated to give 6 matches,
and then flipped and rotated again to give another 6, for a total of twelve.
Each match differs from the others in its atom and bond correspondences to the
pattern substructure.
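The count of twelve can be reproduced without the toolkit. The sketch below is plain Python and does not call OEChem; the function name ring_mappings is illustrative only. It enumerates every way of laying a six-membered ring onto itself so that neighbours stay neighbours: six starting atoms, each walked in two directions. The order in which OEChem itself reports matches is not something this sketch makes any claim about.

```python
def ring_mappings(n=6):
    """Return every mapping of an n-membered ring onto itself that
    keeps adjacent atoms adjacent (n rotations times 2 directions)."""
    mappings = []
    for start in range(n):          # which target atom pattern atom 0 lands on
        for step in (1, -1):        # walk the ring forwards, then flipped
            mappings.append([(start + step * i) % n for i in range(n)])
    return mappings


mappings = ring_mappings(6)
for number, targets in enumerate(mappings, start=1):
    print("Match %d : target atoms: %s"
          % (number, " ".join(str(t) for t in targets)))
```

Each printed row lists the target atom assigned to pattern atoms 0 through 5, so the rows can be compared directly with the target atoms columns in the output of Listing 18.2.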
A match or subgraph is considered unique if it differs from all subgraphs found
previously by at least one atom or bond.
When doing unique matching, two subgraph matches that cover
the same atoms and bonds, albeit in different orders, are considered duplicates,
and the later one is discarded.
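The uniqueness rule can be illustrated with a small plain-Python sketch. It is not OEChem's implementation, and the helper names (ring_mappings, footprint, unique_matches) are hypothetical. It assumes the pattern is a single ring, so the bonds of a match are the pairs of consecutive target atoms plus the ring-closing pair. Two matches count as duplicates when they cover the same set of atoms and the same set of bonds.

```python
def ring_mappings(n=6):
    """All adjacency-preserving mappings of an n-membered ring onto itself."""
    return [[(start + step * i) % n for i in range(n)]
            for start in range(n) for step in (1, -1)]


def footprint(targets):
    """Order-independent description of a ring match: the set of target
    atoms and the set of target bonds (each bond an unordered atom pair)."""
    n = len(targets)
    atoms = frozenset(targets)
    bonds = frozenset(frozenset((targets[i], targets[(i + 1) % n]))
                      for i in range(n))
    return atoms, bonds


def unique_matches(matches):
    """Keep the first match for each distinct footprint, drop later ones."""
    seen = set()
    kept = []
    for targets in matches:
        key = footprint(targets)
        if key not in seen:
            seen.add(key)
            kept.append(targets)
    return kept


all_matches = ring_mappings(6)
unique = unique_matches(all_matches)
print("%d matches, %d unique" % (len(all_matches), len(unique)))
# prints "12 matches, 1 unique"
```

All twelve ring mappings cover the same six atoms and six bonds, so only the first one survives.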
To retrieve only unique matches, call the Match function
with its second argument set to true.
In the Listing 18.2 example, a unique search would yield only a single
match for benzene in toluene.
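A minimal sketch of the two calling styles follows. The helper count_matches and the function demo are illustrative names, not part of OEChem; the only toolkit calls used are the ones already shown in Listing 18.2, plus the second argument to Match described above. The import sits inside demo so that the helper can be loaded on a machine without the toolkit.

```python
def count_matches(ss, mol, unique=False):
    """Count the matches ss reports in mol.

    ss only needs a Match(mol, unique) method, as OESubSearch provides.
    """
    return sum(1 for _ in ss.Match(mol, unique))


def demo():
    # Imported here so the helper above can be loaded without the toolkit.
    from openeye.oechem import OEGraphMol, OEParseSmiles, OESubSearch

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1C")
    ss = OESubSearch("c1ccccc1")
    print("all matches:    %d" % count_matches(ss, mol))
    print("unique matches: %d" % count_matches(ss, mol, True))
```

Calling demo() in an environment with a licensed OEChem installation should, per the discussion above, report twelve matches for the exhaustive search and one for the unique search.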
An OESubSearch may be initialized using a SMARTS pattern or a
query molecule (OEQMolBase). Query molecules must have
atom and bond expressions built for the entire molecule before they can be used to initialize the search object
(see OEQMolBase.BuildExpressions in the API documentation).
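The query-molecule route might look like the sketch below. The function name search_from_query_smiles is hypothetical, and the choice of OEExprOpts_DefaultAtoms and OEExprOpts_DefaultBonds as expression options is an assumption made for illustration; the API documentation lists the available options and should be consulted before relying on these. The import sits inside the function so the block can be loaded without the toolkit.

```python
def search_from_query_smiles(smiles):
    """Build an OESubSearch from a query molecule instead of a SMARTS string."""
    # Imported here so this module can be loaded without the toolkit.
    from openeye.oechem import (OEQMol, OEParseSmiles, OESubSearch,
                                OEExprOpts_DefaultAtoms,
                                OEExprOpts_DefaultBonds)

    qmol = OEQMol()
    if not OEParseSmiles(qmol, smiles):
        raise ValueError("could not parse SMILES: %r" % smiles)
    # Expressions must be built for the whole molecule before it is
    # handed to OESubSearch (default options assumed here).
    qmol.BuildExpressions(OEExprOpts_DefaultAtoms, OEExprOpts_DefaultBonds)
    return OESubSearch(qmol)
```

The returned object is used like one constructed from SMARTS, for example search_from_query_smiles("c1ccccc1").SingleMatch(mol).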
OESubSearch.GetPattern returns a read-only reference to the
query molecule contained in an instance of OESubSearch. Const
OEQMolBase methods can be used to interrogate the returned
OEQMolBase reference.
The OESubSearch.SetMaxMatches method sets the maximum number of subgraphs to
be returned by the OESubSearch.Match methods. Once the maximum
number of subgraphs has been found, the search is terminated.
By default, an OESubSearch is constructed with the maximum number of
matches set to 1024.
The constraint on the maximum number of matches can be removed by calling
SetMaxMatches with a value of zero.
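The effect of the limit can be sketched as follows. The helper count_with_limit and the function demo are illustrative names rather than toolkit functions, and the limit of 4 is an arbitrary value chosen for the example. The import sits inside demo so the helper can be loaded without the toolkit.

```python
def count_with_limit(ss, mol, limit):
    """Set the match limit on ss, then count what Match(mol) returns.

    ss only needs SetMaxMatches(limit) and Match(mol) methods, as
    OESubSearch provides. The limit stays set on ss afterwards.
    """
    ss.SetMaxMatches(limit)
    return sum(1 for _ in ss.Match(mol))


def demo():
    # Imported here so the helper above can be loaded without the toolkit.
    from openeye.oechem import OEGraphMol, OEParseSmiles, OESubSearch

    mol = OEGraphMol()
    OEParseSmiles(mol, "c1ccccc1C")
    ss = OESubSearch("c1ccccc1")
    print("limit 4:  %d matches" % count_with_limit(ss, mol, 4))
    # a value of zero removes the constraint on the number of matches
    print("no limit: %d matches" % count_with_limit(ss, mol, 0))
```

Because SetMaxMatches changes the state of the search object, a limit set for one search remains in force for later calls to Match on the same object until it is changed again.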