19.1 Substructure Search

Substructure searches are performed in OEChem with the OESubSearch class. An OESubSearch can be initialized with a SMARTS pattern, an OEQMolBase query molecule, or a molecule together with expression options. The following example shows how to initialize an OESubSearch instance with a SMARTS pattern and perform a substructure search.

import openeye.oechem.*;

public class SubstructureSearch
{
  public static void main(String[] argv)
  {
    /* parse toluene as the target molecule */
    OEGraphMol target = new OEGraphMol();
    oechem.OEParseSmiles(target, "c1ccccc1C");

    /* create a substructure search object from a SMARTS pattern (benzene) */
    OESubSearch ss = new OESubSearch("c1ccccc1");

    /* SingleMatch returns true if the pattern is found in the target */
    if (ss.SingleMatch(target))
      System.out.println("benzene matches toluene");
    else
      System.out.println("benzene does not match toluene");
  }
}

Listing 19.1: Substructure search example

In Listing 19.1, the query pattern is benzene and the target molecule being searched is toluene. Since benzene is a substructure of toluene, the OESubSearch.SingleMatch method returns true. OESubSearch.SingleMatch returns true if at least one subgraph isomorphism between the pattern and the molecule passed as its argument is detected.
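To make "a subgraph isomorphism is detected" concrete, the sketch below is a brute-force backtracking matcher over hand-built graphs. It is not OEChem's algorithm: the class name TinySubgraphMatch, the Graph type, and the plain string labels ("c" for aromatic carbon, "C" for aliphatic carbon) are hypothetical simplifications that stand in for SMARTS atom and bond expressions. The search maps each pattern atom to a distinct target atom with the same label, requires every pattern bond to exist in the target, and returns as soon as one complete mapping is found.

```java
// Hypothetical sketch, not OEChem's algorithm: a brute-force backtracking
// search that asks whether a small labeled pattern graph embeds in a target.
public class TinySubgraphMatch {
    /** Labeled graph: labels[i] is atom i's symbol, adj[i][j] marks a bond. */
    static final class Graph {
        final String[] labels;
        final boolean[][] adj;

        Graph(String[] labels, int[][] bonds) {
            this.labels = labels;
            this.adj = new boolean[labels.length][labels.length];
            for (int[] b : bonds) {
                adj[b[0]][b[1]] = true;
                adj[b[1]][b[0]] = true;
            }
        }
    }

    /** Benzene as six aromatic carbons in a ring (hand-built, no SMILES parsing). */
    static Graph benzene() {
        return new Graph(new String[] {"c", "c", "c", "c", "c", "c"},
                new int[][] {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}});
    }

    /** Toluene: the same ring plus an aliphatic carbon bonded to atom 5. */
    static Graph toluene() {
        return new Graph(new String[] {"c", "c", "c", "c", "c", "c", "C"},
                new int[][] {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}, {5, 6}});
    }

    /** True as soon as one mapping of pattern atoms onto distinct target atoms is found. */
    static boolean singleMatch(Graph pattern, Graph target) {
        return extend(pattern, target, new int[pattern.labels.length],
                new boolean[target.labels.length], 0);
    }

    private static boolean extend(Graph p, Graph t, int[] map, boolean[] used, int i) {
        if (i == map.length) {
            return true; // every pattern atom has been mapped
        }
        for (int cand = 0; cand < t.labels.length; cand++) {
            if (used[cand] || !p.labels[i].equals(t.labels[cand])) {
                continue;
            }
            // every pattern bond to an already-mapped atom must exist in the target
            boolean ok = true;
            for (int j = 0; j < i && ok; j++) {
                if (p.adj[i][j] && !t.adj[cand][map[j]]) {
                    ok = false;
                }
            }
            if (!ok) {
                continue;
            }
            map[i] = cand;
            used[cand] = true;
            if (extend(p, t, map, used, i + 1)) {
                return true; // stop at the first complete mapping
            }
            used[cand] = false; // backtrack and try the next candidate
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(singleMatch(benzene(), toluene())); // true
        System.out.println(singleMatch(toluene(), benzene())); // false
    }
}
```

The reverse query fails because the pattern's aliphatic "C" has no counterpart in benzene, mirroring the fact that toluene is not a substructure of benzene.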

The OESubSearch class can also identify the atom and bond correspondences between the pattern and target structures. The program in Listing 19.2 extends the simple match example to print all atom correspondences between benzene and toluene.

import openeye.oechem.*;

public class AtomMap
{
  public static void main(String[] argv)
  {
    /* parse toluene as the target molecule */
    OEGraphMol mol = new OEGraphMol();
    oechem.OEParseSmiles(mol, "c1ccccc1C");

    /* create a substructure search object from a SMARTS pattern (benzene) */
    OESubSearch ss = new OESubSearch("c1ccccc1");

    int count = 1;
    /* Match returns an iterator over every detected subgraph */
    for (OEMatchBaseIter match = ss.Match(mol); match.hasNext(); )
    {
      System.out.print("Match " + count + " : ");

      /* first pass over the atom pairs: print the pattern atom indices */
      System.out.print("pattern atoms: ");
      OEMatchPairAtomIter ma = match.next().GetAtoms();
      while (ma.hasNext())
        System.out.print(ma.next().getPattern().GetIdx() + " ");

      /* rewind the pair iterator, then print the corresponding target atoms */
      ma.ToFirst();
      System.out.print("target atoms: ");
      while (ma.hasNext())
        System.out.print(ma.next().getTarget().GetIdx() + " ");

      System.out.println();
      count++;
    }
  }
}

Listing 19.2: Atom map example

The output of Listing 19.2 is the following:

Match 1 : pattern atoms: 0 1 2 3 4 5 target atoms: 0 1 2 3 4 5
Match 2 : pattern atoms: 0 1 2 3 4 5 target atoms: 0 5 4 3 2 1
Match 3 : pattern atoms: 0 1 2 3 4 5 target atoms: 1 2 3 4 5 0
Match 4 : pattern atoms: 0 1 2 3 4 5 target atoms: 1 0 5 4 3 2
Match 5 : pattern atoms: 0 1 2 3 4 5 target atoms: 2 3 4 5 0 1
Match 6 : pattern atoms: 0 1 2 3 4 5 target atoms: 2 1 0 5 4 3
Match 7 : pattern atoms: 0 1 2 3 4 5 target atoms: 3 4 5 0 1 2
Match 8 : pattern atoms: 0 1 2 3 4 5 target atoms: 3 2 1 0 5 4
Match 9 : pattern atoms: 0 1 2 3 4 5 target atoms: 4 5 0 1 2 3
Match 10 : pattern atoms: 0 1 2 3 4 5 target atoms: 4 3 2 1 0 5
Match 11 : pattern atoms: 0 1 2 3 4 5 target atoms: 5 0 1 2 3 4
Match 12 : pattern atoms: 0 1 2 3 4 5 target atoms: 5 4 3 2 1 0

The OESubSearch.Match method performs subgraph isomorphism determination for instances of OEMolBase or OEQMolBase and returns an iterator over all detected subgraphs. Each subgraph can be queried for its atom and bond correspondences. In this example, the benzene substructure is identified twelve times in toluene: the ring pattern can be laid onto the target ring in six rotated orientations, and flipping it over gives another six, for a total of twelve. Each match differs from the others in its atom and bond correspondences to the pattern substructure.
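The "six rotations plus six flipped rotations" count can be checked with a few lines of arithmetic. The sketch below, whose class and method names are hypothetical and which does not call OEChem, maps pattern ring atom i to target atom (start + i) mod 6 for a forward match and (start - i) mod 6 for a flipped one. Enumerated with the forward mapping first for each starting atom, it reproduces the twelve target-atom orders in the output above; treat the matching order as a coincidence of this construction, not a guarantee about OEChem's search order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only (hypothetical names): the rotated and flipped ways a
// six-membered ring pattern (atoms 0..n-1) can be laid onto a ring target.
public class RingMappings {
    /** Each int[] maps pattern atom i to target atom result[i]. */
    static List<int[]> ringMappings(int n) {
        List<int[]> all = new ArrayList<>();
        for (int start = 0; start < n; start++) {
            int[] forward = new int[n];
            int[] flipped = new int[n];
            for (int i = 0; i < n; i++) {
                forward[i] = (start + i) % n;             // rotate around the ring
                flipped[i] = ((start - i) % n + n) % n;   // rotate, walking the other way
            }
            all.add(forward);
            all.add(flipped);
        }
        return all;
    }

    public static void main(String[] args) {
        List<int[]> maps = ringMappings(6);
        for (int k = 0; k < maps.size(); k++) {
            StringBuilder sb = new StringBuilder("Match " + (k + 1) + " : target atoms:");
            for (int t : maps.get(k)) {
                sb.append(' ').append(t);
            }
            System.out.println(sb);
        }
    }
}
```

For a ring of n atoms this gives 2n mappings, which is why a six-membered ring pattern yields twelve matches even though only one set of atoms is covered.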

A match or subgraph is considered unique if it differs from all previously found subgraphs by at least one atom or bond. When doing unique matching, two subgraph matches that cover the same atoms and bonds, albeit in a different order, are considered duplicates, and the duplicate is discarded. To retrieve only unique matches, the OESubSearch.Match method must be called with its second argument set to true. In Listing 19.2, a unique search would yield only a single match for benzene in toluene.
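In Listing 19.2 this corresponds to calling ss.Match(mol, true), as described above. The sketch below illustrates the duplicate rule itself and is not OEChem's implementation: each match is reduced to a key made of the sorted target atoms plus the sorted target bonds it covers, and only the first match seen for each key is kept. The class UniqueMatches, its methods, and the string key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustration of the "unique match" rule (hypothetical names, not OEChem code):
// matches covering the same target atoms and bonds collapse to the first one found.
public class UniqueMatches {
    static final int[][] RING_BONDS = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};

    /** Order-independent key: sorted target atoms plus sorted target bonds. */
    static String key(int[] map, int[][] patternBonds) {
        TreeSet<Integer> atoms = new TreeSet<>();
        for (int t : map) {
            atoms.add(t);
        }
        TreeSet<String> bonds = new TreeSet<>();
        for (int[] b : patternBonds) {
            int a = map[b[0]];
            int c = map[b[1]];
            bonds.add(Math.min(a, c) + "-" + Math.max(a, c));
        }
        return atoms + "|" + bonds;
    }

    /** Keeps the first match for each distinct atom/bond coverage, in input order. */
    static List<int[]> unique(List<int[]> matches, int[][] patternBonds) {
        Map<String, int[]> firstSeen = new LinkedHashMap<>();
        for (int[] m : matches) {
            firstSeen.putIfAbsent(key(m, patternBonds), m);
        }
        return new ArrayList<>(firstSeen.values());
    }

    /** The twelve rotated and flipped benzene-onto-ring mappings. */
    static List<int[]> allRingMatches() {
        List<int[]> all = new ArrayList<>();
        for (int s = 0; s < 6; s++) {
            int[] fwd = new int[6];
            int[] bwd = new int[6];
            for (int i = 0; i < 6; i++) {
                fwd[i] = (s + i) % 6;
                bwd[i] = ((s - i) % 6 + 6) % 6;
            }
            all.add(fwd);
            all.add(bwd);
        }
        return all;
    }

    public static void main(String[] args) {
        List<int[]> all = allRingMatches();
        System.out.println(all.size() + " matches, " + unique(all, RING_BONDS).size() + " unique");
        // prints "12 matches, 1 unique"
    }
}
```

All twelve mappings cover the same six ring atoms and six ring bonds, so they share one key, which is why a unique search reports a single match.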

An OESubSearch may be initialized with a SMARTS pattern or a query molecule (OEQMolBase). A query molecule must have atom and bond expressions built for the entire molecule before it can be used to initialize the search object (see OEQMolBase.BuildExpressions in the API documentation).

OESubSearch.GetPattern returns a read-only reference to the query molecule contained in an OESubSearch instance. Only const OEQMolBase methods, which do not modify the molecule, can be used to interrogate the returned OEQMolBase reference.

The OESubSearch.SetMaxMatches method sets the maximum number of subgraphs returned by the OESubSearch.Match methods. Once that many subgraphs have been found, the search terminates. By default, an OESubSearch is constructed with the maximum number of matches set to 1024. The limit can be removed by calling SetMaxMatches with a value of zero.
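The cap semantics can be sketched with any lazy iterator. The helper below is hypothetical and not part of OEChem: it pulls results until maxMatches have been collected, never asks the iterator for more once the cap is reached, and treats zero as "no limit". An IntStream iterator over twelve integers stands in for the twelve benzene matches of Listing 19.2.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical helper illustrating the SetMaxMatches behavior described above:
// stop after maxMatches results; a value of 0 means "no limit". Not OEChem code.
public class CappedMatches {
    static <T> List<T> collect(Iterator<T> matches, int maxMatches) {
        List<T> out = new ArrayList<>();
        // check the cap first so no further search work is requested once it is reached
        while ((maxMatches == 0 || out.size() < maxMatches) && matches.hasNext()) {
            out.add(matches.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // stand-in for a match iterator that would yield twelve matches
        System.out.println(collect(IntStream.range(0, 12).iterator(), 5).size());    // 5
        System.out.println(collect(IntStream.range(0, 12).iterator(), 0).size());    // 12
        System.out.println(collect(IntStream.range(0, 12).iterator(), 1024).size()); // 12
    }
}
```

With the default cap of 1024, a query like benzene in toluene is unaffected; the cap matters for highly symmetric or very general patterns whose match counts can grow combinatorially.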