Section Substructure Search describes how to perform a substructure search initialized with a SMILES or SMARTS string. OEChem can also interpret and use query structures expressed in the MDL query file format (see the MDL query example in Figure: Query). Listing 1 shows how to initialize an OESubSearch object from an MDL query file and perform a substructure search.
Query
Listing 1: Example of substructure search using MDL query file
#!/usr/bin/env python
from openeye.oechem import *
qfile = oemolistream("query.mol")
tfile = oemolistream("targets.sdf")
# set the same aromaticity model for the query and the target file
aromodel = OEIFlavor_Generic_OEAroModelMDL
qflavor = qfile.GetFlavor(qfile.GetFormat())
qfile.SetFlavor(qfile.GetFormat(),(qflavor|aromodel))
tflavor = tfile.GetFlavor(tfile.GetFormat())
tfile.SetFlavor(tfile.GetFormat(),(tflavor|aromodel))
# read MDL query and initialize the substructure search
opts = OEMDLQueryOpts_Default|OEMDLQueryOpts_SuppressExplicitH
qmol = OEQMol()
OEReadMDLQueryFile(qfile,qmol,opts)
ss = OESubSearch(qmol)
# loop over target structures
tindex = 1
tmol = OEGraphMol()
while OEReadMolecule(tfile,tmol):
    if ss.SingleMatch(tmol):
        print("hit target =", tindex, OECreateCanSmiString(tmol))
    tindex += 1
After opening the MDL query and the target files, the model used to assign aromaticity to the imported structures can be adjusted.
aromodel = OEIFlavor_Generic_OEAroModelMDL
qflavor = qfile.GetFlavor(qfile.GetFormat())
qfile.SetFlavor(qfile.GetFormat(),(qflavor|aromodel))
tflavor = tfile.GetFlavor(tfile.GetFormat())
tfile.SetFlavor(tfile.GetFormat(),(tflavor|aromodel))
If the aromaticity model is not specified for the input files, then the OpenEye aromaticity model is used by default. For more information about the various aromaticity models of OEChem see Aromaticity Perception.
In general, the aromaticity model chosen should be consistent between the query and the target molecules to be searched. Using different aromaticity models may produce false negatives, because the same ring system can be perceived as aromatic under one model and as non-aromatic under the other. Section Aromaticity further explains the effects of using various aromaticity models when performing a substructure search.
The OEReadMDLQueryFile function reads the MDL query directly into an OEQMolBase object, which can then be used to initialize an OESubSearch instance.
OEReadMDLQueryFile(qfile,qmol,opts)
ss = OESubSearch(qmol)
The MDL query structure can also be read into an OEMolBase object (see code snippet below). In this case, the OEReadMDLQueryFile function attaches the query features present in the input MDL file to the related atoms and bonds of the OEMolBase object. The OEQMolBase object can subsequently be created by calling the OEBuildMDLQueryExpressions function.
mol = OEGraphMol()
OEReadMDLQueryFile(qfile,mol)
# mol can be manipulated here
qmol = OEQMol()
# build OEQMol with OEMDLQueryOpts_Default option
OEBuildMDLQueryExpressions(qmol,mol)
ss = OESubSearch(qmol)
The declarations of these functions are:
OEReadMDLQueryFile(ifs,mol)
# ifs-oemolistream, mol-OEMolBase
# returns true or false
OEReadMDLQueryFile(ifs,qmol,opts)
# ifs-oemolistream, qmol-OEQMolBase, opts-integer (defaults to OEMDLQueryOpts_Default)
# returns true or false
OEBuildMDLQueryExpressions(qmol,mol,opts)
# qmol-OEQMolBase, mol-OEMolBase, opts-integer (defaults to OEMDLQueryOpts_Default)
# returns true or false
The opts parameter defines how the MDL query is interpreted when an OEQMolBase object is constructed. The following options are present in the OEMDLQueryOpts namespace:
OEMDLQueryOpts_Default
Only constraints explicitly specified in the MDL file are added to the OEQMolBase query structure. See section Supported MDL Query Features for the supported MDL query features.
OEMDLQueryOpts_SuppressExplicitH
This option controls how the explicit hydrogens of the query are matched to the explicit/implicit hydrogens of the target structures. For more information see Explicit Hydrogens.
OEMDLQueryOpts_AddBondAliphaticConstraint
If this option is specified, then an aliphatic query bond can only be mapped to the aliphatic bonds in the target structure. Figure: Interpretation A shows how the MDL query structure is interpreted when the OEMDLQueryOpts_AddBondAliphaticConstraint option is used.
Query ‘A’ will match all three of the target compounds displayed in Figure: Interpretation A. If the OpenEye model is used to perceive aromatic rings, then query ‘B’ substructure is present only in target ‘2’. If the MDL aromaticity model is used, then target ‘3’ is also a hit, since five-membered heterocycles are not considered aromatic in this model. For more information about different aromaticity models and their effects on substructure searching see section Aromaticity.
Interpretation A
OEMDLQueryOpts_AddBondTopologyConstraint
By default, a bond that is part of any ring system in the query structure can only be mapped to ring bonds in the target structure. If the OEMDLQueryOpts_AddBondTopologyConstraint option is specified, constraints are also added to the chain bonds of the query in order to map them to only chain bonds in the target.
Figure: Interpretation B shows how the MDL query structure is interpreted when the OEMDLQueryOpts_AddBondTopologyConstraint option is used. Query ‘A’ will match all three of the target compounds displayed in Figure: Interpretation B, while query ‘B’ is present only in target ‘3’.
If the ‘ring’ property is specified in the MDL query file for a particular chain bond in the query structure, then its topology constraint is overridden.
Interpretation B
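The effect of the two bond constraint options described above can be illustrated with a small toy model. This is only a sketch of our reading of the text, not OEChem code: the flag values, the bond dictionaries and the bond_allowed helper are all hypothetical, and the real OEMDLQueryOpts constants may have different values.

```python
# Hypothetical flag values for illustration only; they merely stand in for
# OEMDLQueryOpts_Default, OEMDLQueryOpts_AddBondAliphaticConstraint and
# OEMDLQueryOpts_AddBondTopologyConstraint.
OPT_DEFAULT = 0
OPT_ALIPHATIC = 1 << 0
OPT_TOPOLOGY = 1 << 1


def bond_allowed(query, target, opts=OPT_DEFAULT):
    """Toy check: may this query bond be mapped onto this target bond?

    Bonds are dicts with boolean keys "ring" and "aromatic". A query bond may
    also carry "ring_in_file": True when the MDL file marks it as a ring bond
    (our interpretation of the 'ring' property mentioned in the text).
    """
    # Aromatic query bonds can only be mapped to aromatic target bonds.
    if query["aromatic"] and not target["aromatic"]:
        return False
    # Default behaviour: ring query bonds can only be mapped to ring bonds.
    needs_ring = query["ring"] or query.get("ring_in_file", False)
    if needs_ring and not target["ring"]:
        return False
    # Optional: aliphatic query bonds are restricted to aliphatic target bonds.
    if (opts & OPT_ALIPHATIC) and not query["aromatic"] and target["aromatic"]:
        return False
    # Optional: chain query bonds are restricted to chain target bonds,
    # unless the file explicitly asked for a ring bond.
    if (opts & OPT_TOPOLOGY) and not needs_ring and target["ring"]:
        return False
    return True


chain_query = {"ring": False, "aromatic": False}
aromatic_ring_target = {"ring": True, "aromatic": True}
aliphatic_ring_target = {"ring": True, "aromatic": False}
chain_target = {"ring": False, "aromatic": False}

print(bond_allowed(chain_query, aromatic_ring_target))
print(bond_allowed(chain_query, aromatic_ring_target, OPT_ALIPHATIC))
print(bond_allowed(chain_query, aliphatic_ring_target, OPT_TOPOLOGY))
print(bond_allowed(chain_query, chain_target, OPT_ALIPHATIC | OPT_TOPOLOGY))
```

As in Listing 1, several options are combined with the bitwise OR operator, and each option only ever removes candidate mappings, so adding options can shrink the hit list but never enlarge it.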
Supported atom query features:
Supported bond query features:
Query
Figure: Query shows an MDL query structure example with several different query features. For more details on atom and bond query features, please refer to the MDL CTFile Formats document (http://www.mdli.com/downloads/public/ctfile/ctfile.jsp).
Perceiving aromaticity in the query and the target structures is important in order to ensure that the result of a substructure search is independent of the particular Kekulé representations of the participating structures. A query bond that is part of an aromatic ring system can be mapped to any aromatic bond in the target. Figure: Aromaticity Match shows an example where both Kekulé representations of the benzene-1,2-diol substructure are present in the two target structures.
Aromaticity Match
Altering the aromaticity model will affect the results of a substructure search. Figure: Aromaticity A – Figure: Aromaticity C show several examples where different results were obtained by applying the MDL and the OpenEye aromaticity models. This is a consequence of the fact that in the MDL aromaticity model, five-membered heterocycles are not considered aromatic.
Note
It is highly recommended to apply the same aromaticity model to the query and the target structures.
Listing 1 shows an example of how to change the aromaticity flavor of input files.
Aromaticity A
Aromaticity B
Aromaticity C
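Why mixing aromaticity models leads to missed hits can be shown with a deliberately simplified toy. The two model names, the ring_kind labels and both helper functions below are hypothetical and are not part of OEChem. The only fact taken from the text is that the MDL model does not treat five-membered heterocycles as aromatic while the OpenEye model does. For simplicity the comparison is strict label equality, which is stricter than the default query options.

```python
def perceive(ring_kind, kekule_orders, model):
    """Toy aromaticity perception: return one label per ring bond.

    ring_kind and the model names are made-up labels for this sketch.
    """
    if ring_kind == "benzene":
        aromatic = True                      # both toy models agree here
    elif ring_kind == "five_membered_heterocycle":
        aromatic = model == "openeye_like"   # the MDL-like model keeps it Kekulé
    else:
        aromatic = False
    return ["ar"] * len(kekule_orders) if aromatic else list(kekule_orders)


def ring_hit(query_labels, target_labels):
    """Strict bond-by-bond label comparison (a simplification)."""
    return query_labels == target_labels


# Bond orders around a furan-like ring, starting at the heteroatom.
furan_like = [1, 2, 1, 2, 1]

for q_model, t_model in [("openeye_like", "openeye_like"),
                         ("mdl_like", "mdl_like"),
                         ("openeye_like", "mdl_like")]:
    q = perceive("five_membered_heterocycle", furan_like, q_model)
    t = perceive("five_membered_heterocycle", furan_like, t_model)
    print(q_model, t_model, ring_hit(q, t))
```

With either model applied consistently the ring is found, while the mixed case fails because the aromatic query bonds have no aromatic target bonds to map to.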
If a ring in the query structure contains one or more generic atoms (see the example in Figure: Generic atom example A), then the aromaticity of the ring cannot be perceived. In order to maintain independence from the Kekulé representation, 6-membered rings with alternating single/double bonds are assumed to be aromatic.
Generic atom example A
Similarly, a 5-membered ring with one or more generic atoms is considered aromatic if it is composed of two single and two double bonds. See the example in Figure: Generic atom example B.
Generic atom example B
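The six-membered ring rule stated above can be sketched in a few lines of plain Python. The function names and the list-of-bond-orders representation are assumptions made for this illustration; the sketch does not reproduce how OEChem implements the rule, and it does not cover the five-membered case.

```python
def assume_aromatic_six_ring(bond_orders):
    """Return True if a six-membered ring strictly alternates
    single (1) and double (2) bonds around the ring."""
    if len(bond_orders) != 6:
        return False
    return all(
        {bond_orders[i], bond_orders[(i + 1) % 6]} == {1, 2}
        for i in range(6)
    )


def normalize(bond_orders):
    """Replace the bond orders with aromatic labels when the ring qualifies."""
    if assume_aromatic_six_ring(bond_orders):
        return ["ar"] * 6
    return list(bond_orders)


# The two Kekulé forms of the same alternating ring.
kekule_a = [1, 2, 1, 2, 1, 2]
kekule_b = [2, 1, 2, 1, 2, 1]

print(normalize(kekule_a) == normalize(kekule_b))
```

Because both Kekulé forms collapse to the same aromatic labels, a query drawn in one form can still be compared with a target drawn in the other.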
During the substructure search, each query atom has to be mapped to a target atom in order to detect subgraph isomorphism. Therefore, a problem can arise if the query structure contains explicit hydrogens or an atom list with hydrogens (see the example in Figure: Query), but the target structure has implicit hydrogens (see Figure: Targets).
Query
Listing 2: Example of substructure search with accessing atom mapping
#!/usr/bin/env python
from openeye.oechem import *
qfile = oemolistream("query.mol")
tfile = oemolistream("targets.sdf")
# read MDL query and initialize the substructure search
qmol = OEQMol()
OEReadMDLQueryFile(qfile,qmol)
ss = OESubSearch(qmol)
# loop over target structures
tindex = 1
tmol = OEGraphMol()
while OEReadMolecule(tfile,tmol):
    # make the hydrogens of the target explicit so query hydrogens can be mapped
    OEAddExplicitHydrogens(tmol)
    for mi in ss.Match(tmol,True):
        print("hit target =", tindex, end=" ")
        for ai in mi.GetTargetAtoms():
            print(ai.GetIdx(), OEGetAtomicSymbol(ai.GetAtomicNum()), end=" ")
        print()
    tindex += 1
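This problem can be solved in two ways: by adding explicit hydrogens to the target structures with the OEAddExplicitHydrogens function (as in Listing 2), or by specifying the OEMDLQueryOpts_SuppressExplicitH option when the query is read (as in Listing 1).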
In both cases, the query presented in Figure: Query will match only targets ‘C’ and ‘D’ shown in Figure: Targets. Figure: Atom mapping shows the three detected substructures in target ‘C’ when adding explicit hydrogens to the target structures.
The execution of the substructure search is significantly faster if the hydrogens are suppressed in the target structures, since the search space can be an order of magnitude smaller.
Targets
Atom mapping
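The explicit hydrogen problem, and the trade-off between making target hydrogens explicit and suppressing query hydrogens, can be reproduced with a small brute-force matcher in plain Python. Everything below (the dictionary-based molecule model and the helpers make_mol, count_matches, add_explicit_h and suppress_query_h) is a hypothetical sketch for illustration. It does not use or mirror the OEChem implementation.

```python
from itertools import permutations


def make_mol(atoms, bonds):
    """atoms: list of (symbol, h); bonds: list of (i, j) index pairs.

    For a target, h is the implicit hydrogen count; for a query, h is the
    minimum number of implicit hydrogens the matched atom must carry.
    """
    return {"atoms": [{"sym": s, "h": h} for s, h in atoms],
            "bonds": {frozenset(b) for b in bonds}}


def count_matches(query, target):
    """Count injective atom mappings that preserve symbols, H counts and bonds."""
    hits = 0
    n_query, n_target = len(query["atoms"]), len(target["atoms"])
    for mapping in permutations(range(n_target), n_query):
        atoms_ok = all(
            q["sym"] == target["atoms"][t]["sym"] and target["atoms"][t]["h"] >= q["h"]
            for q, t in zip(query["atoms"], mapping))
        bonds_ok = all(
            frozenset(mapping[i] for i in bond) in target["bonds"]
            for bond in query["bonds"])
        if atoms_ok and bonds_ok:
            hits += 1
    return hits


def add_explicit_h(mol):
    """Return a copy in which every implicit hydrogen becomes an explicit H atom."""
    atoms = [{"sym": a["sym"], "h": 0} for a in mol["atoms"]]
    bonds = set(mol["bonds"])
    for idx, atom in enumerate(mol["atoms"]):
        for _ in range(atom["h"]):
            atoms.append({"sym": "H", "h": 0})
            bonds.add(frozenset((idx, len(atoms) - 1)))
    return {"atoms": atoms, "bonds": bonds}


def suppress_query_h(query):
    """Return a copy whose explicit H atoms are folded into H-count constraints."""
    keep = [i for i, a in enumerate(query["atoms"]) if a["sym"] != "H"]
    new_idx = {old: new for new, old in enumerate(keep)}
    atoms = [dict(query["atoms"][i]) for i in keep]
    bonds = set()
    for bond in query["bonds"]:
        i, j = tuple(bond)
        if i in new_idx and j in new_idx:
            bonds.add(frozenset((new_idx[i], new_idx[j])))
        elif i in new_idx:
            atoms[new_idx[i]]["h"] += 1   # j was a hydrogen attached to i
        elif j in new_idx:
            atoms[new_idx[j]]["h"] += 1   # i was a hydrogen attached to j
    return {"atoms": atoms, "bonds": bonds}


# Query C-O-H with an explicit hydrogen; targets with implicit hydrogens.
query = make_mol([("C", 0), ("O", 0), ("H", 0)], [(0, 1), (1, 2)])
methanol = make_mol([("C", 3), ("O", 1)], [(0, 1)])
dimethyl_ether = make_mol([("C", 3), ("O", 0), ("C", 3)], [(0, 1), (1, 2)])

print(count_matches(query, methanol))                    # explicit H cannot be mapped
print(count_matches(query, add_explicit_h(methanol)))    # hydrogens added to the target
print(count_matches(suppress_query_h(query), methanol))  # hydrogens suppressed in the query
print(len(methanol["atoms"]), len(add_explicit_h(methanol)["atoms"]))
```

In this toy, both remedies recover the methanol hit and reject dimethyl ether, but making the hydrogens explicit triples the number of target atoms the matcher has to consider, which mirrors the performance remark above.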