OpenEye Scientific Software  

SMIRKS Primer

An introduction to SMIRKS and OEChem for reaction based chemoinformatics algorithms.


History

The SMIRKS language was invented by David Weininger and Jack Delany of Daylight Chemical Information Systems, and derived from the SMILES and SMARTS languages for molecules and subgraph pattern matching, respectively. SMIRKS is largely consistent in syntax and semantics with SMILES and SMARTS but there are key differences. A SMIRKS represents a "reaction transform", which, given a set of reactants, may match and result in a set of products. The SMIRKS may represent the mechanism of the reaction, but it may not. SMIRKS may represent general molecular graph modifications which may have no relation to real chemistry. Atoms may be created or destroyed or change their atomic number.


Uses of SMIRKS

  1. Generating virtual combinatorial libraries from fragments
  2. Fragmentation
  3. Standardization/correction of valence models
  4. Evolutionary algorithms
  5. Representing reaction types for expert systems (e.g. retrosynthetic analysis)
  6. Generic molecular manipulation
  7. Rigorous, reusable encoding of algorithmic intelligence


Why SMIRKS?

Like SMILES and SMARTS, SMIRKS is a compact line notation with rigorous, documented and well understood syntax and semantics. Software that supports these languages accurately gives users added benefits through interoperability and the use of shared linguistic and conceptual chemoinformatics standards (inter-comprehensibility).


Basic syntax

A SMIRKS is a set of one or more dot-disconnected reactants, followed by ">>", followed by a set of one or more dot-disconnected products. Reactants and products are expressed in a format based very closely on SMILES and SMARTS, and composed only of elements of these languages. Several points can be illustrated by the following example:

[C&X4&H3:1][H].[O&X2&H1:2][H]>>[C:1]-[O:2]
atom mapping:
Atom mapping specifies correspondence between reactant and product atoms. Atom map indices are integers following the colon in an atom specification.
unmapped atoms:
In this example the two "[H]" hydrogens are unmapped. Unmapped reactant atoms are destroyed by the reaction. Unmapped product atoms are created by the reaction.
reactant smarts:
The reactant "C&X4&H3" specification is regarded as a smarts which must match a carbon atom with exactly four attached atoms of which three are hydrogens.
hydrogens as atoms and properties:
Hydrogens are treated both as atoms and properties of heavy atoms -- the H-count. So the explicit methyl hydrogen atom "[H]" is not in addition to the three "H3" hydrogens.
no need to match products:
In this case the reaction bonds two atoms and destroys two atoms. There is no need to use smarts to match the products.
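
The points above can be checked mechanically. The following is a minimal Python sketch, not part of OEChem or the Daylight toolkit, that splits the example at ">>" and reads the atom map indices with regular expressions. The function names are hypothetical, and the sketch assumes every atom is written in square brackets and that no recursive SMARTS is present.

```python
import re

# One bracket atom such as [C&X4&H3:1]; recursive SMARTS ($(...)) is not handled.
BRACKET_ATOM = re.compile(r"\[([^\[\]]+)\]")
# The atom map index is the integer after the final colon in the specification.
MAP_INDEX = re.compile(r":(\d+)$")


def bracket_atoms(side):
    """Return (specification, map index or None) for each bracket atom."""
    atoms = []
    for match in BRACKET_ATOM.finditer(side):
        spec = match.group(1)
        found = MAP_INDEX.search(spec)
        atoms.append((spec, int(found.group(1)) if found else None))
    return atoms


def summarize_smirks(smirks):
    """Report component counts plus mapped, destroyed and created atoms."""
    reactants, arrow, products = smirks.partition(">>")
    if not arrow:
        raise ValueError("a SMIRKS needs '>>' between reactants and products")
    reactant_atoms = bracket_atoms(reactants)
    product_atoms = bracket_atoms(products)
    reactant_maps = {idx for _, idx in reactant_atoms if idx is not None}
    product_maps = {idx for _, idx in product_atoms if idx is not None}
    return {
        "reactant_count": len(reactants.split(".")),
        "product_count": len(products.split(".")),
        "mapped": sorted(reactant_maps & product_maps),
        # Unmapped reactant atoms are destroyed; unmapped product atoms are created.
        "destroyed": [spec for spec, idx in reactant_atoms if idx is None],
        "created": [spec for spec, idx in product_atoms if idx is None],
    }


summary = summarize_smirks("[C&X4&H3:1][H].[O&X2&H1:2][H]>>[C:1]-[O:2]")
print(summary)
```

For the example, the summary lists two reactants, one product, map indices 1 and 2, two destroyed "[H]" atoms and no created atoms.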

OpenEye extensions to SMIRKS

Note that OEChem allows some smirks extensions beyond the Daylight standard. Mostly these correspond with the strictSmirks flag used when initializing OEUniMolecularRxn and OELibraryGen objects, as described in the API manual entries for these classes.

Use of the Daylight standard may be preferable in many cases. One reason is compatibility, since users may wish to use both software packages, now or in future, or other smirks-capable software. Another reason is pedagogical: smirks is somewhat tricky to learn even without added complications, and the extensions are easier to learn once some experience has shown why they are justified. Below, the Daylight smirks standard is referred to as "standard smirks".


When should I use explicit hydrogens in SMIRKS?

If hydrogens are removed from a heavy atom, those hydrogens should be specified explicitly in the reactant. These hydrogens may or may not be mapped to explicit hydrogens in the product(s). If they are not present in the product, they have been deleted by the transform.


Where can SMARTS be used in SMIRKS?

SMARTS atom specifications can be used for mapped atoms only. SMARTS are only relevant on the reactant side for a forward transform using standard smirks.

Note that this is not true for reverse transforms, which are implemented by Daylight but not OEChem. Nor is it true when OEChem is used to modify implicit H count using the "h" smarts primitive; this functionality is an OE smirks extension.


How can I preserve or change atom properties at a mapped atom?

Properties of mapped atoms are preserved by default with OEChem's implementation of SMIRKS. (This is a difference between OpenEye and Daylight.) The properties in question are charge, stereo and mass. To change a property, it must be specified in the products. Note that implicit hydrogen count is also an atom property in most contexts. However, with standard smirks, hydrogens are handled as explicit atoms.


Neutralizing an atom

To modify the charge to zero, "+0" or "-0" should be specified explicitly in the products as follows:

[*:1][N+:2](=[O:3])[O-:4]>>[*:1][N+0:2](=[O:3])=[O+0:4]
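
To see which mapped atoms the transform above re-charges, one can compare the charges written on each side. The following Python sketch is a hypothetical helper, not an OEChem call; it assumes simple bracket atoms with the charge written before the map index, and it treats a product atom with no written charge as "preserved", following the OEChem behavior described above.

```python
import re

BRACKET_ATOM = re.compile(r"\[([^\[\]]+)\]")
MAP_INDEX = re.compile(r":(\d+)$")
# A run of plus or minus signs, optionally followed by a number (e.g. +, --, +0).
CHARGE = re.compile(r"(\++|-+)(\d*)")


def explicit_charges(side):
    """Map each atom map index to its written charge, or None if none is written."""
    charges = {}
    for match in BRACKET_ATOM.finditer(side):
        spec = match.group(1)
        mapped = MAP_INDEX.search(spec)
        if not mapped:
            continue  # unmapped atoms are ignored in this sketch
        body = spec[: mapped.start()]
        found = CHARGE.search(body)
        if found is None:
            charge = None
        else:
            signs, digits = found.groups()
            magnitude = int(digits) if digits else len(signs)
            charge = magnitude if signs[0] == "+" else -magnitude
        charges[int(mapped.group(1))] = charge
    return charges


smirks = "[*:1][N+:2](=[O:3])[O-:4]>>[*:1][N+0:2](=[O:3])=[O+0:4]"
reactants, _, products = smirks.partition(">>")
before = explicit_charges(reactants)
after = explicit_charges(products)

# Only atoms with a charge written in the products are changed by the transform.
changed = {
    idx: (before.get(idx), charge)
    for idx, charge in after.items()
    if charge is not None and charge != before.get(idx)
}
print(changed)
```

Atoms 2 and 4 are reported as changed to zero; atoms 1 and 3 carry no written charge in the products and so keep whatever charge they had.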

OELibraryGen vs. OEUniMolecularRxn

OELibraryGen and OEUniMolecularRxn are two OEChem classes which implement smirks within the overall OEChem molecular object model. As implied by the name, OEUniMolecularRxn handles only single reactant smirks and reactions.

With OELibraryGen, the explicit-H property (Get/SetExplicitHydrogens(bool)) is by default set to true, which is consistent with standard smirks behavior. OEUniMolecularRxn lacks such a setting or method, so the equivalent behavior can only be obtained by making hydrogens explicit or implicit on the molecule before the transform is applied.

OELibraryGen applies smirks transforms one time when the GetProducts() method is called. In contrast, OEUniMolecularRxn applies transforms exhaustively, that is, the transform is applied to products iteratively until no match is found.
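
The difference between the two application modes can be illustrated without a toolkit. In the Python sketch below, a plain string substitution on a SMILES stands in for the transform; this is an assumption made purely for illustration, since a real smirks engine matches molecular graphs, not text. The function names are hypothetical.

```python
def apply_once(smiles, old, new):
    """Toy stand-in for one transform application: rewrite the first match only."""
    return smiles.replace(old, new, 1)


def apply_exhaustively(smiles, old, new, max_rounds=100):
    """Re-apply the toy transform to its own output until nothing matches."""
    for _ in range(max_rounds):
        result = apply_once(smiles, old, new)
        if result == smiles:
            return smiles  # no match left, so the result is stable
        smiles = result
    raise RuntimeError("transform did not converge; the new text may re-create a match")


# A dinitrobenzene written with charge-separated nitro groups.
start = "c1cc([N+](=O)[O-])ccc1[N+](=O)[O-]"
charged_nitro, neutral_nitro = "[N+](=O)[O-]", "N(=O)=O"

once = apply_once(start, charged_nitro, neutral_nitro)
exhaustive = apply_exhaustively(start, charged_nitro, neutral_nitro)
print(once)
print(exhaustive)
```

A single application rewrites one nitro group; the exhaustive loop rewrites both. The round limit guards against a transform whose product still matches its own reactant pattern.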


OELibraryGen, reactant numbers and count

An OELibraryGen instance is initialized with a smirks, and reactants are specified using the SetStartingMaterial() method. Unlike Daylight's reaction toolkit, OELibraryGen requires that reactants are specified according to their number, which must correspond with their lexical order in the input smirks. Exactly the correct number of reactants must also be specified. If a user wishes to apply a smirks to a "soup" of reactants, only some of which may be involved, the combinatorics must be coded separately from OELibraryGen.
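
One way to code those combinatorics is to sort the soup into one candidate list per reactant number and then enumerate ordered tuples. The Python sketch below assumes a two-reactant smirks; the filter functions are crude, hypothetical string tests standing in for real substructure matching, and the SMILES are made up for illustration.

```python
from itertools import product


def reactant_tuples(soup, slot_filters):
    """Enumerate ordered reactant tuples, one candidate per reactant number."""
    # candidates[i] holds every soup member accepted for reactant number i.
    candidates = [[mol for mol in soup if accepts(mol)] for accepts in slot_filters]
    return list(product(*candidates))


# Placeholder filters (assumptions): a toolkit would match the reactant SMARTS instead.
def looks_like_acid(smiles):
    return smiles.endswith("C(=O)O")


def looks_like_amine(smiles):
    return smiles.endswith("N")


soup = ["CC(=O)O", "CCN", "c1ccccc1", "CCCC(=O)O", "CCCN"]
tuples = reactant_tuples(soup, [looks_like_acid, looks_like_amine])
for acid, amine in tuples:
    print(acid, amine)
```

Each tuple has exactly one entry per reactant, in the lexical order of the smirks, so its members can be passed to SetStartingMaterial() by position. Soup members that fit no reactant, such as the benzene here, never appear.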


Representing individual, specific reactions

A subtle difference between OpenEye and Daylight reaction transforms is that the output of a Daylight reaction transform is normally a reaction, whereas the output of an OE reaction transform is normally a set of products. Daylight reactions are represented as reaction smiles, optionally with atom maps to designate a partial or complete reaction mechanism. Although the OELibraryGen and OEUniMolecularRxn methods do not directly output such reactions, these reaction smiles can easily be constructed. The OELibraryGen::SetAssignMapIdx() method is used to maintain the map indices.
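
Constructing the reaction smiles is a matter of joining strings. The Python sketch below assumes the reactant and product smiles are already in hand (for instance, written out by the toolkit with map indices kept); the function name and the example smiles are hypothetical.

```python
def reaction_smiles(reactants, products):
    """Join reactant and product SMILES into 'reactants>>products' form."""
    if not reactants or not products:
        raise ValueError("a reaction needs at least one reactant and one product")
    # Components on each side are dot-separated; '>>' means no agents are given.
    return ".".join(reactants) + ">>" + ".".join(products)


# Hand-written mapped SMILES for illustration, not actual toolkit output.
rxn = reaction_smiles(["[CH3:1][H]", "[OH:2][H]"], ["[CH3:1][OH:2]"])
print(rxn)
```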


Discarding unwanted byproducts

A fundamental limitation of a smirks is that atoms can only be destroyed if they are matched on the reactant side. That is, we must know they exist when writing the smirks. This makes it impossible to implement with smirks alone a concept akin to "discard whatever is attached here". Instead, such byproducts, a.k.a. "leaving groups", must be discarded in a separate step after the smirks based transform.

As an example, let's say our task is to replace an alkyl group attached to an aromatic ring by a single hydrogen atom. The following smirks could accomplish this but also leave the alkyl group:

[a:1]-[C:2]>>[a:1][H].[H][C:2]
The following smirks illustrates a helpful device:
[a:1]-[C:2]>>[a:1][H].[Xe][C:2]
The xenon atom serves to tag the byproduct so a subsequent step can easily recognize unwanted byproducts as such and discard them.
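
The discarding step can be as simple as filtering the dot-separated components of the product smiles. The Python sketch below is a hypothetical post-processing helper, not an OEChem function; it assumes the tag appears literally as "[Xe]" in the output and that no wanted product contains xenon.

```python
def discard_tagged(product_smiles, tag="[Xe]"):
    """Drop every dot-separated component that carries the tag atom."""
    kept = [part for part in product_smiles.split(".") if tag not in part]
    return ".".join(kept)


# Hand-written example: ethylbenzene after the tagging transform above.
raw_products = "c1ccccc1.[Xe]CC"
cleaned = discard_tagged(raw_products)
print(cleaned)
```

Only the benzene component survives; the xenon-tagged ethyl fragment is dropped.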


rev: 2007 05 30

© 1997-2008 OpenEye Scientific Software