Unfortunately, the original paper describing the Daylight SMILES syntax contains a number of ambiguities, which have led independent SMILES parser implementations to accept or reject different strings. The table below lists a set of dubious SMILES and shows whether each parser accepts (Y) or rejects (N) them.
| SMILES | Daylight 4.41 | Corina 1.6 | Corina WWW | Concord 3.2.1 | COBRA 3.21A | Synopsis 4.0 | OEChem 1.5 |
|---|---|---|---|---|---|---|---|
| C1.C1 | Y | Y | Y | N | N | Y | Y |
| C%00CC%00 | Y | Y | Y | N | N | N | Y |
| C(C.C)C | Y | Y | Y | N | N | Y | Y |
| C(C)1CC1 | Y | N | N | N | Y | N | Y |
| C(.C) | Y | Y | Y | N | N | Y | Y |
| C() | Y | Y | N | Y | Y | Y | Y |
| (CO)=O | N | N | N | N | N | Y | N |
| (C) | N | N | N | N | N | Y | Y |
| .C | N | N | N | Y | Y | N | Y |
| C..C | N | Y | N | Y | Y | N | Y |
| C. | N | Y | Y | Y | Y | Y | Y |
| C=(O)C | N | Y | N | N | Y | N | N |
| C((C)) | N | Y | N | Y | N | Y | Y |
| C.(C) | N | Y | N | Y | N | N | Y |
| C1CC(=1) | N | Y | N | N | N | N | Y |
| C1CC(1) | N | N | N | N | N | N | Y |
| C(C.) | N | Y | N | N | N | N | Y |
| C==C | N | Y | N | N | N | N | Y |
| C(1CC1) | N | N | N | N | N | N | Y |
| C(1)CC1 | N | N | N | N | N | N | Y |
The OEChem SMILES parser actually has two modes. The default, relaxed mode produces the results above and enables the SMILES extensions described in the next section. A `strict` mode, which is far less forgiving of dubious SMILES strings, may be used to validate SMILES.
The OEChem SMILES parsers support several minor extensions to the Daylight syntax. Each extension and its motivation are described below.
In addition to the standard Daylight unquoted elements, B, C, N, O, F, P, S, Cl, Br and I, OEChem’s SMILES readers also allow H, D and T to specify hydrogen, deuterium and tritium. Additionally, to support Syracuse SMILES, ‘CL’ and ‘BR’ are treated as ‘Cl’ and ‘Br’. The periodic table is also extended from 102 to 109 elements, e.g. [Sg] for seaborgium, with the addition of [D] and [T] representing [2H] and [3H] respectively.
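As an illustration of these unquoted-atom rules, here is a minimal sketch of a reader that recognizes them, with a `relaxed` flag loosely mirroring the relaxed and strict modes. The function and table names are hypothetical and this is not OEChem's implementation. The sketch also ignores lowercase aromatic atoms and bracket atoms such as [Sg] or [D].

```python
from typing import Optional, Tuple

# Standard Daylight unquoted elements, each mapped to (symbol, isotope).
DAYLIGHT_UNQUOTED = {
    s: (s, None) for s in ["B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
}

# Relaxed-mode additions: H, D and T, plus upper-case Syracuse CL and BR.
RELAXED_EXTRA = {
    "H": ("H", None),
    "D": ("H", 2),  # deuterium, i.e. [2H]
    "T": ("H", 3),  # tritium, i.e. [3H]
    "CL": ("Cl", None),
    "BR": ("Br", None),
}


def read_unquoted_atom(
    smiles: str, pos: int, relaxed: bool = True
) -> Optional[Tuple[str, Optional[int], int]]:
    """Read an unquoted atom at pos; return (symbol, isotope, next_pos) or None."""
    table = dict(DAYLIGHT_UNQUOTED)
    if relaxed:
        table.update(RELAXED_EXTRA)
    # Longest match first, so 'Cl' is not read as 'C' followed by 'l'.
    for token in sorted(table, key=len, reverse=True):
        if smiles.startswith(token, pos):
            symbol, isotope = table[token]
            return symbol, isotope, pos + len(token)
    return None
```

With `relaxed=False`, reading "CCL" at position 1 yields a plain carbon, and the trailing L would then be left over as a syntax error. This matches the idea that a strict reader rejects Syracuse-style input.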
A future version of OEChem may also accept Na, Li and K as unquoted elements, to support Syracuse SMILES.
OEChem SMILES also supports external closures (potentially unsatisfied ring closures). These use the syntax of an ampersand followed by a ring closure specification, i.e. an optional bond order followed by either a single digit or a % character and two digits. External closures and ring closures use separate index spaces, so the ring closure 2 and the external attachment point &2 are unrelated.
When external attachment points are paired within a SMILES string, they behave identically to ring closures, just using a separate index space. Hence, the SMILES c&1ccccc&1 is interpreted the same way as c1ccccc1, and C&1.C&1 is interpreted like C1.C1, i.e. the SMILES CC.
However, unlike ring closures, unpaired external attachment points are allowed and are interpreted like the RGroup attachment points described above. Hence, the SMILES CC&1 on its own is equivalent to the RGroup attachment SMILES CC[R1], which in turn is equivalent to the atom-mapped molecule CC[*:1].
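These pairing rules can be illustrated with a purely textual sketch. The helper `expand_external_closures` is hypothetical and not part of OEChem. It pairs external closures left to right, as ring closures are paired, and rewrites each pair as an ordinary ring closure, using a number that does not already appear in the string. Each unpaired closure becomes the atom-mapped dummy atom [*:n]. When an unpaired closure carries a bond order, the sketch moves that bond order onto the dummy atom, so C&=1 becomes C=[*:1]. This handling is our assumption and is not stated above.

```python
import re

# '&', an optional bond symbol, then one digit or '%' plus two digits.
EXTERNAL = re.compile(r"&([-=#$:/\\]?)(%\d\d|\d)")
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")
RING_CLOSURE = re.compile(r"%\d\d|\d")


def _index(text: str) -> int:
    """Closure index from '7' or '%12'."""
    return int(text.lstrip("%"))


def _ring_label(n: int) -> str:
    """Ring-closure label: one digit as-is, 10-99 with a '%' prefix."""
    return str(n) if n < 10 else f"%{n}"


def expand_external_closures(smiles: str) -> str:
    """Rewrite paired &n as ring closures and unpaired &n as [*:n] atoms."""
    tokens = list(EXTERNAL.finditer(smiles))

    # Pair occurrences left to right, as for ring closures: the second use of
    # an index closes the first, after which the index is free again.
    partner = {}
    open_at = {}
    for i, m in enumerate(tokens):
        idx = _index(m.group(2))
        if idx in open_at:
            j = open_at.pop(idx)
            partner[i], partner[j] = j, i
        else:
            open_at[idx] = i

    # Ring-closure numbers already used, ignoring bracket atoms and & tokens.
    bare = EXTERNAL.sub("", BRACKET_ATOM.sub("", smiles))
    used = {_index(t) for t in RING_CLOSURE.findall(bare)}
    free = (n for n in range(1, 100) if n not in used)

    labels = {}
    pieces, last = [], 0
    for i, m in enumerate(tokens):
        bond = m.group(1)
        if i in partner:
            j = partner[i]
            # First member of a pair takes a fresh number; the second reuses it.
            labels[i] = labels[j] if j in labels else _ring_label(next(free))
            text = bond + labels[i]
        else:
            text = f"{bond}[*:{_index(m.group(2))}]"
        pieces.append(smiles[last:m.start()])
        pieces.append(text)
        last = m.end()
    pieces.append(smiles[last:])
    return "".join(pieces)
```

For example, `expand_external_closures("c&1ccccc&1")` gives "c1ccccc1", and "CC&1" gives "CC[*:1]". The sketch gives every pair a ring number that is unused elsewhere in the string, so an existing ring closure 1 cannot collide with a paired &1. This mirrors the separate index spaces.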
The major advantage of these semantics, inspired by Daylight’s CHUCKLES, is that they allow convenient enumeration of combinatorial libraries by string concatenation. For example, three components of a library may be specified as C&1CCC&2, F&1 and Br&2. Joining them with the dot notation gives C&1CCC&2.F&1.Br&2, which is interpreted as the reaction product, i.e. FCCCCBr.
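Because the joining is plain string concatenation, enumerating a small library needs no chemistry-specific code. The sketch below builds every dot-joined combination from some component lists. Both the component lists and the `enumerate_library` helper are hypothetical. Each resulting string would then be passed to a SMILES reader that supports external closures. The first string produced is the example above.

```python
from itertools import product
from typing import Iterable, Iterator


def enumerate_library(*component_lists: Iterable[str]) -> Iterator[str]:
    """Yield one dot-joined SMILES for every combination of components."""
    for parts in product(*component_lists):
        yield ".".join(parts)


# Hypothetical building blocks: scaffolds carry &1 and &2,
# and each substituent carries one matching attachment point.
scaffolds = ["C&1CCC&2", "C&1CC&2"]
r1_groups = ["F&1", "Cl&1"]
r2_groups = ["Br&2", "O&2"]

library = list(enumerate_library(scaffolds, r1_groups, r2_groups))
for smi in library:
    print(smi)
```

This keeps enumeration purely textual and leaves all of the bond formation to the parser, which pairs &1 with &1 and &2 with &2.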
As with ring closures, a bond order may be specified after the ampersand and before the closure index, e.g. C&=1, and two-digit closures are indicated by a % prefix, e.g. C&%12 or C&=%12.
See also