To produce a SMILES string from a molecule, OEChem provides the function OECreateCanSmiString, which the next two examples use. OECreateCanSmiString converts the given OEMolBase into a canonical SMILES string and returns that string. Note the difference in calling convention between Java and C++: in C++ the caller passes an empty string as an argument for the function to fill in, whereas in Java the SMILES string is the return value of the function.
import openeye.oechem.*;

// create a new molecule
OEGraphMol mol = new OEGraphMol();
// convert the string into a molecule
if (oechem.OEParseSmiles(mol, "c1ccccc1")) {
    String newsmi = oechem.OECreateCanSmiString(mol);
    System.out.println("Canonical SMILES string is " + newsmi);
}
else {
    System.err.println("Problem parsing the SMILES");
}
The following more complicated example reads SMILES from stdin and writes the canonical SMILES to stdout.
/**************************************************************
 * Copyright 2005 OpenEye Scientific Software, Inc.
 *************************************************************/

import java.io.*;
import openeye.oechem.*;

public class SimpleCanSmi {
    public static void main(String[] argv) {
        OEGraphMol mol = new OEGraphMol();
        String str;
        try {
            InputStreamReader isr = new InputStreamReader(System.in);
            BufferedReader br = new BufferedReader(isr);

            // read one SMILES string per line from stdin
            while ((str = br.readLine()) != null) {
                // reset the molecule so that fragments do not accumulate
                mol.Clear();
                if (oechem.OEParseSmiles(mol, str)) {
                    String cansmi = oechem.OECreateCanSmiString(mol);
                    // results go to stdout; errors go to stderr
                    System.out.println("Canonical SMILES is " + cansmi);
                }
                else {
                    System.err.println("SMILES string was invalid");
                }
            }
        }
        catch (Exception e) {
            System.err.println(e);
        }
    }
}
Notice that this example uses the OEMolBase Clear method to
reuse the molecule. OEParseSmiles adds the given SMILES to the
current molecule rather than replacing it. If the line mol.Clear()
were removed from the program, the output would contain longer and
longer SMILES made up of disconnected fragments.
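The accumulation behavior described above can be illustrated without the toolkit. The sketch below is a toy model, not OEChem code: ClearDemo, ToyMol, and their parse, clear, smiles, and run methods are hypothetical stand-ins that only mimic the rule that parsing adds to the current molecule. Fragments are joined with '.', the SMILES symbol for disconnected components.

```java
import java.util.ArrayList;
import java.util.List;

class ClearDemo {
    // Hypothetical stand-in for a molecule; it is NOT part of OEChem.
    static class ToyMol {
        private final List<String> fragments = new ArrayList<>();

        // Mimics parsing: the new SMILES is added to the current contents.
        void parse(String smiles) { fragments.add(smiles); }

        // Mimics Clear(): discards everything parsed so far.
        void clear() { fragments.clear(); }

        // Disconnected fragments are separated by '.' in SMILES.
        String smiles() { return String.join(".", fragments); }
    }

    // Processes each input line, optionally clearing the molecule first,
    // and returns the string that would be printed for each line.
    static List<String> run(List<String> lines, boolean clearEachTime) {
        ToyMol mol = new ToyMol();
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (clearEachTime) {
                mol.clear();
            }
            mol.parse(line);
            out.add(mol.smiles());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("CCO", "c1ccccc1", "CC(=O)O");
        System.out.println("with clear:    " + run(input, true));
        System.out.println("without clear: " + run(input, false));
    }
}
```

With clearing, each output line corresponds to exactly one input line. Without it, the third line in this toy model becomes "CCO.c1ccccc1.CC(=O)O", which mirrors the growing, fragment-laden SMILES the text warns about.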
The above example is a very simple canonical SMILES creation program, but it probably does not do what most users expect. The molecule produced by OEParseSmiles preserves the aromaticity present in the input SMILES string. For example, if benzene is expressed as "c1ccccc1", all atoms and bonds are marked as aromatic, but if it is expressed in Kekulé form, "C1=CC=CC=C1", all atoms and bonds remain aliphatic.
| Input | Output |
| cc | c=c |
| C1=CC=CC=C1 | C1=CC=CC=C1 |
| C1=CN=CC=C1 | C1=CC=NC=C1 |
A common task after creating a molecule from SMILES is to normalize its aromaticity with OEAssignAromaticFlags. The following example therefore produces canonical SMILES with aromaticity perceived from the connection table rather than copied from the input.
/**************************************************************
 * Copyright 2005 OpenEye Scientific Software, Inc.
 *************************************************************/

import java.io.*;
import openeye.oechem.*;

public class BetterCanSmi {
    public static void main(String[] argv) {
        OEGraphMol mol = new OEGraphMol();
        String str;
        try {
            InputStreamReader isr = new InputStreamReader(System.in);
            BufferedReader br = new BufferedReader(isr);

            // read one SMILES string per line from stdin
            while ((str = br.readLine()) != null) {
                // reset the molecule so that fragments do not accumulate
                mol.Clear();
                if (oechem.OEParseSmiles(mol, str)) {
                    // perceive aromaticity from the connection table
                    oechem.OEAssignAromaticFlags(mol);
                    String cansmi = oechem.OECreateCanSmiString(mol);
                    // results go to stdout; errors go to stderr
                    System.out.println("Canonical SMILES is " + cansmi);
                }
                else {
                    System.err.println("SMILES string was invalid");
                }
            }
        }
        catch (Exception e) {
            System.err.println(e);
        }
    }
}
Here are the results of this new version:
| Input | Output |
| cc | C=C |
| C1=CC=CC=C1 | c1ccccc1 |
| C1=CN=CC=C1 | c1ccncc1 |
Note that the canonical SMILES generated by OECreateCanSmiString
remains dependent on the state of the molecule, especially its
aromaticity state. Thus, to generate a canonical SMILES suitable for
purposes such as a database key, the programmer must ensure that the
state of the molecule has been standardized. In particular,
aromaticity should be perceived according to the preferred model. The
SMILES canonicalization flag OESMILESFlag.Canonical refers
specifically to the canonical ordering of atoms. In contrast, the
high-level output function OEWriteMolecule, when writing the
canonical SMILES format (OEFormat.CAN), does invoke
OEFindRingAtomsAndBonds and OEAssignAromaticFlags.
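To illustrate why the molecule state matters for database keys, the sketch below groups input SMILES under the key produced by a caller-supplied canonicalizer. It deliberately does not call OEChem: KeyDemo, groupByKey, AS_PARSED, and PERCEIVED are hypothetical names, and the two lookup tables are stand-ins whose Kekulé benzene entries are taken from the SimpleCanSmi and BetterCanSmi results tabulated earlier. In a real program the canonicalizer would presumably wrap OEParseSmiles, OEAssignAromaticFlags, and OECreateCanSmiString, as BetterCanSmi does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class KeyDemo {
    // Stand-in for canonicalization WITHOUT aromaticity perception:
    // the aromaticity written in the input is preserved in the output.
    static final Map<String, String> AS_PARSED = Map.of(
        "c1ccccc1", "c1ccccc1",
        "C1=CC=CC=C1", "C1=CC=CC=C1");

    // Stand-in for canonicalization WITH aromaticity perception:
    // both ways of writing benzene yield the same string.
    static final Map<String, String> PERCEIVED = Map.of(
        "c1ccccc1", "c1ccccc1",
        "C1=CC=CC=C1", "c1ccccc1");

    // Groups the inputs under the key returned by the canonicalizer,
    // the way a database keyed on canonical SMILES would.
    static Map<String, List<String>> groupByKey(List<String> inputs,
                                                UnaryOperator<String> canonicalize) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String smi : inputs) {
            String key = canonicalize.apply(smi);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(smi);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> benzenes = List.of("c1ccccc1", "C1=CC=CC=C1");
        // Without standardization, benzene occupies two distinct keys.
        System.out.println(groupByKey(benzenes, AS_PARSED::get).size());
        // With standardization, both inputs share a single key.
        System.out.println(groupByKey(benzenes, PERCEIVED::get).size());
    }
}
```

The point of passing the canonicalizer as a parameter is that the keying logic stays the same while the standardization policy, here the aromaticity model, is chosen in one place.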