3.2 Generating a SMILES from a Molecule

To produce a SMILES string from a molecule, use one of OEChem's SMILES-generation functions. The next two examples use OECreateCanSmiString, which converts the given OEMolBase into a canonical SMILES string and returns that string. Note the difference in calling convention between Java and C++: in C++ the caller passes an empty string as an argument, whereas in Java the SMILES string is the return value of the function.

import openeye.oechem.*;

// create a new molecule
OEGraphMol mol = new OEGraphMol();

// convert the string into a molecule
if (oechem.OEParseSmiles(mol, "c1ccccc1")) {
  String newsmi = oechem.OECreateCanSmiString(mol);
  System.out.println("Canonical SMILES string is "+newsmi);
}
else {
  System.err.println("Problem parsing the SMILES");
}

The following, more complete example reads SMILES strings from stdin, one per line, and writes the canonical SMILES to stdout.

/**************************************************************
 * Copyright 2005 OpenEye Scientific Software, Inc.
 *************************************************************/

import java.io.*;
import openeye.oechem.*;

public class SimpleCanSmi {
  public static void main(String[] argv) {
    OEGraphMol mol = new OEGraphMol();
    String str;
    try {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);

      // read one SMILES per line until end of input
      while ( (str = br.readLine()) != null ) {
        // reset the molecule; OEParseSmiles adds to its current contents
        mol.Clear();
        if (oechem.OEParseSmiles(mol, str)) {
          String cansmi = oechem.OECreateCanSmiString(mol);
          // results go to stdout, diagnostics go to stderr
          System.out.println("Canonical SMILES is " + cansmi);
        }
        else {
          System.err.println("SMILES string was invalid");
        }
      }
    }
    catch (Exception e) {
      System.err.println(e);
    }
  }
}

Listing 3.1: Converting SMILES to canonical SMILES

Notice that this example calls the OEMolBase Clear method so that the molecule can be reused. OEParseSmiles adds the given SMILES to the current molecule rather than replacing its contents. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES made up of disconnected fragments.

The above example is a very simple canonical SMILES creation program, but it probably doesn't do what most users expect. The molecule built by OEParseSmiles preserves the aromaticity present in the input SMILES string. For example, if benzene is expressed as "c1ccccc1", all atoms and bonds are marked as aromatic, but if it is expressed in a Kekulé form, "C1=CC=CC=C1", all atoms and bonds are kept aliphatic.

Input          Output
cc             c=c
C1=CC=CC=C1    C1=CC=CC=C1
C1=CN=CC=C1    C1=CC=NC=C1

A common task after creating a molecule from SMILES is to normalize its aromaticity with OEAssignAromaticFlags. The following example therefore perceives aromaticity from the connection table before producing the canonical SMILES.

/**************************************************************
 * Copyright 2005 OpenEye Scientific Software, Inc.
 *************************************************************/

import java.io.*;
import openeye.oechem.*;

public class BetterCanSmi {
  public static void main(String[] argv) {
    OEGraphMol mol = new OEGraphMol();
    String str;
    try {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);

      // read one SMILES per line until end of input
      while ( (str = br.readLine()) != null ) {
        // reset the molecule; OEParseSmiles adds to its current contents
        mol.Clear();
        if (oechem.OEParseSmiles(mol, str)) {
          // perceive aromaticity from the connection table
          oechem.OEAssignAromaticFlags(mol);
          String cansmi = oechem.OECreateCanSmiString(mol);
          // results go to stdout, diagnostics go to stderr
          System.out.println("Canonical SMILES is " + cansmi);
        }
        else {
          System.err.println("SMILES string was invalid");
        }
      }
    }
    catch (Exception e) {
      System.err.println(e);
    }
  }
}

Listing 3.2: A better SMILES to canonical SMILES converter

And here are the results of this new version:

Input          Output
cc             C=C
C1=CC=CC=C1    c1ccccc1
C1=CN=CC=C1    c1ccncc1

3.2.1 OECreateCanSmiString

Note that the canonical SMILES generated by this function remains dependent on the state of the molecule, especially its aromaticity state. Thus, to generate a canonical SMILES suitable for purposes such as a database key, the programmer must ensure that the state of the molecule has been standardized. In particular, aromaticity should be perceived according to the preferred model. The SMILES canonicalization flag OESMILESFlag.Canonical refers specifically to the canonical ordering of atoms. In contrast, the high-level output function OEWriteMolecule, when writing the canonical SMILES format (OEFormat.CAN), does invoke OEFindRingAtomsAndBonds and OEAssignAromaticFlags.
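
To illustrate the database-key use case, here is a minimal sketch in plain Java that groups input SMILES under a shared key. The class SmilesKeyIndex and the method groupByKey are hypothetical names introduced for this illustration and are not part of OEChem. The canonicalizer is passed in as a function. In a real program it would presumably wrap the OEParseSmiles, OEAssignAromaticFlags and OECreateCanSmiString calls shown in Listing 3.2 and return null when parsing fails. The demo below uses a small lookup table as a stand-in, so that it runs without the toolkit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of OEChem.
class SmilesKeyIndex {

  // Groups the input SMILES by the key that the supplied canonicalizer
  // returns. Inputs for which the canonicalizer returns null are skipped.
  public static Map<String, List<String>> groupByKey(
      List<String> inputs, UnaryOperator<String> canonicalizer) {
    Map<String, List<String>> index = new LinkedHashMap<>();
    for (String smi : inputs) {
      String key = canonicalizer.apply(smi);
      if (key == null) {
        continue; // unparsable input: no key, so nothing to index
      }
      index.computeIfAbsent(key, k -> new ArrayList<>()).add(smi);
    }
    return index;
  }

  public static void main(String[] args) {
    // Stand-in for a standardizing canonicalizer (an assumption made for
    // this demo). The Kekulé entries follow the results table above.
    Map<String, String> standIn = new HashMap<>();
    standIn.put("c1ccccc1", "c1ccccc1");
    standIn.put("C1=CC=CC=C1", "c1ccccc1");
    standIn.put("C1=CN=CC=C1", "c1ccncc1");

    List<String> inputs = Arrays.asList(
        "c1ccccc1", "C1=CC=CC=C1", "C1=CN=CC=C1", "not-a-smiles");

    // Map.get returns null for unknown input, which groupByKey skips.
    Map<String, List<String>> index = groupByKey(inputs, standIn::get);
    for (Map.Entry<String, List<String>> e : index.entrySet()) {
      System.out.println(e.getKey() + " <- " + e.getValue());
    }
  }
}
```

With a canonicalizer that skips aromaticity perception, as in Listing 3.1, the aromatic and Kekulé forms of benzene would end up under different keys, which is why the state of the molecule needs to be standardized before the key is generated.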