Molecules

OEGraphMol is the class used to represent a molecule in most of the example programs in OEChem's example directories and in the code examples of this manual. OEGraphMol is a concrete class that can be instantiated and used with most molecular functions in OEChem. Much of its API is defined by the abstract base class OEMolBase, so an OEGraphMol can be passed to any function that takes an OEMolBase argument.

See also

An OEGraphMol contains atoms and bonds. Accessing them is discussed in chapter Atom and Bond Traversal.

Construction and Destruction

The example below is the smallest possible Java OEChem program: when run, it creates an OEGraphMol called mol. No explicit destruction step is needed; as with any other Java object, the molecule becomes eligible for garbage collection once no references to it remain.

Create a molecule

import openeye.oechem.*;

public class CreateOEGraphMol {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();
  }
}

Construction from SMILES

A common way of creating a molecule in OEChem is from its SMILES representation. SMILES notation is widely used in chemical information systems because it provides a compact string representation of a molecule. An introduction to SMILES syntax is provided in chapter SMILES Line Notation. The following examples use the SMILES string c1ccccc1, which describes benzene. A molecule can be created from a SMILES string using the OEParseSmiles function.

Creating a molecule from a SMILES string (version 1)

import openeye.oechem.*;

public class CreateOEGraphMolFromSMILES {
  public static void main(String argv[]) {
    // create a new molecule
    OEGraphMol mol = new OEGraphMol();
    // convert the SMILES string into a molecule
    oechem.OEParseSmiles(mol, "c1ccccc1");
  }
}

The OEParseSmiles function returns a boolean value indicating whether the input string was a valid SMILES representation of a molecule. It is good programming practice to check this return value and report an error if parsing failed. The following example checks the return status of OEParseSmiles and prints a warning if the string was not a valid SMILES string.

Creating a molecule from a SMILES string (version 2)

import openeye.oechem.*;

public class CreateOEGraphMolFromSMILESCheck {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();

    if (oechem.OEParseSmiles(mol, "c1ccccc1")) {
      // do something with the molecule
    }
    else 
      oechem.OEThrow.Warning("SMILES string was invalid!");
  }
}

The molecule filled in by OEParseSmiles preserves the aromaticity present in the input SMILES string. For example, if benzene is written as c1ccccc1, all atoms and bonds are marked as aromatic, but if it is written in the Kekulé form C1=CC=CC=C1, all atoms and bonds remain non-aromatic. A common step after creating a molecule from SMILES is therefore to perceive its aromaticity with the OEAssignAromaticFlags function.

Creating molecules from a SMILES string (version 3)

import openeye.oechem.*;

public class CreateOEGraphMolFromSMILESAromatic {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();

    if (oechem.OEParseSmiles(mol, "c1ccccc1")) {
      oechem.OEAssignAromaticFlags(mol);
      // do something with the molecule
    }
    else 
      oechem.OEThrow.Warning("SMILES string was invalid!");
  }
}

See also

For further information about aromaticity models see chapter Aromaticity Perception.
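The aromaticity that OEParseSmiles preserves comes directly from the notation: in SMILES, aromatic atoms are written with lowercase symbols, while a Kekulé form uses only uppercase symbols and explicit double bonds. The following plain-Java sketch, which does not use OEChem, checks a SMILES string for lowercase atom symbols. The class AromaticNotationSketch and its method are hypothetical illustrations, and the scanner only understands bracket atoms and the organic subset. It inspects notation only and performs no aromaticity perception, which is the job of OEAssignAromaticFlags.

```java
/**
 * Toy sketch (hypothetical, not part of OEChem): reports whether a SMILES
 * string writes any atom with a lowercase (aromatic) symbol.
 */
public class AromaticNotationSketch {

    // Returns true if at least one atom is written with a lowercase symbol.
    public static boolean writesAromaticAtoms(String smiles) {
        int i = 0;
        while (i < smiles.length()) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                // Bracket atom: skip optional isotope digits, then inspect
                // the first letter of the element symbol.
                int j = i + 1;
                while (j < smiles.length() && Character.isDigit(smiles.charAt(j))) {
                    j++;
                }
                if (j < smiles.length() && Character.isLowerCase(smiles.charAt(j))) {
                    return true;
                }
                // Continue after the closing bracket.
                int close = smiles.indexOf(']', j);
                i = (close < 0) ? smiles.length() : close + 1;
            } else {
                // Outside brackets the aromatic organic-subset symbols are
                // b, c, n, o, p and s; the lowercase letters of Cl and Br
                // (l and r) are not in this set.
                if ("bcnops".indexOf(ch) >= 0) {
                    return true;
                }
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] inputs = {"c1ccccc1", "C1=CC=CC=C1"};
        for (String smi : inputs) {
            System.out.println(smi + " -> " + writesAromaticAtoms(smi));
        }
    }
}
```

Run as is, it prints c1ccccc1 -> true and C1=CC=CC=C1 -> false: both strings describe benzene, but only the first one spells out aromatic atoms, so only the first would come out of OEParseSmiles with aromatic atoms before OEAssignAromaticFlags is called.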

Reuse

Consider the following code to parse two separate SMILES strings, benzene and phenol, and print the number of heavy atoms in each.

Reusing a molecule

import openeye.oechem.*;

public class ReuseMolecule {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();

    oechem.OEParseSmiles(mol, "c1ccccc1");
    System.out.println("Number of benzene atoms: " + mol.NumAtoms());

    oechem.OEParseSmiles(mol, "c1ccccc1O");
    System.out.println("Number of phenol atoms: " + mol.NumAtoms());
  }
}

The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 13

The second line, Number of phenol atoms: 13, may be surprising. OEParseSmiles adds the atoms and bonds described by the SMILES string to whatever the molecule already contains, so the second call appends the 7 atoms of phenol to the 6 atoms of benzene already present, giving 13. To reuse a molecule, call its Clear method first: Clear deletes all atoms and bonds of a molecule, thereby resetting it to its original “empty” state.

Clearing and reusing a molecule

import openeye.oechem.*;

public class ReuseMoleculeClear {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();

    oechem.OEParseSmiles(mol, "c1ccccc1");
    System.out.println("Number of benzene atoms: " + mol.NumAtoms());

    mol.Clear();

    oechem.OEParseSmiles(mol, "c1ccccc1O");
    System.out.println("Number of phenol atoms: " + mol.NumAtoms());
  }
}

The output of the preceding program is the following:

Number of benzene atoms: 6
Number of phenol atoms: 7

Using the Clear method is recommended when processing many molecules sequentially, for example the entries of a database. Instead of allocating and destroying a new molecule for each entry, a single molecule can be reset to its initial “empty” state with Clear before each entry is parsed.
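The arithmetic behind the surprising 13 can be reproduced with a toy model. The following plain-Java sketch does not use OEChem: ToyMol, countAtoms, and parseAppend are hypothetical names, the "molecule" is reduced to an atom count, and countAtoms handles only bracket atoms and the organic subset of SMILES. parseAppend mimics the appending behavior described above for OEParseSmiles, and clear plays the role of the Clear method.

```java
/**
 * Toy model (hypothetical, not OEChem): "parsing" a SMILES string adds atoms
 * to whatever the molecule already holds, unless it is cleared first.
 */
public class AppendVsClearSketch {

    // Hypothetical stand-in for a molecule that only tracks its atom count.
    public static class ToyMol {
        private int atomCount = 0;

        public int numAtoms() { return atomCount; }

        // Plays the role of Clear(): back to the empty state.
        public void clear() { atomCount = 0; }
    }

    // Counts atoms in a SMILES string. Handles only bracket atoms and the
    // organic subset: B, C, N, O, P, S, F, Cl, Br, I and aromatic b, c, n, o, p, s.
    public static int countAtoms(String smiles) {
        int count = 0;
        int i = 0;
        while (i < smiles.length()) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                // A bracket atom is exactly one atom; jump past its closing ']'.
                count++;
                int close = smiles.indexOf(']', i);
                i = (close < 0) ? smiles.length() : close + 1;
            } else if (i + 1 < smiles.length()
                    && ((ch == 'C' && smiles.charAt(i + 1) == 'l')
                        || (ch == 'B' && smiles.charAt(i + 1) == 'r'))) {
                // Two-letter organic-subset symbols Cl and Br.
                count++;
                i += 2;
            } else {
                if ("BCNOPSFIbcnops".indexOf(ch) >= 0) {
                    count++;
                }
                // Ring-closure digits, bond symbols, parentheses and dots are skipped.
                i++;
            }
        }
        return count;
    }

    // Mimics the appending behavior described for OEParseSmiles.
    public static void parseAppend(ToyMol mol, String smiles) {
        mol.atomCount += countAtoms(smiles);
    }

    public static void main(String[] args) {
        ToyMol mol = new ToyMol();
        parseAppend(mol, "c1ccccc1");
        System.out.println("benzene: " + mol.numAtoms());       // 6
        parseAppend(mol, "c1ccccc1O");
        System.out.println("no clear: " + mol.numAtoms());      // 13 (6 + 7)
        mol.clear();
        parseAppend(mol, "c1ccccc1O");
        System.out.println("after clear: " + mol.numAtoms());   // 7
    }
}
```

The three printed counts, 6, 13, and 7, match the outputs of the two OEChem programs above.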

Unique Representation

It is sometimes useful to generate a unique representation of a molecule, for example to serve as a database key. The compact nature of SMILES strings makes them an ideal candidate for this task. However, the same molecule can be represented by many different SMILES strings. OEChem features an advanced algorithm for generating a unique, canonical SMILES string, which can be obtained from a molecule by calling the OECreateCanSmiString function.

Creating a canonical SMILES string from a molecule

import openeye.oechem.*;

public class CreateSMILESFromOEGraphMol {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();
    
    oechem.OEParseSmiles(mol, "C1=CC=CC=C1");
    oechem.OEAssignAromaticFlags(mol);

    String cansmi = oechem.OECreateCanSmiString(mol);
    System.out.println("Canonical SMILES is " + cansmi);
  }
}

The output of the preceding program is the following:

Canonical SMILES is c1ccccc1
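Why canonicalization yields a unique key can be illustrated with a much simpler object than a molecular graph. A ring of atoms written as a cyclic string can start at any atom and run in either direction, so a single ring has many spellings; picking the lexicographically smallest rotation of the string or of its reverse gives exactly one canonical spelling per ring. The following plain-Java sketch is only an analogy: it is not OEChem's canonicalization algorithm, which works on the full molecular graph, and the class and method names are hypothetical. It also uses the canonical string as a map key to group duplicate inputs, as a database key would.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy analogy (hypothetical, not OEChem's algorithm): canonical spelling of
 * a ring written as a cyclic string of one-character atom symbols.
 */
public class CanonicalRingSketch {

    // Returns the smallest rotation of the ring string, read in either direction.
    public static String canonicalRing(String ring) {
        String best = ring;
        String reversed = new StringBuilder(ring).reverse().toString();
        for (String s : new String[] {ring, reversed}) {
            for (int k = 0; k < s.length(); k++) {
                String rotation = s.substring(k) + s.substring(0, k);
                if (rotation.compareTo(best) < 0) {
                    best = rotation;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three spellings of a pyridine-like ring and one of a benzene-like ring.
        String[] rings = {"cccccn", "ccnccc", "nccccc", "cccccc"};
        // Canonical key -> first spelling seen with that key.
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String ring : rings) {
            byKey.putIfAbsent(canonicalRing(ring), ring);
        }
        System.out.println(byKey); // {cccccn=cccccn, cccccc=cccccc}
    }
}
```

The three spellings of the six-membered ring containing n collapse to the single key cccccn. Trying every rotation is quadratic in the ring size, which is acceptable for an illustration.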

The following slightly more complicated example reads SMILES from standard input and writes the corresponding canonical SMILES to standard output.

Creating canonical SMILES strings (version 1)

import java.io.*;
import openeye.oechem.*;

public class CreateSMILESFromOEGraphMolRW {
  public static void main(String argv[]) {
    String str;
    try {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);
      
      while ( (str = br.readLine()) != null ) {
        OEGraphMol mol = new OEGraphMol();
        if (oechem.OEParseSmiles(mol, str)) {
          String cansmi = oechem.OECreateCanSmiString(mol);
          System.out.println("Canonical SMILES is " + cansmi);
        }
        else 
          oechem.OEThrow.Warning(str + " is an invalid SMILES!");
      } 
    }
    catch (Exception e) {
      System.err.println(e);
    }         
  }
}

input                  output (canonical SMILES)
c1cccnc1(O)            c1ccnc(c1)O
C1=CC=CC=C1            C1=CC=CC=C1
C1=CN=CC=C1            C1=CC=NC=C1
C1=CC=CC=N1            C1=CC=NC=C1
C1=NC=CN1CCC(=O)O      C1=CN(C=N1)CCC(=O)O

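As was shown in the Construction from SMILES section, OEParseSmiles preserves the aromaticity present in the input SMILES string. This is why the Kekulé inputs in the table above keep their Kekulé form in the canonical output: C1=CC=CC=C1 yields C1=CC=CC=C1 rather than the c1ccccc1 obtained for benzene earlier, so two spellings of the same molecule would produce different keys. To avoid this, the function OEAssignAromaticFlags has to be called to perceive aromaticity before the canonical SMILES is created.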

Creating canonical SMILES strings (version 2)

import java.io.*;
import openeye.oechem.*;

public class CreateSMILESFromOEGraphMolRWBetter {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();
    String str;
    try {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);
      
      while ( (str = br.readLine()) != null ) {
        mol.Clear();
        if (oechem.OEParseSmiles(mol, str)) {
          oechem.OEAssignAromaticFlags(mol);
          String cansmi = oechem.OECreateCanSmiString(mol);
          System.out.println("Canonical SMILES is " + cansmi);
        }
        else 
          oechem.OEThrow.Warning(str + " is an invalid SMILES!");
      } 
    }
    catch (Exception e) {
      System.err.println(e);
    }         
  }
}

Notice that the preceding program does not construct a new molecule on each pass through the loop; instead, it calls the Clear method to reuse a single molecule. If the line mol.Clear() were removed from the program, each new SMILES would be appended to the molecule built from the previous lines, and the output would contain longer and longer SMILES made of disconnected fragments; see section Reuse for more details.

input                  output (canonical SMILES)
c1cccnc1(O)            c1ccnc(c1)O
C1=CC=CC=C1            c1ccccc1
C1=CN=CC=C1            c1ccncc1
C1=CC=CC=N1            c1ccncc1
C1=NC=CN1CCC(=O)O      c1cn(cn1)CCC(=O)O
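The programs in this section share the same read-convert-write loop. The sketch below shows one way to factor that loop out in plain Java, using try-with-resources and catching IOException rather than Exception. It is an illustration under stated assumptions, not OEChem code: the class LineFilterSketch and the convention that the converter returns null for invalid input are hypothetical, standing in for the check on the return value of OEParseSmiles. In a real program the converter could wrap OEParseSmiles, OEAssignAromaticFlags, and OECreateCanSmiString; here a placeholder converter is used so the code runs without OEChem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.util.function.Function;

/**
 * Sketch (hypothetical): one SMILES per input line, converted lines to out,
 * warnings to err. The converter returns null for input it rejects.
 */
public class LineFilterSketch {

    // Returns the number of lines the converter rejected.
    public static int filter(Reader in, PrintStream out, PrintStream err,
                             Function<String, String> convert) throws IOException {
        int failures = 0;
        // try-with-resources closes the reader even if an exception is thrown.
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                String result = convert.apply(line);
                if (result != null) {
                    out.println(result);
                } else {
                    err.println(line + " is an invalid SMILES!");
                    failures++;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Placeholder converter: rejects empty lines and echoes everything else.
        Function<String, String> echo = s -> s.isEmpty() ? null : s;
        try {
            filter(new InputStreamReader(System.in), System.out, System.err, echo);
        } catch (IOException e) {
            System.err.println("Error reading input: " + e);
        }
    }
}
```

Because the loop takes a Reader and PrintStreams, it can be exercised with a StringReader and in-memory streams instead of standard input and output.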

OEChem also provides canonical isomeric SMILES generation. A canonical isomeric SMILES string can be generated from a molecule by calling the OECreateIsoSmiString function.

Creating canonical isomeric SMILES strings

import java.io.*;
import openeye.oechem.*;

public class CreateIsoSMILESFromOEGraphMolRW {
  public static void main(String argv[]) {
    OEGraphMol mol = new OEGraphMol();
    String str;
    try {
      InputStreamReader isr = new InputStreamReader(System.in);
      BufferedReader br = new BufferedReader(isr);
      
      while ( (str = br.readLine()) != null ) {
        mol.Clear();
        if (oechem.OEParseSmiles(mol, str)) {
          oechem.OEAssignAromaticFlags(mol);
          String isosmi = oechem.OECreateIsoSmiString(mol);
          System.out.println("Isomeric canonical SMILES is " + isosmi);
        }
        else 
          oechem.OEThrow.Warning(str + " is an invalid SMILES!");
      } 
    }
    catch (Exception e) {
      System.err.println(e);
    }         
  }
}

input                       output (canonical isomeric SMILES)
C1CCCN[C@@H]1(O)            C1CCN[C@@H](C1)O
C1CN[C@H](O)CC1             C1CCN[C@@H](C1)O
C1CC[C@H](O)CC1             C1CC[C@@H](CC1)O
C1CCC(O)CC1                 C1CCC(CC1)O
C1=NC=CN1C[C@H](N)C(=O)O    c1cn(cn1)C[C@@H](C(=O)O)N
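The extra detail an isomeric SMILES carries is visible in the notation itself: @ and @@ inside bracket atoms specify tetrahedral chirality, / and \ bonds specify double-bond geometry, and a number at the start of a bracket atom specifies an isotope. The following plain-Java sketch, which does not use OEChem, reports whether a SMILES string contains any of these markers. The class and method names are hypothetical, and the check is purely syntactic: it does not verify that a marked atom really is a stereocenter.

```java
/**
 * Toy helper (hypothetical, not part of OEChem): reports whether a SMILES
 * string contains stereo or isotope markers.
 */
public class IsomericInfoSketch {

    public static boolean hasIsomericInfo(String smiles) {
        int i = 0;
        while (i < smiles.length()) {
            char ch = smiles.charAt(i);
            if (ch == '/' || ch == '\\') {
                // Directional bonds encode double-bond (cis/trans) stereo.
                return true;
            }
            if (ch == '[') {
                int close = smiles.indexOf(']', i);
                String atom = smiles.substring(i + 1, close < 0 ? smiles.length() : close);
                // Leading digits are an isotope mass; '@' marks tetrahedral chirality.
                boolean isotope = !atom.isEmpty() && Character.isDigit(atom.charAt(0));
                if (isotope || atom.indexOf('@') >= 0) {
                    return true;
                }
                i = (close < 0) ? smiles.length() : close + 1;
            } else {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] inputs = {"C1CC[C@H](O)CC1", "C1CCC(O)CC1", "F/C=C/F", "[13CH4]"};
        for (String smi : inputs) {
            System.out.println(smi + " -> " + hasIsomericInfo(smi));
        }
    }
}
```

For the four inputs in main it prints true, false, true, and true; of the inputs in the table above, only C1CCC(O)CC1 carries no markers.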