These examples introduce OEChem and illustrate its breadth, power, and flexibility. They are available as source code and include useful applications, as well as code that supplements the API documentation and from which different and more extensive programs may be developed.
Reading and writing molecules
OEChem provides a rich set of tools for handling molecule file I/O with a large set of formats, currently: smi, mdl, pdb, mol2, bin, ism, mol2h, sdf, can, mf, xyz, fasta, mopac, oeb, mmod, and sln. In many cases multiple variants of a format are handled, such as mol2 with hydrogens, canonical or isomeric SMILES, and pdb with bonds. The OEFlavor API makes it convenient to deal with the resulting thousands of effective de facto formats.
Despite advances in cheminformatics, file interconversion remains problematic for several reasons, including ambiguous de facto format definitions and inconsistency in the models and information represented. OEChem handles multiple known models (e.g., MDL, Tripos, Daylight) and other variations. High-level file I/O automatically applies format-appropriate standardizations such as aromaticity. Low-level I/O functions make it possible to customize these settings (e.g., mol2 output with custom atom types, SMILES with MDL aromaticity, canonical SMILES with no aromaticity).
OEChem takes special care with multi-conformer molecules, which can be stored explicitly as such in OEBinary (and TDT), but not in most other common formats. A facile API is provided for recognizing multi-conformer molecules consistently and flexibly across formats.
The key question of when two molecules are the same is relevant to canonicalization. OEChem can generate canonical SMILES, canonical Kekule SMILES, and other variations. OEChem can also canonicalize atom and bond order for other formats and thus create, for example, a "canonical mol2 file". Programs: babel2.cpp, convert.py, convert webapp
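The question of when two molecules are the same can be made concrete with a toy example. The sketch below is library-independent and is not OEChem's API or canonicalization algorithm: `canonical_form` is a hypothetical brute-force helper that tries every atom renumbering of a small graph and keeps the lexicographically smallest description. It considers only element symbols and connectivity, ignoring bond orders, hydrogens, aromaticity, and stereochemistry.

```python
from itertools import permutations
from typing import Sequence, Tuple

Bond = Tuple[int, int]
CanonicalForm = Tuple[Tuple[str, ...], Tuple[Bond, ...]]


def canonical_form(elements: Sequence[str], bonds: Sequence[Bond]) -> CanonicalForm:
    """Brute-force canonical form of a small labeled graph (toy sizes only).

    Tries every atom renumbering and keeps the lexicographically smallest
    (elements, bonds) description. The cost grows factorially with atom count.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] is the new index assigned to that atom.
        new_elements = [""] * n
        for old, new in enumerate(perm):
            new_elements[new] = elements[old]
        new_bonds = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
        candidate = (tuple(new_elements), tuple(new_bonds))
        if best is None or candidate < best:
            best = candidate
    return best


# Ethanol heavy atoms written in two different atom orders ("CCO" and "OCC").
ethanol_a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether ("COC"): same atoms, different connectivity.
ether = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a == ethanol_b)  # True: same molecule, different input order
print(ethanol_a == ether)      # False: different connectivity
```

Two inputs describe the same graph exactly when their canonical forms are equal, which is the property a canonical SMILES or a "canonical mol2 file" relies on. Practical canonicalizers avoid the exhaustive search used here.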
Fixing/standardizing molecules
Example: C(=O)[OH] --> C(=O)[O-], etc. Even when chemical information is accurately converted and handled and is chemically correct, there may remain the issue of alternative conventional representations, as reflected in the business rules of registration systems. Some conventions are built into the OpenEye and other valence/hydrogen-count models. Others must be handled manually. The separate OpenEye tautomer code (QuacPac) can be applied, as will the upcoming reaction toolkit. Programs: norm.py, valencemodels.py, aromodels.py
Analyzing molecules
A powerful, flexible, and rich set of methods is provided for analyzing molecules and retrieving stored and derived properties. For a discussion of why SSSR (smallest set of smallest rings) is problematic, see the manual. Programs: rings.py, molanalyze.py, molanalyze webapp
Substructure searching/matching
Substructure searching, i.e., molecular subgraph isomorphism, is a critical task. It is also a common task, but the semantics of substructure searching are not universally consistent. OEChem provides the flexibility to handle any type of connectivity-based search. The OEQMol class is a query molecule and can be initialized from a SMARTS pattern or a molecule. Atom and bond expressions can be combined in boolean fashion to define when two atoms or bonds match. The molecule state can also affect the semantics, for example when searching for explicit hydrogen atoms or aromaticity. The USA (unique set of atoms) flag treats matches that cover the same set of atoms as equivalent. A match results in an OEMatch object, which is itself highly flexible and useful for numerous subsequent tasks, such as alignment or defining a new submol. Programs: smigrep.cpp, match.py, match.cpp
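The unique-set-of-atoms idea can be illustrated without OEChem. The following is a library-independent toy, not OEChem's API or search algorithm: the graph representation, `find_matches` (a naive backtracking subgraph matcher that compares element symbols only), and `unique_by_atom_set` are all hypothetical names introduced for this sketch.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation (not OEChem's): atom index -> element symbol,
# and atom index -> set of bonded neighbor indices.
Graph = Tuple[Dict[int, str], Dict[int, Set[int]]]
Match = Tuple[int, ...]


def find_matches(query: Graph, target: Graph) -> List[Match]:
    """Naive backtracking subgraph matching.

    Each result lists target atoms in the sorted order of the query atom
    indices. Atoms match when element symbols are equal, and every query
    bond must map onto a target bond.
    """
    q_labels, q_adj = query
    t_labels, t_adj = target
    order = sorted(q_labels)
    results: List[Match] = []

    def extend(mapping: Dict[int, int]) -> None:
        if len(mapping) == len(order):
            results.append(tuple(mapping[q] for q in order))
            return
        q = order[len(mapping)]
        for t in t_labels:
            if t in mapping.values() or t_labels[t] != q_labels[q]:
                continue
            # Every already-mapped query neighbor must be bonded to t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                extend(mapping)
                del mapping[q]

    extend({})
    return results


def unique_by_atom_set(matches: List[Match]) -> List[Match]:
    """Keep the first match for each distinct set of target atoms."""
    seen: Set[frozenset] = set()
    unique: List[Match] = []
    for match in matches:
        key = frozenset(match)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Query: O-C-O pattern. Target: acetic acid heavy atoms, C-C(O)O.
query: Graph = ({0: "O", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})
target: Graph = (
    {0: "C", 1: "C", 2: "O", 3: "O"},
    {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}},
)

all_matches = find_matches(query, target)
print(all_matches)                      # [(2, 1, 3), (3, 1, 2)]
print(unique_by_atom_set(all_matches))  # [(2, 1, 3)]
```

The query is symmetric, so it maps onto the same three target atoms in two ways. Whether those count as one hit or two is exactly the kind of semantic choice the text describes.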
MCSS
Maximum Common SubStructure (MCSS) analysis is of considerable interest. The functionality is built into OEChem along with a flexible API to allow for variations in MCSS semantics. Boolean expressions define which atoms and bonds match. OEMCSFunc objects define the scoring used to rank hits. Programs: mcss.py, molalign.py, mcss webapp
Conformation handling and analysis
OEChem's extensive and powerful handling of conformations and 3D coordinates reflects OpenEye's focus on 3D molecular modeling. RMSD with auto-isomorphism (accounting for symmetry-equivalent atom orderings) is one of many features, both high- and low-level. Programs: molgeom.cpp, molgeom.py, setrotor.py, add3dhyd.cpp, rmsd.cpp
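To show why symmetry matters for RMSD, here is a minimal library-independent sketch, not OEChem's implementation. The function names are hypothetical, the automorphisms (symmetry-preserving atom permutations) are supplied by hand rather than computed, and no superposition (overlay) is performed: coordinates are compared in place.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(ref: Sequence[Coord], fit: Sequence[Coord], mapping: Sequence[int]) -> float:
    """In-place RMSD between ref[i] and fit[mapping[i]] (no overlay)."""
    total = 0.0
    for i, j in enumerate(mapping):
        total += sum((a - b) ** 2 for a, b in zip(ref[i], fit[j]))
    return math.sqrt(total / len(mapping))


def symmetry_aware_rmsd(
    ref: Sequence[Coord],
    fit: Sequence[Coord],
    automorphisms: Sequence[Sequence[int]],
) -> float:
    """Minimum in-place RMSD over a supplied list of automorphisms."""
    return min(rmsd(ref, fit, perm) for perm in automorphisms)


# Water-like toy geometry: atom 0 is O, atoms 1 and 2 are equivalent H atoms.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Same geometry, but the two hydrogens are listed in swapped order.
fit = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]

identity = [0, 1, 2]
swap_hydrogens = [0, 2, 1]

naive = rmsd(ref, fit, identity)
best = symmetry_aware_rmsd(ref, fit, [identity, swap_hydrogens])
print(round(naive, 4))  # 1.1547
print(best)             # 0.0
```

The two structures are physically identical, yet the index-by-index comparison reports a large deviation. Taking the minimum over automorphisms removes that artifact.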
Macromolecules
OEChem is particularly adept at parsing PDB files, which are notoriously idiosyncratic and inconsistent at best. The OEResidue class is defined as a property of atoms, which is non-intuitive to some, but technically advantageous and arguably more physically and informationally correct (Democritus, Oracle). Programs: resanalyze.py, pdb2lig.py
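The design choice of attaching residue information to atoms can be sketched in plain Python. This is an illustration of the idea only: the `Residue` and `Atom` classes and `group_by_residue` below are hypothetical and do not reproduce OEResidue or any OEChem call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Residue:
    """Hypothetical residue record; frozen so it can be used as a dict key."""
    name: str
    chain: str
    number: int


@dataclass
class Atom:
    element: str
    name: str
    residue: Residue  # residue information travels with each atom


def group_by_residue(atoms: List[Atom]) -> Dict[Residue, List[Atom]]:
    """Derive a residue-centric view on demand from per-atom properties."""
    groups: Dict[Residue, List[Atom]] = defaultdict(list)
    for atom in atoms:
        groups[atom.residue].append(atom)
    return dict(groups)


gly = Residue("GLY", "A", 1)
water = Residue("HOH", "A", 101)
atoms = [
    Atom("N", "N", gly),
    Atom("C", "CA", gly),
    Atom("O", "O", water),
]

groups = group_by_residue(atoms)
print([a.name for a in groups[gly]])  # ['N', 'CA']

# Filtering by a residue property needs no separate residue container.
non_water = [a.name for a in atoms if a.residue.name != "HOH"]
print(non_water)  # ['N', 'CA']
```

One possible reading of the advantage: the molecule stays a flat collection of atoms and bonds, and residue-level views are computed when needed instead of being stored as a separate hierarchy that an inconsistent PDB file could violate.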
Combinatorics
Combinatorial chemistry presents unique data processing challenges. Virtual compound libraries need to be expressed rigorously, enumerated, and handled at various levels of abstraction. OEChem does not include the reaction toolkit, which is in development as a separate product. However, OEChem has some key features, including some SMILES extensions, which facilitate combinatorial work. Programs: flipper.cpp, frankenmol.py
Non-structural data
Programs: rename.py, sdf2csv.py, bitvectest.cpp
Author: Jeremy Yang