29.2 Library Generation

The OELibraryGen class was designed to give programmers a high degree of control when applying chemical transformations. It was also designed for efficiency: potentially costly preprocessing is performed once, before any transformations are carried out. The setup cost of an OELibraryGen instance may therefore be relatively high, and memory use may be large because preprocessed reactants are stored in memory. Subsequent generation of products, however, is very efficient because the setup costs are paid in advance. The OELibraryGen class serves the dual purpose of managing sets of preprocessed starting materials and storing a list of chemical transform operations defined by a reaction molecule.

Chemical transform operations are carried out on starting materials, which provide most of the virtual matter that goes into making virtual product molecules. The OELibraryGen class provides an interface to associate starting materials with reactant patterns through the OELibraryGen::SetStartingMaterial and OELibraryGen::AddStartingMaterial methods. These methods associate starting materials with reactant patterns using the index (reactant number) of the pattern. Reactant patterns are numbered from zero: reactant zero consists of the atom with the lowest atom index and all atoms that are members of the same connected component. The next reactant pattern begins with the lowest atom index that is not a member of the first component, and so on. In a SMIRKS pattern, the first reactant (reactant number zero) is the leftmost reactant. Disconnected reactant patterns may be grouped into a single component using SMIRKS component-level grouping, denoted by parentheses.
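The left-to-right numbering of reactants can be illustrated with a small standalone sketch. The function SplitReactants below is a hypothetical helper, not part of OEChem, and it only approximates the real numbering: it splits the reactant side of a SMIRKS string at '.' separators that lie outside parentheses and square brackets, and it assumes that dot-separated parts are not joined by ring-closure digits.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an OEChem function): split the reactant side of a
// SMIRKS string into reactant patterns, in left-to-right order.
// A '.' inside parentheses (component-level grouping) or inside square
// brackets is not treated as a separator.
std::vector<std::string> SplitReactants(const std::string &smirks)
{
  // Only the text before the first '>' describes reactants.
  const std::string lhs = smirks.substr(0, smirks.find('>'));

  std::vector<std::string> reactants;
  std::string current;
  int parenDepth = 0;
  int bracketDepth = 0;

  for (char c : lhs)
  {
    if (c == '(') ++parenDepth;
    else if (c == ')') --parenDepth;
    else if (c == '[') ++bracketDepth;
    else if (c == ']') --bracketDepth;

    if (c == '.' && parenDepth == 0 && bracketDepth == 0)
    {
      // Top-level separator: the current reactant pattern is complete.
      reactants.push_back(current);
      current.clear();
    }
    else
    {
      current += c;
    }
  }
  if (!current.empty())
    reactants.push_back(current);

  return reactants;
}
```

Under these assumptions, the position of a pattern in the returned vector is the reactant number that would be passed to SetStartingMaterial or AddStartingMaterial. A reactant side wrapped entirely in parentheses yields a single entry, mirroring component-level grouping.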

Once a reaction has been defined and starting materials have been associated with each of the reactant patterns, chemical transformations can be applied to combinations of starting materials. To achieve chemically reasonable output, attention should be given to choosing the mode of valence (or hydrogen count) correction that matches the reaction. The OELibraryGen class has three possible modes of valence correction: explicit hydrogen, implicit hydrogen, and automatic. The default mode for valence correction and SMIRKS interpretation emulates the Daylight Reaction Toolkit. In this mode, hydrogen counts are adjusted using explicit hydrogens in SMIRKS patterns: reactions are carried out using explicit hydrogens, and valence correction occurs when explicit hydrogens are added or deleted as defined by the reaction. The following example demonstrates strict SMIRKS and explicit hydrogen handling.

#include "openeye.h"
#include "oechem.h"
#include <iostream>

using namespace std;
using namespace OEChem;
using namespace OESystem;

int main()
{
  // Amide formation. The chlorine [Cl:3] and the amine hydrogen [H:5] do not
  // appear on the product side, so both are deleted by the reaction.
  OELibraryGen libgen("[O:1]=[C:2][Cl:3].[N:4][H:5]>>[O:1]=[C:2][N:4]");

  // Reactant 0: the acid chloride.
  OEGraphMol mol;
  OEParseSmiles(mol,"CC(=O)Cl");
  libgen.SetStartingMaterial(mol,0);

  // Reactant 1: the amine.
  mol.Clear();
  OEParseSmiles(mol,"NCC");
  libgen.SetStartingMaterial(mol,1);

  // Enumerate the products and print each one as a canonical SMILES string.
  OEIter<OEMolBase> product;
  for (product = libgen.GetProducts();product;++product)
  {
    std::string smi;
    OECreateCanSmiString(smi,product);
    cout << "smiles = " << smi << endl;
  }

  return 0;
}

Listing:29.2 Strict SMIRKS Reaction Handling

In the amide bond forming reaction, a hydrogen atom attached to the nitrogen in the amine pattern is explicitly deleted when forming the product. When executed, the example generates two products in total, one for each of the equivalent protons attached to the amine nitrogen. If a unique set of products is desired, canonical SMILES strings may be stored and used to verify that each generated product is indeed unique.
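One way to obtain a unique set of products is to collect the canonical SMILES strings and discard repeats. The sketch below is standalone and uses only the C++ standard library; UniqueSmiles is a hypothetical helper name, and in practice the input strings would come from OECreateCanSmiString inside the product loop.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: return the input strings with duplicates removed,
// keeping the first occurrence of each string in its original order.
std::vector<std::string> UniqueSmiles(const std::vector<std::string> &smiles)
{
  std::set<std::string> seen;
  std::vector<std::string> unique;
  for (const std::string &smi : smiles)
  {
    // insert() reports through .second whether the string was new.
    if (seen.insert(smi).second)
      unique.push_back(smi);
  }
  return unique;
}
```

The same test, seen.insert(smi).second, could equally be applied directly inside the product loop so that duplicate products are skipped as they are generated.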

The following example demonstrates how the same basic reaction can be carried out using the implicit hydrogen correction mode. Notice that no explicit hydrogens appear in the reaction. Instead, the SMARTS implicit hydrogen count operator appears on the right-hand side of the reaction and is used to assign the implicit hydrogen count of the product nitrogen.

#include "openeye.h"
#include "oechem.h"
#include <iostream>

using namespace std;
using namespace OEChem;
using namespace OESystem;

int main()
{
  // The 'h1' primitive assigns one implicit hydrogen to the product nitrogen.
  OELibraryGen libgen("[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][Nh1:4]");
  // Switch to implicit hydrogen mode before adding starting materials.
  libgen.SetExplicitHydrogens(false);

  // Reactant 0: the acid chloride.
  OEGraphMol mol;
  OEParseSmiles(mol,"CC(=O)Cl");
  libgen.SetStartingMaterial(mol,0);

  // Reactant 1: the amine.
  mol.Clear();
  OEParseSmiles(mol,"NCC");
  libgen.SetStartingMaterial(mol,1);

  // Enumerate the products and print each one as a canonical SMILES string.
  OEIter<OEMolBase> product;
  for (product = libgen.GetProducts();product;++product)
  {
    std::string smi;
    OECreateCanSmiString(smi,product);
    cout << "smiles = " << smi << endl;
  }

  return 0;
}

Listing:29.3 Reactions Using Implicit Hydrogens

The reaction is written to work with implicit hydrogens (using the lowercase 'h' primitive), and the OELibraryGen instance is set to work in implicit hydrogen mode using the OELibraryGen::SetExplicitHydrogens method.

The final example demonstrates automatic valence correction. In implicit hydrogen mode (set using the OELibraryGen::SetExplicitHydrogens method), automatic valence correction attempts to add or subtract implicit hydrogens in order to retain the valence state observed in the starting materials. Before chemical transformations commence, the valence state of each reacting atom is recorded. After the transform operations are complete, the implicit hydrogen count is adjusted to match the beginning state of the reacting atoms. Changes in formal charge are taken into account during the valence correction.

#include "openeye.h"
#include "oechem.h"
#include <iostream>

using namespace std;
using namespace OEChem;
using namespace OESystem;

int main()
{
  // No hydrogens or hydrogen counts are written in the reaction.
  OELibraryGen libgen("[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4]");
  // Implicit hydrogen mode plus automatic valence correction.
  libgen.SetExplicitHydrogens(false);
  libgen.SetValenceCorrection(true);

  // Reactant 0: the acid chloride.
  OEGraphMol mol;
  OEParseSmiles(mol,"CC(=O)Cl");
  libgen.SetStartingMaterial(mol,0);

  // Reactant 1: the amine.
  mol.Clear();
  OEParseSmiles(mol,"NCC");
  libgen.SetStartingMaterial(mol,1);

  // Enumerate the products and print each one as a canonical SMILES string.
  OEIter<OEMolBase> product;
  for (product = libgen.GetProducts();product;++product)
  {
    std::string smi;
    OECreateCanSmiString(smi,product);
    cout << "smiles = " << smi << endl;
  }

  return 0;
}

Listing:29.4 Reactions Using Automatic Valence Correction

In general, automatic valence correction is a convenience that allows straightforward reactions to be written in a simplified manner and reduces the burden of valence state bookkeeping. Reactions that alter the preferred valence state of an atom, such as oxidations, may not be automatically correctable.

OELibraryGen objects are normally initialized with a SMIRKS pattern. A boolean argument specifies whether the SMIRKS string should be interpreted using strict SMIRKS semantics. Here, strict means in full compliance with the SMIRKS language as defined by its originator, Daylight CIS, Inc. If the default value of true is used, every mapped product atom in the SMIRKS string must have a corresponding mapped reactant atom. A mapped product atom without a corresponding mapped reactant atom makes the SMIRKS invalid and results in a failure to initialize the OELibraryGen instance. Strict SMIRKS also requires unmapped reactant atoms to be destroyed in the reaction. Passing a boolean value of false as the second argument relaxes both of these strict SMIRKS restrictions.

The AddStartingMaterial and SetStartingMaterial methods are used to initialize the starting materials corresponding to a reaction component (reactant). Either an iterator over molecules or a single molecule may be passed as the first argument. Subsequent calls to the AddStartingMaterial method append to the list of starting materials set in prior calls. The second argument specifies, by number starting with zero, the reactant to which the starting materials correspond. These numbers follow the left-to-right lexical ordering of reactants in the SMIRKS. The final argument controls how the reactant pattern is matched against the starting material. If the value passed is true, only matches that contain a unique set of atoms relative to previously identified matches are used. If the value is false, every possible match, including those related by symmetry, is used. Reactant patterns are uniquely matched by default.
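The unique-match rule described above can be pictured with a short standalone sketch. It is an illustration of the stated rule, not the OEChem implementation: the Match type and the UniqueMatches helper are hypothetical, and each match is represented simply as the list of starting-material atom indices hit by the pattern.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical representation: the starting-material atom indices matched by
// the reactant pattern, listed in pattern order.
typedef std::vector<int> Match;

// Hypothetical helper: keep a match only if its set of atoms differs from
// the atom set of every match kept so far.
std::vector<Match> UniqueMatches(const std::vector<Match> &matches)
{
  std::set<std::vector<int> > seenAtomSets;
  std::vector<Match> kept;
  for (const Match &m : matches)
  {
    // Sorting makes the key independent of the order in which atoms matched.
    std::vector<int> atoms(m);
    std::sort(atoms.begin(), atoms.end());
    if (seenAtomSets.insert(atoms).second)
      kept.push_back(m);
  }
  return kept;
}
```

Under this model, two matches that cover the same atoms in a different order collapse to one, whereas matches of an N-H pattern onto the two hydrogens of a primary amine involve different atom sets and are both kept, which is consistent with the two products generated by the explicit hydrogen example.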

The SetExplicitHydrogens method sets the hydrogen handling mode for the OELibraryGen instance. OELibraryGen instances are constructed by default with the explicit hydrogen mode set to true. Reactions may be executed using either implicit or explicit hydrogens represented in the starting materials for a reaction. If the value is true, the OELibraryGen instance will add explicit hydrogens to reactant molecules when they are initialized using either of the SetStartingMaterial methods. If the value is false, both of the SetStartingMaterial methods will suppress any explicit hydrogens in the reactant molecules and simply retain the implicit hydrogen counts for the remaining non-hydrogen atoms. The hydrogen handling mode must be assigned prior to calling SetStartingMaterial; calling SetExplicitHydrogens after SetStartingMaterial has no effect. Note that the explicit hydrogen setting in effect modifies the semantics of SMIRKS. If the programmer wishes to implement strict SMIRKS in full accordance with the Daylight standard, explicit hydrogens should be turned on.

The SetValenceCorrection method controls the valence correction mode setting of an OELibraryGen instance. OELibraryGen instances are constructed by default with the valence correction mode set to false. Valence correction mode can be turned on by passing a boolean true value to this method. When valence correction mode is enabled, the OELibraryGen instance will attempt to adjust the hydrogen count on atoms in the product molecule that are involved in the reaction to match the original valence state of the reactant. For product atoms that do not undergo a nuclear reaction (the atomic number is retained), the hydrogen count is either increased or decreased to match the initial valence state of the corresponding reactant atom. Formal charge is taken into account during the hydrogen count adjustment. Note that valence correction in effect modifies the semantics of SMIRKS. Thus, if the programmer wishes to implement strict SMIRKS in full accordance with the Daylight standard, valence correction should be turned off.
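The hydrogen count adjustment can be made concrete with a small worked calculation. The code below is a simplified model and an assumption, not the OEChem implementation: it treats valence as the sum of bond orders to non-hydrogen neighbors plus the hydrogen count, and it deliberately ignores the formal charge handling mentioned above. The names AtomState and CorrectedHydrogenCount are hypothetical.

```cpp
#include <algorithm>

// Hypothetical model of a reacting atom before the transform.
struct AtomState
{
  int heavyBondOrderSum; // sum of bond orders to non-hydrogen neighbors
  int hydrogenCount;     // number of attached hydrogens
};

// Return the hydrogen count that restores the atom's original valence once
// the transform has changed its bonds to non-hydrogen atoms.
// Assumption: formal charge changes are not modeled in this sketch.
int CorrectedHydrogenCount(const AtomState &before, int heavyBondOrderSumAfter)
{
  const int valenceBefore = before.heavyBondOrderSum + before.hydrogenCount;
  // A hydrogen count cannot be negative, so clamp at zero.
  return std::max(0, valenceBefore - heavyBondOrderSumAfter);
}
```

For the amine nitrogen of NCC in the amide example, the state before the reaction is one bond to carbon and two hydrogens (valence three). After the new C-N bond forms, the nitrogen has two bonds to carbon, so the model gives 3 - 2 = 1 hydrogen, matching the Nh1 written by hand in the implicit hydrogen example. The acyl carbon keeps a heavy-atom bond order sum of four before and after, so its hydrogen count stays at zero.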