
Using DBMols and MiniMols

Finally, here is another in our irregular series of Toolkit Tips. This one responds to a question posted on our HiveMind website. In case you haven't visited it, HiveMind is the replacement for our FAQs and forums. It uses a format similar to the coding website Stack Exchange: people ask questions, and other users and OpenEye employees provide answers. Users can vote on questions depending on whether they find them useful.

Anyway, the question concerned how to use the molecule types OEDBMol and OEMiniMol. The documentation on these types is minimal, to say the least, so I thought actual code examples would be a good way to show their use.

Both serve a similar purpose: they let you store more molecules in memory by reducing the amount of memory used per molecule. A significant difference between them is that DBMols must be uncompressed before they can be used, whereas MiniMols can still be read in their compressed state. Both differ from the standard OEGraphMol, which is optimized for speed of access rather than memory usage.

The following code example shows the difference in performance between DBMols and MiniMols when they are used to store a database of molecules in memory. The database is then searched using a SMARTS pattern. Because the DBMols need to be uncompressed and recompressed, there is a significant cost in speed. First, here is the snippet that creates the DBMols. You will notice that this is the first tip to show code examples in C#, a language we started supporting in the October release of the OpenEye toolkits:

C++

std::vector<OEGraphMol> mols;
OEGraphMol mol(OEMolBaseType::OEDBMol);
while (OEReadMolecule(ifs, mol))
{
  mol.Compress();
  mols.push_back(mol);
}

Java

List<OEGraphMol> mList = new ArrayList<OEGraphMol>();
OEGraphMol mol = new OEGraphMol(OEMolBaseType.OEDBMol);
while (oechem.OEReadMolecule(ifs, mol)) {
    mol.Compress();
    mList.add(mol);
    mol = new OEGraphMol(OEMolBaseType.OEDBMol);
}

Python

mlist = []
for mol in ifs.GetOEGraphMols():
    newmol = OEGraphMol(mol, OEMolBaseType_OEDBMol)
    newmol.Compress()
    mlist.append(newmol)

C#

ArrayList mlist = new ArrayList();
OEGraphMol mol = new OEGraphMol(OEMolBaseType.OEDBMol);
while (OEChem.OEReadMolecule(ifs, mol))
{
    mol.Compress();
    mlist.Add(mol);
    mol = new OEGraphMol(OEMolBaseType.OEDBMol);
}

Using DBMols requires two additional pieces of code. The first is the namespace constant OEMolBaseType::OEDBMol in the declaration of the OEGraphMol. The second is the call to Compress on the resulting molecule. Declaring a MiniMol merely requires a change in the namespace constant:

C++

std::vector<OEGraphMol> mols;
OEGraphMol mol(OEMolBaseType::OEMiniMol);
while (OEReadMolecule(ifs, mol))
{
  mol.Compress();
  mols.push_back(mol);
}

Java

List<OEGraphMol> mList = new ArrayList<OEGraphMol>();
OEGraphMol mol = new OEGraphMol(OEMolBaseType.OEMiniMol);
while (oechem.OEReadMolecule(ifs, mol)) {
    mol.Compress();
    mList.add(mol);
    mol = new OEGraphMol(OEMolBaseType.OEMiniMol);
}

Python

mlist = []
for mol in ifs.GetOEGraphMols():
    newmol = OEGraphMol(mol, OEMolBaseType_OEMiniMol)
    newmol.Compress()
    mlist.append(newmol)

C#

ArrayList mlist = new ArrayList();
OEGraphMol mol = new OEGraphMol(OEMolBaseType.OEMiniMol);
while (OEChem.OEReadMolecule(ifs, mol))
{
    mol.Compress();
    mlist.Add(mol);
    mol = new OEGraphMol(OEMolBaseType.OEMiniMol);
}

These two sets of code examples are so similar that they need not be in separate functions; a single function could take the namespace constant as an input parameter. Once a MiniMol is set up and compressed it is read-only, so you can't alter any of its properties. You can still pass it to the OESubSearch SingleMatch method, as shown below. You could also use MiniMols anywhere you only need to read the molecule graph, for instance to generate fingerprints with GraphSim or depictions with OEDepict.

C++

for (std::vector<OEGraphMol>::iterator v_i = mols.begin(); v_i != mols.end(); ++v_i)
{
  if (ss.SingleMatch(*v_i))
    matchcount++;
}

Java

for (OEGraphMol mol : mList) {
    if (ss.SingleMatch(mol)) {
        matchcount++;
    }
}

Python

for mol in mlist:
    if ss.SingleMatch(mol):
        matchcount += 1

C#

foreach (OEGraphMol mol in mlist)
{
    if (ss.SingleMatch(mol))
    {
        matchcount++;
    }
}
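As suggested above, the DBMol and MiniMol loaders can be folded into a single function that takes the base-type constant as a parameter. The Python sketch below follows the older "from openeye.oechem import *" style used in this post. To avoid a hard toolkit dependency, the molecule class is passed in as the hypothetical mol_class argument. In real use you would pass OEGraphMol together with a constant such as OEMolBaseType_OEMiniMol. The count_matches helper is likewise a hypothetical name and only relies on the SingleMatch call shown above.

```python
def load_compressed(mols, moltype, mol_class):
    """Copy each molecule into a new molecule of the given base type and compress it.

    mols      -- any iterable of molecules, e.g. ifs.GetOEGraphMols()
    moltype   -- base-type constant, e.g. OEMolBaseType_OEDBMol or OEMolBaseType_OEMiniMol
    mol_class -- constructor taking (mol, moltype); OEGraphMol when OEChem is installed
    """
    stored = []
    for mol in mols:
        # The iterator may reuse one molecule object, so store a copy of each
        newmol = mol_class(mol, moltype)
        newmol.Compress()
        stored.append(newmol)
    return stored


def count_matches(mols, ss):
    """Count molecules matched by ss; MiniMols can be searched while compressed."""
    return sum(1 for mol in mols if ss.SingleMatch(mol))
```

With OEChem available, a call might look like load_compressed(ifs.GetOEGraphMols(), OEMolBaseType_OEMiniMol, OEGraphMol). Note that count_matches is only appropriate for MiniMols; DBMols still need the UnCompress/Compress pair around each match.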

DBMols are different, though, and require calls to UnCompress and Compress around each search, as shown:

C++

for (std::vector<OEGraphMol>::iterator v_i = mols.begin(); v_i != mols.end(); ++v_i)
{
  v_i->UnCompress();
  if (ss.SingleMatch(*v_i))
    matchcount++;
  v_i->Compress();
}

Java

for (OEGraphMol mol : mList) {
    mol.UnCompress();
    if (ss.SingleMatch(mol)) {
        matchcount++;
    }
    mol.Compress();
}

Python

for mol in mlist:
    mol.UnCompress()
    if ss.SingleMatch(mol):
        matchcount += 1
    mol.Compress()

C#

foreach (OEGraphMol mol in mlist)
{
    mol.UnCompress();
    if (ss.SingleMatch(mol))
    {
        matchcount++;
    }
    mol.Compress();
}
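In the loops above, an exception raised during matching would leave the current DBMol uncompressed. This Python sketch makes the recompression unconditional with try/finally, wrapped in a small context manager. The names uncompressed and count_dbmol_matches are hypothetical helpers, not toolkit functions. The sketch assumes only the UnCompress, Compress and SingleMatch methods used in the post.

```python
from contextlib import contextmanager


@contextmanager
def uncompressed(mol):
    """Temporarily uncompress a DBMol, recompressing it even if the body raises."""
    mol.UnCompress()
    try:
        yield mol
    finally:
        # Always restore the compact form so memory use stays low
        mol.Compress()


def count_dbmol_matches(mols, ss):
    """Count DBMols matched by ss, uncompressing each one only while it is searched."""
    count = 0
    for mol in mols:
        with uncompressed(mol):
            if ss.SingleMatch(mol):
                count += 1
    return count
```

The design choice here is only about robustness. The per-molecule UnCompress/Compress cost is unchanged, so this is exactly as slow as the plain loop.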

The cost of uncompressing and recompressing each DBMol is best seen by timing it against MiniMols. A single search of a database of two million isomeric SMILES gives the following result (C++ version):

Loaded database as MiniMols in 72.94 secs
242307 matches found in MiniMol database in 1.5 secs
Loaded database as DBMols in 74.95 secs
242307 matches found in DBMol database in 20.37 secs

That is a typical result: searching DBMols is 10 to 20 times slower than searching MiniMols, while loading takes about the same time for both. These results don't include standard OEGraphMols, which are faster to access than MiniMols but of which far fewer fit into memory. The final scripts toolkittip4.cpp, toolkittip4.py, toolkitTip4.java and toolkitTip4.cs are attached below if anyone wishes to download them and try them out.
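To put the reported numbers in proportion, here is a small Python calculation that uses only the timings from the run above. Loading costs about the same for both types. The search is roughly 13.6 times slower with DBMols, which falls within the 10 to 20 times range quoted.

```python
# Timings (seconds) reported above for the C++ version on two million SMILES
timings = {
    "MiniMol": {"load": 72.94, "search": 1.5},
    "DBMol": {"load": 74.95, "search": 20.37},
}


def slowdown(step):
    """Ratio of DBMol time to MiniMol time for one step ('load' or 'search')."""
    return timings["DBMol"][step] / timings["MiniMol"][step]


for step in ("load", "search"):
    # prints "load: ... 1.0x ..." then "search: ... 13.6x ..."
    print(f"{step}: DBMols take {slowdown(step):.1f}x as long as MiniMols")
```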

The following snippets (not in the attached scripts) show which calls can and can't be made on these types. A call marked as invalid may result in a hard crash. The first set is for DBMols, where neither read nor write operations are allowed on the compressed molecule:

C++

OEGraphMol m(OEMolBaseType::OEDBMol);
OEReadMolecule(ifs, m);
m.GetAtoms(); //valid
m.Compress();
m.GetAtoms(); //invalid
m.NewAtom();  //invalid
m.UnCompress();
m.GetAtoms(); //valid
m.NewAtom();  //valid

Java

OEGraphMol m = new OEGraphMol(OEMolBaseType.OEDBMol);
oechem.OEReadMolecule(ifs, m);
m.GetAtoms(); //valid
m.Compress();
m.GetAtoms(); //invalid
m.NewAtom();  //invalid
m.UnCompress();
m.GetAtoms(); //valid
m.NewAtom();  //valid

Python

m = OEGraphMol(OEMolBaseType_OEDBMol)
OEReadMolecule(ifs, m)
m.GetAtoms() #valid
m.Compress()
m.GetAtoms() #invalid
m.NewAtom()  #invalid
m.UnCompress()
m.GetAtoms() #valid
m.NewAtom()  #valid

C#

OEGraphMol m = new OEGraphMol(OEMolBaseType.OEDBMol);
OEChem.OEReadMolecule(ifs, m);
m.GetAtoms(); //valid
m.Compress();
m.GetAtoms(); //invalid
m.NewAtom();  //invalid
m.UnCompress();
m.GetAtoms(); //valid
m.NewAtom();  //valid

For MiniMols it is slightly different: read operations are fine in the compressed state, but write operations are still invalid:

C++

OEGraphMol m(OEMolBaseType::OEMiniMol);
OEReadMolecule(ifs, m);
m.GetAtoms(); //valid
m.Compress();
m.GetAtoms(); //valid
m.NewAtom();  //invalid
m.UnCompress();
m.GetAtoms(); //valid
m.NewAtom();  //valid

Java

OEGraphMol m = new OEGraphMol(OEMolBaseType.OEMiniMol);
oechem.OEReadMolecule(ifs, m);
m.GetAtoms(); //valid
m.Compress();
m.GetAtoms(); //valid
m.NewAtom();  //invalid
m.UnCompress();
m.GetAtoms(); //valid
m.NewAtom();  //valid

Python

m = OEGraphMol(OEMolBaseType_OEMiniMol)
OEReadMolecule(ifs, m)
m.GetAtoms() #valid
m.Compress()
m.GetAtoms() #valid
m.NewAtom()  #invalid
m.UnCompress()
m.GetAtoms() #valid
m.NewAtom()  #valid

C#

OEGraphMol m = new OEGraphMol(OEMolBaseType.OEMiniMol);
OEChem.OEReadMolecule(ifs, m);
m.GetAtoms(); //valid
m.Compress();
m.GetAtoms(); //valid
m.NewAtom();  //invalid
m.UnCompress();
m.GetAtoms(); //valid
m.NewAtom();  //valid

The table below summarizes the types of operations that can be done on DBMols and MiniMols depending on whether they are compressed or not:

                OEMiniMol     OEDBMol
Uncompressed    Read/Write    Read/Write
Compressed      Read only     Neither
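The same rules can be encoded as a lookup table, for instance to check calls in your own code before they reach the toolkit. This is a hypothetical Python helper, not part of OEChem. Following the snippets above, it treats calls such as GetAtoms as reads and NewAtom as writes.

```python
# Permitted operations, transcribed from the summary table above
ALLOWED = {
    ("OEMiniMol", False): {"read", "write"},
    ("OEMiniMol", True): {"read"},
    ("OEDBMol", False): {"read", "write"},
    ("OEDBMol", True): set(),
}


def is_allowed(moltype, compressed, op):
    """Return True if op ('read' or 'write') is permitted for moltype in the given state."""
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op!r}")
    return op in ALLOWED[(moltype, compressed)]


def require(moltype, compressed, op):
    """Raise a Python error instead of risking a hard crash in the toolkit."""
    if not is_allowed(moltype, compressed, op):
        state = "compressed" if compressed else "uncompressed"
        raise RuntimeError(f"{op} not permitted on a {state} {moltype}; call UnCompress() first")
```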

That concludes this toolkit tip. Look for new posts at http://www.eyesopen.com/tips in the future, and please feel free to add feedback, comments or questions below.