Meta information about a molecule is stored in what is known as ‘tagged data’. The most common example of this is the data fields found in SD files. Since SD files are a common form of data storage and transfer from one system to another, OEChem provides several functions to manipulate this data. A simple class, OESDDataPair, is used to set and retrieve SD data.
| Data | Set method | Get method |
|---|---|---|
| tag | OESDDataPair.SetTag | OESDDataPair.GetTag |
| data | OESDDataPair.SetValue | OESDDataPair.GetValue |
The following functions provide access to the SD data.
| Function | Description |
|---|---|
| OESetSDData | set a tag and value data pair |
| OEAddSDData | add a tag and value data pair |
| OEHasSDData | determine whether a molecule has any data with a given tag |
| OEGetSDData | get the value for the given tag |
| OEGetSDDataPairs | return an iterator over all the SD data pairs of the molecule |
| OECopySDData | copy the entire set of SD data from one molecule to another |
| OEDeleteSDData | delete all SD data items with the given tag |
| OEClearSDData | clear all SD data from a molecule |
Since OESDDataPair stores SD data as strings, numeric values have to be converted to a string with Python’s str() function before they are stored with either OESetSDData or OEAddSDData.
OESetSDData(mol,"number of atoms",str(mol.NumAtoms()))
Similarly, the retrieved string can be converted to a numeric value with the int() or float() functions.
if OEHasSDData(mol,"weight"):
    weight = float(OEGetSDData(mol,"weight"))
    print("weight=",weight)
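Because every SD value comes back as a string, the conversion can fail when a field holds text such as "N/A". The sketch below shows one possible way to guard it. The name parse_sd_float is a hypothetical helper, not part of OEChem, and it operates on plain strings such as the value returned by OEGetSDData.

```python
def parse_sd_float(text, default=None):
    """Convert an SD data string to a float, returning `default` on failure.

    Hypothetical helper (not an OEChem function); `text` is a plain string
    such as the value returned by OEGetSDData.
    """
    try:
        return float(text.strip())
    except ValueError:
        return default


weight = parse_sd_float(" 78.11 ")      # surrounding whitespace is tolerated
missing = parse_sd_float("N/A", 0.0)    # non-numeric text falls back to the default
```

Returning a default instead of raising keeps a loop over many molecules running when a single record carries a malformed value; whether that is desirable depends on the application.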
The following example shows how to manipulate SD tagged data.
Listing 1: SD data manipulation
#!/usr/bin/env python
from openeye.oechem import *

def DumpSDData(mol):
    print("SD data of", mol.GetTitle())
    # loop over SD data
    for dp in OEGetSDDataPairs(mol):
        print(dp.GetTag(), ':', dp.GetValue())
    print()

mol = OEGraphMol()
OEParseSmiles(mol, "c1ccccc1")
mol.SetTitle("benzene")

# set some tagged data
OESetSDData(mol, 'color', 'brown')
OESetSDData(mol, OESDDataPair('size', 'small'))
DumpSDData(mol)

# check for existence of data, then delete it
if OEHasSDData(mol, 'size'):
    OEDeleteSDData(mol, 'size')
DumpSDData(mol)

# add additional color data
OEAddSDData(mol, 'color', 'black')
DumpSDData(mol)

# remove all SD data
OEClearSDData(mol)
DumpSDData(mol)
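Note
If data with the same tag already exists, OESetSDData replaces the existing value, whereas OEAddSDData adds another item with the same tag, as the repeated color entries in the program output show.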
The output of the preceding program is the following:
SD data of benzene
color : brown
size : small
SD data of benzene
color : brown
SD data of benzene
color : brown
color : black
SD data of benzene
Note
Note that SD tagged data is specific to MDL’s SD file format. Any SD data added to a molecule will only be written out to SD files or OEBinary files. The SD data fields will only be filled when reading from SD files that contain SD tagged data or from OEBinary files previously created to contain this data.
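For orientation, tagged data appears in an SD file after the connection table: each item has a header line that starts with ">" and carries the tag in angle brackets, followed by the value lines and a blank line, and the record ends with "$$$$". The pure-Python sketch below reads such items from one record. The name read_sd_items is hypothetical, the sample record is shortened and made up for illustration, and OEChem's own reader should be preferred in practice.

```python
import re

def read_sd_items(record):
    """Return (tag, value) pairs from the data block of one SD record.

    Illustrative only: handles the common '> <tag>' header form and
    keeps repeated tags as separate pairs.
    """
    items = []
    tag, value_lines = None, []
    in_data = False
    for line in record.splitlines():
        if not in_data:
            # data items start after the end of the connection table
            in_data = line.startswith("M  END")
            continue
        if line.startswith("$$$$"):
            break
        header = re.match(r">.*<([^>]+)>", line)
        if header:
            tag, value_lines = header.group(1), []
        elif tag is not None:
            if line.strip() == "":
                # a blank line terminates the current item
                items.append((tag, "\n".join(value_lines)))
                tag = None
            else:
                value_lines.append(line)
    if tag is not None:
        # last item was not terminated by a blank line
        items.append((tag, "\n".join(value_lines)))
    return items


# Made-up tail of a record; the connection table is omitted.
sample = "M  END\n> <color>\nbrown\n\n> <color>\nblack\n\n$$$$\n"
items = read_sd_items(sample)
```

The result is a list of pairs rather than a dictionary so that repeated tags, such as the two color items here, are kept.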
The OEPDBDataPair class is used to set and retrieve PDB data pairs.
| Data | Set method | Get method |
|---|---|---|
| tag | OEPDBDataPair.SetTag | OEPDBDataPair.GetTag |
| data | OEPDBDataPair.SetValue | OEPDBDataPair.GetValue |
The following functions provide access to the PDB data.
| Function | Description |
|---|---|
| OESetPDBData | set a tag and value data pair |
| OEAddPDBData | add a tag and value data pair |
| OEHasPDBData | determine whether a molecule has any data with a given tag |
| OEGetPDBData | get the value for the given tag |
| OEGetPDBDataPairs | return an iterator over all the PDB data pairs of the molecule |
| OECopyPDBData | copy the entire set of PDB data from one molecule to another |
| OEDeletePDBData | delete all PDB data items with the given tag |
| OEClearPDBData | clear all PDB data from a molecule |
Note
For PDB header items such as REMARK, each line is treated as a separate instance. Such multi-line items therefore have to be added one line at a time with OEAddPDBData and can be accessed via OEGetPDBDataPairs.
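Since a multi-line item arrives as several pairs that share one tag, it can be convenient to gather the values per tag. The sketch below is one possible way to do that. The name group_values_by_tag is hypothetical, and FakePair is only a stand-in for OEPDBDataPair so that the example runs without the toolkit; with OEChem available, the iterator returned by OEGetPDBDataPairs(mol) would be passed instead.

```python
from collections import defaultdict

def group_values_by_tag(pairs):
    """Collect the values of data pairs into lists keyed by tag.

    `pairs` is any iterable of objects with GetTag() and GetValue().
    """
    grouped = defaultdict(list)
    for dp in pairs:
        grouped[dp.GetTag()].append(dp.GetValue())
    return dict(grouped)


class FakePair:
    """Stand-in for OEPDBDataPair (assumption: same GetTag/GetValue interface)."""
    def __init__(self, tag, value):
        self._tag, self._value = tag, value

    def GetTag(self):
        return self._tag

    def GetValue(self):
        return self._value


# Made-up header lines for illustration.
pairs = [FakePair("REMARK", "   1 first line"),
         FakePair("TITLE ", "example"),
         FakePair("REMARK", "   2 second line")]
remarks = group_values_by_tag(pairs)["REMARK"]
```

The values keep the order in which the pairs were encountered, so the lines of a REMARK block stay in sequence.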
The following PDB fields are stored as tagged PDB data when OEIFlavor_PDB_DATA input flavor is set:
| AUTHOR | CAVEAT | COMPND | CRYST1 | DBREF |
| EXPDTA | FORMUL | HEADER | HELIX | HET |
| HETNAM | HETSYM | JRNL | KEYWDS | MODRES |
| MTRIX1 | MTRIX2 | MTRIX3 | OBSLTE | ORIGX1 |
| ORIGX2 | ORIGX3 | REMARK | REVDAT | SCALE1 |
| SCALE2 | SCALE3 | SEQRES | SEQADV | SHEET |
| SITE | SOURCE | SPRSDE | SSBOND | TITLE |
| TURN |
Warning
The tags of PDB data are always six characters long and space-padded (for example "HELIX " and not "HELIX").
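A small helper can take care of the padding so that short record names are not passed by accident. This is a minimal sketch; pdb_tag is a hypothetical name and not an OEChem function.

```python
def pdb_tag(name):
    """Return `name` space-padded on the right to six characters."""
    if len(name) > 6:
        raise ValueError("PDB record names are at most six characters: %r" % name)
    return name.ljust(6)


helix_tag = pdb_tag("HELIX")    # "HELIX "
site_tag = pdb_tag("SITE")      # "SITE  "
```

The padded string can then be used wherever a tag is expected, for example OEHasPDBData(mol, pdb_tag("HELIX")).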
The following example shows how to manipulate PDB tagged data.
Listing 2: PDB data manipulation
#!/usr/bin/env python
from openeye.oechem import *
import sys

if len(sys.argv) != 2:
    OEThrow.Usage("%s <pdbfile>" % sys.argv[0])

ifs = oemolistream()
if not ifs.open(sys.argv[1]):
    OEThrow.Fatal("Unable to open %s" % sys.argv[1])

# need to set input flavor to ensure PDB data is stored on molecule
ifs.SetFlavor(OEFormat_PDB, OEIFlavor_Generic_Default |
              OEIFlavor_PDB_Default | OEIFlavor_PDB_DATA)

mol = OEGraphMol()
while OEReadMolecule(ifs, mol):
    if OEHasPDBData(mol, "COMPND"):
        print("COMPND:")
        print(OEGetPDBData(mol, "COMPND"))
    if OEHasPDBData(mol, "HELIX "):
        print("HELIX:")
        print(OEGetPDBData(mol, "HELIX "))
    if OEHasPDBData(mol, "SSBOND"):
        print("SSBOND:")
        # SSBOND may occur several times, so iterate over all data pairs
        for dp in OEGetPDBDataPairs(mol):
            if dp.GetTag() == "SSBOND":
                print(dp.GetValue())
The output of the preceding program for 1D1H is the following:
COMPND:
MOL_ID: 1;
HELIX:
1 1 THR A 11 ASP A 14 5
SSBOND:
1 CYS A 2 CYS A 16 1555 1555
2 CYS A 9 CYS A 21 1555 1555
3 CYS A 15 CYS A 28 1555 1555
Note
Note that PDB tagged data is specific to the PDB file format. Any PDB data added to a molecule will only be written out to PDB files or OEBinary files.
See also
For using tag data with multi-conformer molecules, see Dude, where’s my SD data?.