Meta information about a molecule is stored in what is known as
``tagged data.'' The most common example is the set of data fields
found in SD files. Since SD files are a common form of data
storage and transfer from one system to another, OEChem provides
several functions to manipulate this data. A simple class,
OESDDataPair, is used to set or retrieve
these pairs. OESDDataPair objects provide
SetTag/GetTag and
SetValue/GetValue methods for access to each half of the pair.
If you wish to store a numeric value, use Python's ``str()'' function to convert it to a string, and then use ``int()'' or ``float()'' on the value when retrieving the data.
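The round trip described above can be sketched in plain Python. The helper names to_tag_value and from_tag_value are hypothetical and are not part of OEChem; they only wrap the str(), int() and float() calls mentioned in the text.

```python
def to_tag_value(number):
    """Convert a number to the string form required for a tag value."""
    return str(number)


def from_tag_value(text):
    """Convert a retrieved value back to int when possible, else float."""
    try:
        return int(text)
    except ValueError:
        # Not a plain integer; float() raises ValueError if it is not numeric.
        return float(text)


stored_count = to_tag_value(12)    # the string that would be stored
stored_logp = to_tag_value(2.13)
print(from_tag_value(stored_count), from_tag_value(stored_logp))
```

A missing tag is returned as an empty string, and both int() and float() raise ValueError on an empty string, so it is probably worth checking that the tag exists before converting.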
The following functions provide access to the SD data.
Use the OESetSDData function to set a tag and
value data pair. Both the tag and the value must be strings. If an
item with the same tag already exists, it is replaced. The second form
is the same as the first but takes an
OESDDataPair instance.
OESetSDData(mol, tag, value)
OESetSDData(mol, dp)
Use the OEAddSDData function to add a tag and
value data pair. Both the tag and the value must be strings. If an
item with the same tag already exists, another one is added. The
second form is the same as the first but takes an
OESDDataPair instance.
OEAddSDData(mol, tag, value)
OEAddSDData(mol, dp)
Use the OEHasSDData function to determine if a
molecule has an item with a given tag:
OEHasSDData(mol, tag)
Use the OEGetSDData function to get the value
for the given tag. If the molecule does not have that tag, an empty
string is returned.
OEGetSDData(mol, tag)
Use OEGetSDDataPairs to get an OESDDataIter (an iterator over
OESDDataPair objects), which can be used in a loop as shown
in the example program below.
OEGetSDDataPairs(mol)
Use OECopySDData to copy the entire set of
SD data from a source (src) molecule to a destination (dest) molecule.
OECopySDData(dest, src)
Use OEDeleteSDData to delete a tagged
data item. All data items with the specified tag will be deleted.
OEDeleteSDData(mol, tag)
Use OEClearSDData to clear all SD data
from a molecule.
OEClearSDData(mol)
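The difference between setting, adding and deleting is easy to lose in prose, so here is a toy model of it in plain Python. The TagStore class is hypothetical and is not OEChem code; it only mimics the behavior described above. Two details are assumptions, because the text does not specify them for SD data: set() touches only the first matching item, and get() returns the first match when a tag is duplicated.

```python
class TagStore:
    """Toy stand-in for a molecule's tagged data; not part of OEChem."""

    def __init__(self):
        self.pairs = []  # ordered list of (tag, value) string tuples

    def set(self, tag, value):
        # As described for OESetSDData: replace an existing item, else append.
        for i, (t, _) in enumerate(self.pairs):
            if t == tag:
                self.pairs[i] = (tag, value)  # assumption: first match only
                return
        self.pairs.append((tag, value))

    def add(self, tag, value):
        # As described for OEAddSDData: always append, even if the tag exists.
        self.pairs.append((tag, value))

    def has(self, tag):
        return any(t == tag for t, _ in self.pairs)

    def get(self, tag):
        # Empty string when the tag is missing, as described for OEGetSDData.
        for t, v in self.pairs:
            if t == tag:
                return v
        return ""

    def delete(self, tag):
        # Every item with this tag is removed, not just the first.
        self.pairs = [(t, v) for t, v in self.pairs if t != tag]

    def clear(self):
        self.pairs = []


store = TagStore()
store.set("color", "brown")
store.set("color", "green")   # replaces: still one 'color' item
store.add("note", "first")
store.add("note", "second")   # appends: now two 'note' items
print(store.pairs)
store.delete("note")          # removes both 'note' items
print(store.has("note"), repr(store.get("note")))
```

The model keeps the pairs in an ordered list rather than a dictionary because adding must be able to hold several items under one tag.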
The following example shows how to use the tagged data methods.
#!/usr/bin/env python
# ch5-2.py
from openeye.oechem import *
import sys

mol = OEGraphMol()
OEParseSmiles(mol, "c1ccccc1")
mol.SetTitle("benzene")

# now set some tagged data
OESetSDData(mol, 'color', 'brown')
OESetSDData(mol, 'size', 'small')
OESetSDData(mol, 'natoms', str(mol.NumAtoms()))

# loop over data and print it out
for dp in OEGetSDDataPairs(mol):
    sys.stdout.write('%s : %s\n' % (dp.GetTag(), dp.GetValue()))

# check for existence of a field, then delete it
if OEHasSDData(mol, 'color'):
    OEDeleteSDData(mol, 'color')

# one last loop shows no 'color' field
for dp in OEGetSDDataPairs(mol):
    sys.stdout.write('%s : %s\n' % (dp.GetTag(), dp.GetValue()))
Note that SD tagged data is specific to MDL's SD file format. Any data added to a molecule will only be written out to SD files or OEBinary files. The SD data fields will only be filled when reading from SD files that contain SD tagged data or from OEBinary files previously created to contain this data.
Two more examples are provided that deal specifically with tagged
data. sdf2csv.py takes an SD file as input and outputs a
comma-delimited file (.csv) for importing into Excel or other
spreadsheet programs. The other, mergecsv.py, takes a CSV file
and adds the data as tags to molecules in an input stream. This simple
script assumes that the first column holds the molecule title, matching
the titles found in the incoming molecule file, and that the first
row contains the names to be used as the tags.
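The CSV layout these two scripts rely on can be sketched with Python's standard csv module. This is not the code of sdf2csv.py or mergecsv.py, which is not reproduced here; the function names records_to_csv and csv_to_lookup, the header cell name title, and the sample records are all assumptions made for illustration.

```python
import csv
import io


def records_to_csv(records, tags):
    """Write (title, {tag: value}) records as CSV text, one row per molecule."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title"] + list(tags))
    for title, data in records:
        # Tags missing from a record become empty cells.
        writer.writerow([title] + [data.get(tag, "") for tag in tags])
    return out.getvalue()


def csv_to_lookup(text):
    """Read CSV text into {title: {tag: value}}, using row 1 as tag names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    tag_names = header[1:]  # column 0 holds the molecule title
    return {row[0]: dict(zip(tag_names, row[1:])) for row in body if row}


records = [("benzene", {"color": "brown", "natoms": "6"}),
           ("toluene", {"natoms": "7"})]
text = records_to_csv(records, ["color", "natoms"])
print(text)
lookup = csv_to_lookup(text)
print(lookup["toluene"])
```

In a merge step, each entry of the lookup would then be applied to the molecule whose title matches, for example with one OESetSDData(mol, tag, value) call per tag.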
The OESDDataPair class is also used to set
or retrieve PDB data pairs. In PDB files, this data is
stored in header lines where the first field is the tag and the
remainder of the line is the data. OESDDataPair objects
provide SetTag/GetTag and
SetValue/GetValue methods for access to each half of PDB pairs.
If you wish to store a numeric value, use Python's ``str()'' function to convert it to a string, and then use ``int()'' or ``float()'' on the value when retrieving the data.
The following functions provide access to the PDB data.
Use OESetPDBData to set a tag and value
data pair. Both tag and value must be strings. If an item with the
same tag already exists, it is replaced. The second form is the same
as the first but uses an OESDDataPair instance.
OESetPDBData(mol, tag, value)
OESetPDBData(mol, dp)
Use OEAddPDBData to add a tag and value
data pair. Both tag and value must be strings. If an item with the
same tag already exists, another one is added. The second form is the
same as the first but uses an OESDDataPair instance.
Note that for PDB header items like REMARK, each line is treated as a
separate instance, so to add multiple REMARK lines be sure to use this
form instead of OESetPDBData.
OEAddPDBData(mol, tag, value)
OEAddPDBData(mol, dp)
Use OEHasPDBData to determine if a molecule has an item with a given tag:
OEHasPDBData(mol, tag)
Use OEGetPDBData to get the value for the
given tag. If the molecule does not have that tag, an empty string is
returned. Note that if there are multiple items with the same tag,
this will only return the first instance. Using the iterator access
shown below allows all items with the same tag to be retrieved.
OEGetPDBData(mol, tag)
To get access to all PDB data, use OEGetPDBDataPairs, which returns
an iterator over OESDDataPair objects.
OEGetPDBDataPairs(mol)
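The tag and value split, and the difference between looking up one tag and iterating over all pairs, can be illustrated in plain Python. This is a simplified sketch and not OEChem's parser: it assumes the tag is separated from the data by whitespace, whereas real PDB files use fixed columns, and the exact value text OEChem stores is not specified here. The function name split_header_line and the sample header lines are hypothetical.

```python
def split_header_line(line):
    """Split a header line into (tag, value) at the first run of whitespace."""
    parts = line.rstrip("\n").split(None, 1)
    tag = parts[0] if parts else ""
    value = parts[1] if len(parts) > 1 else ""
    return tag, value


header = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    "REMARK   1 FIRST REMARK LINE",
    "REMARK   2 SECOND REMARK LINE",
]
# Each line becomes its own pair, so REMARK appears twice.
pairs = [split_header_line(line) for line in header]

# Lookup by tag yields only the first match, as described for OEGetPDBData.
first_remark = next((v for t, v in pairs if t == "REMARK"), "")
# Walking every pair recovers all items that share a tag.
all_remarks = [v for t, v in pairs if t == "REMARK"]
print(first_remark)
print(all_remarks)
```

Because each REMARK line is a separate item, only the loop over all pairs sees the second remark.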
To copy the entire set of PDB data from a source (src)
molecule to a destination (dest) molecule, use
OECopyPDBData.
OECopyPDBData(dest, src)
Use OEDeletePDBData to delete a tagged
data item. All data items with the specified tag will be deleted.
OEDeletePDBData(mol, tag)
To clear all PDB data from a molecule, use
OEClearPDBData.
OEClearPDBData(mol)
For using tag data with multi-conformer molecules, see Section 6.6.