5.4 Manipulation of Tagged Data

5.4.1 Manipulating SD Tagged Data

Meta information about a molecule is stored in what is known as "tagged data." The most common example is the data fields found in SD files. Since SD files are a common form of data storage and transfer from one system to another, OEChem provides several functions to manipulate this data. A simple class, OESDDataPair, is used to set or retrieve these pairs. OESDDataPair objects provide SetTag/GetTag and SetValue/GetValue methods for access to each half of the pair.

If you wish to store a numeric value, use Python's str() function to convert it to a string, and then use int() or float() on the value when retrieving the data.
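
The round trip described above can be sketched in plain Python, without OEChem. The helper names to_sd_value and sd_value_to_float are hypothetical and not part of OEChem. The sketch also assumes you want a missing tag, which OEGetSDData reports as an empty string, to map to a default rather than raise an error.

```python
def to_sd_value(number):
    """Convert a number to the string form that tagged data requires."""
    return str(number)


def sd_value_to_float(text, default=None):
    """Convert a retrieved tag value back to a float.

    An empty (or whitespace-only) string is treated as "tag not present"
    and yields `default` instead of raising ValueError.
    """
    text = text.strip()
    if not text:
        return default
    return float(text)


stored = to_sd_value(78.11)           # the string you would pass as the value
restored = sd_value_to_float(stored)  # back to a float after retrieval
missing = sd_value_to_float("")       # what a missing tag would produce
```

With OEChem, the string returned by to_sd_value would be passed as the value argument of OESetSDData, and the string returned by OEGetSDData would be passed to sd_value_to_float.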

The following functions provide access to the SD data.

5.4.1.1 Storing SD Data on a Molecule

Use the OESetSDData function to set a tag and value data pair. Both the tag and the value must be strings. If an item with the same tag already exists, it is replaced. The second form is the same as the first but uses an OESDDataPair instance.

OESetSDData(mol, tag, value)
OESetSDData(mol, dp)

Listing:5.2

Use the OEAddSDData function to add a tag and value data pair. Both the tag and the value must be strings. If an item with the same tag already exists, another one is added. The second form is the same as the first but uses an OESDDataPair instance.

OEAddSDData(mol, tag, value)
OEAddSDData(mol, dp)

Listing:5.3

5.4.1.2 Retrieving SD Data from a Molecule

Use the OEHasSDData function to determine whether a molecule has an item with a given tag:

OEHasSDData(mol, tag)

Listing:5.4

Use the OEGetSDData function to get the value for the given tag. If the molecule does not have that tag, an empty string is returned.

OEGetSDData(mol, tag)

Listing:5.5

OEGetSDDataPairs returns an OESDDataIter (an iterator over OESDDataPair objects) that can be used in a loop; the example in Section 5.4.1.5 does this.

OEGetSDDataPairs(mol)

Listing:5.6

5.4.1.3 Copying SD Data

Use OECopySDData to copy the entire set of SD data from a source (src) molecule to a destination (dest) molecule.

OECopySDData(dest, src)

Listing:5.7

5.4.1.4 Deleting SD Data from a Molecule

Use OEDeleteSDData to delete a tagged data item. All data items with the specified tag will be deleted.

OEDeleteSDData(mol, tag)

Listing:5.8

Use OEClearSDData to clear all SD data from a molecule.

OEClearSDData(mol)

Listing:5.9

5.4.1.5 SD Data Example

The following example shows how to use the tagged data functions.

#!/usr/bin/env python
# ch5-2.py
from openeye.oechem import *
import sys

mol = OEGraphMol()
OEParseSmiles(mol, "c1ccccc1")
mol.SetTitle("benzene")

# now set some tagged data
OESetSDData(mol, 'color', 'brown')
OESetSDData(mol, 'size', 'small')
OESetSDData(mol, 'natoms', str(mol.NumAtoms()))

# loop over data and print it out
for dp in OEGetSDDataPairs(mol):
    sys.stdout.write('%s : %s\n' % (dp.GetTag(), dp.GetValue()))

# check for existence of a field, then delete it
if OEHasSDData(mol, 'color'):
    OEDeleteSDData(mol, 'color')

# one last loop shows no 'color' field
for dp in OEGetSDDataPairs(mol):
    sys.stdout.write('%s : %s\n' % (dp.GetTag(), dp.GetValue()))

Listing:5.10 Manipulating SD tagged data

Note that SD tagged data is specific to MDL's SD file format. Any data added to a molecule will only be written out to SD files or OEBinary files. The SD data fields will only be filled when reading from SD files that contain SD tagged data or from OEBinary files previously created to contain this data.
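
To make the file-format dependence concrete, the sketch below reads the data items from the tail of one SD record using plain Python. It assumes the usual SD layout: a header line starting with ">" that carries the tag in angle brackets, then one or more value lines, then a blank line, with "$$$$" ending the record. The function name parse_sd_data_items is hypothetical. This is a simplified illustration of the layout, not how OEChem reads SD files; in practice the OEChem readers fill in the tagged data for you.

```python
def parse_sd_data_items(data_block):
    """Return (tag, value) pairs from the data block of one SD record.

    `data_block` is the text that follows the molecule's connection table.
    """
    pairs = []
    tag = None
    value_lines = []
    for line in data_block.splitlines():
        if line.startswith("$$$$"):          # end of the record
            break
        if tag is None:
            start = line.find("<")
            end = line.find(">", start + 1)
            if line.startswith(">") and start != -1 and end != -1:
                tag = line[start + 1:end]    # text between < and >
                value_lines = []
        elif line.strip() == "":             # blank line ends the item
            pairs.append((tag, "\n".join(value_lines)))
            tag = None
        else:
            value_lines.append(line)
    if tag is not None:                      # item not closed by a blank line
        pairs.append((tag, "\n".join(value_lines)))
    return pairs


example_block = """\
>  <color>
brown

>  <size>
small

>  <natoms>
6

$$$$
"""
items = parse_sd_data_items(example_block)
```

The example block mirrors the tags set in Listing 5.10; formats without an equivalent data block have nowhere to carry these items.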

Two more examples are provided that deal specifically with tagged data. sdf2csv.py takes an SD file as input and outputs a comma-delimited file (.csv) for importing into Excel or other spreadsheet programs. The other, mergecsv.py, takes a CSV file and adds the data as tags to molecules in an input stream. This simple script assumes that the first column is the molecule title, matching the titles found in the incoming molecule file. It also assumes that the first row contains the names to be used as the tags.
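
The CSV layout that mergecsv.py expects can be illustrated with the standard csv module. This is a sketch of the lookup step only, not the actual script: the function name read_tag_table and the sample rows are made up for illustration.

```python
import csv
import io


def read_tag_table(csv_file):
    """Map each molecule title to a list of (tag, value) pairs."""
    reader = csv.reader(csv_file)
    header = next(reader)
    tags = header[1:]          # first column is the title, the rest are tag names
    table = {}
    for row in reader:
        if not row:            # skip blank lines
            continue
        title, values = row[0], row[1:]
        table[title] = list(zip(tags, values))
    return table


# Made-up sample data in the assumed layout.
example_csv = io.StringIO(
    "title,color,size\n"
    "benzene,brown,small\n"
    "toluene,clear,medium\n"
)
tag_table = read_tag_table(example_csv)
```

For each incoming molecule, a merge script could then look up the molecule's title in tag_table and call OESetSDData(mol, tag, value) for every pair found.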

5.4.2 Manipulating PDB Tagged Data

The OESDDataPair class is also used to set or retrieve PDB data pairs. In PDB files, this data is stored in header lines where the first field is the tag and the remainder of the line is the data. OESDDataPair objects provide SetTag/GetTag and SetValue/GetValue methods for access to each half of PDB pairs.
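
The tag/data split of a header line can be sketched in plain Python. The sketch assumes the record name occupies the first six columns, as in the PDB file format. Whether OEChem trims or pads the two halves in the same way is not stated here, so treat the exact strings as an assumption. The function name and the sample lines are made up for illustration.

```python
def split_pdb_header_line(line):
    """Split a PDB header line into (tag, value).

    Assumes the record name sits in the first six columns; everything
    after that is kept, unmodified, as the value.
    """
    line = line.rstrip("\n")
    tag = line[:6].strip()
    value = line[6:]
    return tag, value


# Made-up header lines, for illustration only.
header_lines = [
    "HEADER    HYDROLASE                               01-JAN-00   XXXX",
    "REMARK   2 RESOLUTION.    1.80 ANGSTROMS.",
    "REMARK   3 REFINEMENT.",
]
pairs = [split_pdb_header_line(line) for line in header_lines]
```

Note that the two REMARK lines produce two separate pairs with the same tag.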

If you wish to store a numeric value, use Python's str() function to convert it to a string, and then use int() or float() on the value when retrieving the data.

The following functions provide access to the PDB data.

5.4.2.1 Storing PDB Data on a Molecule

Use OESetPDBData to set a tag and value data pair. Both tag and value must be strings. If an item with the same tag already exists, it is replaced. The second form is the same as the first but uses an OESDDataPair instance.

OESetPDBData(mol, tag, value)
OESetPDBData(mol, dp)

Listing:5.11

Use OEAddPDBData to add a tag and value data pair. Both tag and value must be strings. If an item with the same tag already exists, another one is added. The second form is the same as the first but uses an OESDDataPair instance.

Note that for PDB header items like REMARK, each line is treated as a separate instance. To add multiple REMARK lines, be sure to use OEAddPDBData instead of OESetPDBData.

OEAddPDBData(mol, tag, value)
OEAddPDBData(mol, dp)

Listing:5.12
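
The difference between setting and adding can be illustrated with a toy model in plain Python. TagStore is a hypothetical stand-in, not an OEChem class. Its set method replaces the first item that has the tag, which is an assumption: what OEChem does when several items with that tag already exist is not described here.

```python
class TagStore:
    """Toy stand-in for a molecule's tagged data: an ordered list of pairs."""

    def __init__(self):
        self.pairs = []

    def set(self, tag, value):
        """Replace the existing item with this tag, or append if none exists."""
        for i, (existing_tag, _) in enumerate(self.pairs):
            if existing_tag == tag:
                self.pairs[i] = (tag, value)
                return
        self.pairs.append((tag, value))

    def add(self, tag, value):
        """Always append, even if the tag is already present."""
        self.pairs.append((tag, value))


with_set = TagStore()
with_set.set("REMARK", "first remark")
with_set.set("REMARK", "second remark")   # overwrites the first

with_add = TagStore()
with_add.add("REMARK", "first remark")
with_add.add("REMARK", "second remark")   # kept alongside the first
```

After these calls, with_set holds a single REMARK item while with_add holds both, which is why the add form is the one to use for multi-line header records.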

5.4.2.2 Retrieving PDB Data from a Molecule

Use OEHasPDBData to determine whether a molecule has an item with a given tag:

OEHasPDBData(mol, tag)

Listing:5.13

Use OEGetPDBData to get the value for the given tag. If the molecule does not have that tag, an empty string is returned. Note that if there are multiple items with the same tag, only the first instance is returned. The iterator access shown below allows retrieving all of them.

OEGetPDBData(mol, tag)

Listing:5.14

To get access to all PDB data, use an iterator of OESDDataPair objects.

OEGetPDBDataPairs(mol)

Listing:5.15
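
Because a single-value lookup returns only the first item for a tag, collecting every value means filtering the iterator. The helper below is a sketch that works on any iterable of objects with GetTag and GetValue methods. FakePair is a hypothetical stand-in used only so that the example runs without OEChem, and the helper name values_for_tag is likewise made up.

```python
def values_for_tag(pairs, tag):
    """Collect every value whose tag matches, in iteration order."""
    return [dp.GetValue() for dp in pairs if dp.GetTag() == tag]


class FakePair:
    """Hypothetical stand-in for a data pair, for demonstration only."""

    def __init__(self, tag, value):
        self._tag, self._value = tag, value

    def GetTag(self):
        return self._tag

    def GetValue(self):
        return self._value


demo_pairs = [
    FakePair("HEADER", "example header"),
    FakePair("REMARK", "first remark"),
    FakePair("REMARK", "second remark"),
]
remarks = values_for_tag(demo_pairs, "REMARK")
```

With OEChem, the first argument would be OEGetPDBDataPairs(mol). The exact form of the tag strings (for example, whether they carry padding) is an assumption to check against your own data.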

5.4.2.3 Copying PDB Data

To copy the entire set of PDB data from a source (src) molecule to a destination (dest) molecule, use OECopyPDBData.

OECopyPDBData(dest, src)

Listing:5.16

5.4.2.4 Deleting PDB Data from a Molecule

Use OEDeletePDBData to delete a tagged data item. All data items with the specified tag will be deleted.

OEDeletePDBData(mol, tag)

Listing:5.17

To clear all PDB data from a molecule, use OEClearPDBData.

OEClearPDBData(mol)

Listing:5.18

 
5.4.3 Multi-conformer molecules

For using tag data with multi-conformer molecules, see Section 6.6.