6.3 Manipulation of Tagged Data

6.3.1 Manipulating SD Tagged Data

Meta information about a molecule is stored in what is known as "tagged data". The most common example is the data fields found in SD files. Since SD files are a common form of data storage and transfer between systems, OEChem provides several methods to manipulate this data. A simple class, OESDDataPair, is used to set or retrieve these tag and value pairs. OESDDataPair objects provide SetTag/GetTag and SetValue/GetValue methods for access to each half of the pair.

If you wish to store a numeric value, use Java's String.valueOf() method to convert it to a string.
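As a minimal sketch of that conversion, the class below round-trips numeric values through their string form. The class and method names (NumericTagValues, encode, decodeInt, decodeDouble) are hypothetical helpers introduced here for illustration and are not part of OEChem; only String.valueOf, Integer.parseInt and Double.parseDouble from the Java standard library are used.

```java
// Hypothetical helper for illustration only; not part of OEChem.
class NumericTagValues {
    // Convert a number to the string form that a tag value requires.
    static String encode(int n) {
        return String.valueOf(n);
    }

    static String encode(double d) {
        return String.valueOf(d);
    }

    // Convert a retrieved tag value back to a number.
    // Both parse methods throw NumberFormatException on malformed input.
    static int decodeInt(String value) {
        return Integer.parseInt(value.trim());
    }

    static double decodeDouble(String value) {
        return Double.parseDouble(value.trim());
    }

    public static void main(String[] args) {
        String natoms = encode(6);
        String weight = encode(78.11);
        System.out.println("natoms : " + natoms);
        System.out.println("weight : " + weight);
        System.out.println(decodeInt(natoms) + 1);
        System.out.println(decodeDouble(weight));
    }
}
```

The encoded strings are what would be passed as the value argument when storing the data. Since a missing tag is reported as an empty string, check that the tag exists before decoding to avoid a NumberFormatException.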

The following functions provide access to the SD data.

6.3.1.1 Storing SD Data on a Molecule

Use the OESetSDData method to set a tag and value data pair. Both the tag and the value must be strings. If an item with the same tag already exists, it is replaced. The second form is the same as the first but uses an OESDDataPair instance.

boolean OESetSDData(OEMolBase mol, String tag, String value)
boolean OESetSDData(OEMolBase mol, OESDDataPair dp)

Listing:6.2

Use the OEAddSDData method to add a tag and value data pair. Both the tag and the value must be strings. If an item with the same tag already exists, another one is added. The second form is the same as the first but uses an OESDDataPair instance.

boolean OEAddSDData(OEMolBase mol, String tag, String value)
boolean OEAddSDData(OEMolBase mol, OESDDataPair dp)

Listing:6.3

6.3.1.2 Retrieving SD Data from a Molecule

Use the OEHasSDData method to determine if a molecule has an item with a given tag:

boolean OEHasSDData(OEMolBase mol, String tag)

Listing:6.4

Use the OEGetSDData method to get the value for the given tag. If the molecule does not have that tag, an empty string is returned.

String OEGetSDData(OEMolBase mol, String tag)

Listing:6.5

Use OEGetSDDataPairs to obtain an OESDDataIter (an iterator over OESDDataPair objects). It can be used in a loop to visit every tag and value pair, as shown in the example in Section 6.3.1.5.

OESDDataIter OEGetSDDataPairs(OEMolBase mol)

Listing:6.6

6.3.1.3 Copying SD Data

Use OECopySDData to copy the entire set of SD data from a source (src) molecule to a destination (dst) molecule.

boolean OECopySDData(OEMolBase dst, OEMolBase src)

Listing:6.7

6.3.1.4 Deleting SD Data from a Molecule

Use OEDeleteSDData to delete a tagged data item. All data items with the specified tag will be deleted.

boolean OEDeleteSDData(OEMolBase mol, String tag)

Listing:6.8

Use OEClearSDData to clear all SD data from a molecule.

boolean OEClearSDData(OEMolBase mol)

Listing:6.9

6.3.1.5 SD Data Example

The following example shows how to use the tagged data methods.

/*******************************************************************************
 * Copyright 2005, OpenEye Scientific Software, Inc.
 ******************************************************************************/

import openeye.oechem.*;

public class SDDataExample {
    public static void main(String[] args) {
        OEGraphMol mol = new OEGraphMol();
        oechem.OEParseSmiles(mol, "c1ccccc1");
        mol.SetTitle("benzene");

        // set some tag data
        oechem.OESetSDData(mol, "color", "brown");
        oechem.OESetSDData(mol, "size", "small");
        oechem.OESetSDData(mol, "natoms", String.valueOf(mol.NumAtoms()));

        // loop over data and print it out
        for (OESDDataIter iter = oechem.OEGetSDDataPairs(mol); iter.hasNext();) {
            OESDDataPair dp = iter.next();
            System.out.println(dp.GetTag() + " : " + dp.GetValue());
        }

        // check for existence of a field and delete it
        if (oechem.OEHasSDData(mol, "color"))
            oechem.OEDeleteSDData(mol, "color");

        // loop again; the "color" field is no longer present
        for (OESDDataIter iter = oechem.OEGetSDDataPairs(mol); iter.hasNext();) {
            OESDDataPair dp = iter.next();
            System.out.println(dp.GetTag() + " : " + dp.GetValue());
        }
    }
}

Listing:6.10 Manipulating SD tagged data

Note that SD tagged data is specific to MDL's SD file format. Any data added to a molecule will only be written out to SD files or OEBinary files. The SD data fields will only be filled when reading from SD files that contain SD tagged data or from OEBinary files previously created to contain this data.
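To make the file-format dependence concrete, the sketch below renders one tag and value pair the way a data item commonly appears in an SD file: a header line carrying the tag in angle brackets, the value on the next line, and a blank line to end the item. The class SDDataBlock and its format method are hypothetical and only illustrate the layout. The exact spacing OEChem writes is not stated here, so treat it as an assumption.

```java
// Hypothetical illustration of the SD data item layout; not OEChem's writer.
class SDDataBlock {
    // Render one pair as: header line "> <tag>", value line, blank line.
    // Assumption: single-line value and a single space after '>'.
    static String format(String tag, String value) {
        return "> <" + tag + ">\n" + value + "\n\n";
    }

    public static void main(String[] args) {
        System.out.print(format("color", "brown"));
        System.out.print(format("size", "small"));
    }
}
```

Formats without an equivalent data block have nowhere to put these pairs, which is why the data is lost when writing to them.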

Two more example programs deal specifically with tagged data. SDFRename.java takes an SD file and renames all the molecules based on the value of a chosen SD tag. MergeCSV.java takes a CSV file and adds its data as tags to the molecules in an input stream. This simple program assumes that the first column holds the molecule title, matching the titles found in the incoming molecule file, and that the first row contains the names to be used as tags.
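The CSV layout that MergeCSV.java expects can be sketched without any toolkit calls. The class below is not the shipped MergeCSV.java. It is a hypothetical, simplified parser (CsvTagTable) that assumes comma-separated cells with no quoting, and builds a lookup from molecule title to its tag and value pairs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the CSV layout described above; not MergeCSV.java itself.
class CsvTagTable {
    // Row 0 holds the tag names; column 0 of each later row holds the molecule title.
    // Assumption: plain comma-separated cells, no quoted or escaped commas.
    static Map<String, Map<String, String>> parse(List<String> lines) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return table;
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank rows
            }
            String[] cells = line.split(",", -1);
            Map<String, String> tags = new LinkedHashMap<>();
            // Column 0 is the title, so tag columns start at index 1.
            for (int i = 1; i < header.length && i < cells.length; i++) {
                tags.put(header[i].trim(), cells[i].trim());
            }
            table.put(cells[0].trim(), tags);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("title,color,size", "benzene,brown,small");
        System.out.println(parse(csv));
    }
}
```

For each incoming molecule, a merge program would look up the molecule's title in this table and store every entry of the matching inner map with OESetSDData(mol, tag, value).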

6.3.2 Manipulating PDB Tagged Data

The OESDDataPair class is also used to set or retrieve PDB data pairs. In PDB files, this data is stored in header lines where the first field is the tag and the remainder of the line is the data. OESDDataPair objects provide SetTag/GetTag and SetValue/GetValue methods for access to each half of PDB pairs.
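The tag and data split of a header line can be pictured with the small sketch below. PdbHeaderLine is a hypothetical helper, and splitting at the first run of whitespace is an assumption made for illustration. How OEChem actually delimits the first field (for example, by fixed columns) is not specified in this section.

```java
// Hypothetical illustration of "first field is the tag, remainder is the data".
class PdbHeaderLine {
    // Returns a two-element array: {tag, data}.
    // Assumption: the tag ends at the first whitespace character.
    static String[] split(String line) {
        int end = 0;
        while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
            end++;
        }
        String tag = line.substring(0, end);
        String data = line.substring(end).trim();
        return new String[] { tag, data };
    }

    public static void main(String[] args) {
        String[] pair = split("REMARK   2 RESOLUTION. 1.80 ANGSTROMS.");
        System.out.println(pair[0] + " : " + pair[1]);
    }
}
```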

If you wish to store a numeric value, use Java's String.valueOf() method to convert it to a string.

The following functions provide access to the PDB data.

6.3.2.1 Storing PDB Data on a Molecule

Use OESetPDBData to set a tag and value data pair. Both tag and value must be strings. If an item with the same tag already exists, it is replaced. The second form is the same as the first but uses an OESDDataPair instance.

boolean OESetPDBData(OEMolBase mol, String tag, String value)
boolean OESetPDBData(OEMolBase mol, OESDDataPair dp)

Listing:6.11

Use OEAddPDBData to add a tag and value data pair. Both tag and value must be strings. If an item with the same tag already exists, another one is added. The second form is the same as the first but uses an OESDDataPair instance.

Note that for PDB header items like REMARK, each line is treated as a separate item, so to add multiple REMARK lines be sure to use OEAddPDBData instead of OESetPDBData.

boolean OEAddPDBData(OEMolBase mol, String tag, String value)
boolean OEAddPDBData(OEMolBase mol, OESDDataPair dp)

Listing:6.12

6.3.2.2 Retrieving PDB Data from a Molecule

Use OEHasPDBData to determine if a molecule has an item with a given tag:

boolean OEHasPDBData(OEMolBase mol, String tag)

Listing:6.13

Use OEGetPDBData to get the value for the given tag. If the molecule does not have that tag, an empty string is returned. Note that if there are multiple items with the same tag, only the first one is returned. Use the iterator access shown below to retrieve all items that share a tag.

String OEGetPDBData(OEMolBase mol, String tag)

Listing:6.14

To access all PDB data, use OEGetPDBDataPairs, which returns an iterator over the PDB data pairs.

OEPDBDataIter OEGetPDBDataPairs(OEMolBase mol)

Listing:6.15
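The interplay of set, add, single-value get and iteration described in this section can be modeled with a small in-memory list. TagListModel below is a hypothetical stand-in written for illustration and is not how OEChem stores PDB data. In particular, replacing only the first matching entry in set is an assumption, since the text above does not say what happens when several entries already share the tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of tagged data; NOT OEChem's implementation.
class TagListModel {
    private final List<String[]> pairs = new ArrayList<>(); // each entry is {tag, value}

    // "Set" semantics: replace the value of an existing tag, otherwise append.
    // Assumption: when several entries share the tag, only the first is replaced.
    void set(String tag, String value) {
        for (String[] pair : pairs) {
            if (pair[0].equals(tag)) {
                pair[1] = value;
                return;
            }
        }
        pairs.add(new String[] { tag, value });
    }

    // "Add" semantics: always append, even if the tag is already present.
    void add(String tag, String value) {
        pairs.add(new String[] { tag, value });
    }

    // "Get" semantics: first value for the tag, or an empty string if absent.
    String first(String tag) {
        for (String[] pair : pairs) {
            if (pair[0].equals(tag)) {
                return pair[1];
            }
        }
        return "";
    }

    // Iterator-style access: every value stored under the tag, in insertion order.
    List<String> all(String tag) {
        List<String> values = new ArrayList<>();
        for (String[] pair : pairs) {
            if (pair[0].equals(tag)) {
                values.add(pair[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        TagListModel data = new TagListModel();
        data.set("REMARK", "first remark");
        data.set("REMARK", "replaced remark"); // still a single REMARK entry
        data.add("REMARK", "second remark");   // now two REMARK entries
        System.out.println(data.first("REMARK"));
        System.out.println(data.all("REMARK"));
    }
}
```

In this model, calling set twice leaves one REMARK entry, while add accumulates entries. A single-value lookup sees only the first of them, which is why multi-line items such as REMARK are best read through the iterator.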

6.3.2.3 Copying PDB Data

To copy the entire set of PDB data from a source (src) molecule to a destination (dst) molecule, use OECopyPDBData.

boolean OECopyPDBData(OEMolBase dst, OEMolBase src)

Listing:6.16

6.3.2.4 Deleting PDB Data from a Molecule

Use OEDeletePDBData to delete a tagged data item. All data items with the specified tag will be deleted.

boolean OEDeletePDBData(OEMolBase mol, String tag)

Listing:6.17

To clear all PDB data from a molecule, use OEClearPDBData.

boolean OEClearPDBData(OEMolBase mol)

Listing:6.18

 
6.3.3 Multi-conformer molecules

For using tag data with multi-conformer molecules, see Section 7.6.