Title:
"Creating a ChemInformatics Data System for Public Consumption"

Author:
Evan Bolton
NIH/NCBI

Abstract:
The processing pipeline used to create PubChem is outlined, discussed, and
demonstrated.

PubChem offers researchers public access to an array of structure and
activity information for a diverse set of small molecules.  It is
organized as three linked databases, Substance, Compound, and BioAssay,
within the Entrez/PubMed information retrieval system.  PubChem contains
the results of high-throughput biological screening experiments, and, when
possible, PubChem's records are linked to other NCBI databases, such as
the PubMed scientific literature database and NCBI's 3D protein structure
database.  

Validation and standardization of chemical structure data are critical to
PubChem, since they allow computation of properties, descriptors, and
similarity relationships among entries in a uniform and accurate way.
Within PubChem, OEChem is used for file I/O and molecular data handling,
SMARTS pattern matching, stereochemistry and aromaticity perception, and
valence bond canonicalization, among other things.  Ogham is being used to
assign proper IUPAC names to chemical structures and will also be used to
generate structures for the cases where the dataset has only names.  In
the future, there are plans for PubChem to include properties predicted
using other software from OpenEye.
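
As a minimal illustration of one kind of similarity relationship that can
be computed once structures are standardized, the sketch below evaluates
the Tanimoto coefficient between two fingerprints.  It is an assumption
made for illustration only: the fingerprints are invented sets of "on" bit
positions, the function name tanimoto is hypothetical, and none of this is
PubChem's actual fingerprint definition or OEChem code.

```python
def tanimoto(fp_a, fp_b):
    """Return |A & B| / |A | B| for two sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints; real ones would be derived from standardized
# structures by a cheminformatics toolkit.
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8, 9, 12}

score = tanimoto(fp_1, fp_2)
print(f"Tanimoto similarity: {score:.2f}")
```

The score is only meaningful when both fingerprints come from structures
processed the same way, which is why uniform standardization has to come
before any similarity computation.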

Creating_a_ChemInformatics_Data_System_for_Public_Consumption.pdf