Title:

Towards Shape Clustering of PubChem

Authors:

Fabien Fontaine, Evan Bolton, Yulia Borodina and Stephen H Bryant

National Center for Biotechnology Information, National Library of
Medicine, National Institutes of Health, Department of Health and
Human Services, Bethesda, MD 20894, USA.


Abstract:

PubChem is an open-access database, maintained at the National Center
for Biotechnology Information (NCBI), that contains more than 5.3
million unique chemical structures.  Currently, the compounds in
PubChem are represented using 2D information only; however, we may
add 3D conformer models for a subset of the chemical structures.

A key motivation for adding 3D information to PubChem is to use
OpenEye shape technology for shape neighbor clustering and shape
similarity querying. The huge size of the PubChem data set prevents
effective real-time shape comparison of billions of 3D conformers
by means of a direct ROCS Gaussian overlay. A recently described
shape-fingerprint methodology (Haigh, Pickup et al. 2005) offers
an attractive way to make shape similarity neighboring studies
more tractable.
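
As a rough illustration of why a fingerprint comparison is cheaper
than an overlay optimization, the sketch below scores conformers by
the Tanimoto coefficient of two bit strings. It is a minimal sketch
under stated assumptions, not the method of Haigh et al. or any
OpenEye API: the fingerprints are made-up integers, and the names
tanimoto and shape_neighbors, the conformer labels, and the 0.7
threshold are all hypothetical.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints stored as ints."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        # Two empty fingerprints share nothing; define the score as 0.
        return 0.0
    common = bin(fp_a & fp_b).count("1")
    return common / union


def shape_neighbors(query: int, library: dict, threshold: float = 0.7) -> list:
    """Hypothetical helper: conformers scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up 6-bit fingerprints, for illustration only (assumption).
library = {
    "conf_a": 0b101101,
    "conf_b": 0b101100,
    "conf_c": 0b010010,
}
query = 0b101101

neighbors = shape_neighbors(query, library)
print(neighbors)
```

Each comparison here is a few bitwise operations, with no alignment
search, which is what makes neighboring over very large conformer
sets plausible.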

We will outline extensions to the shape-fingerprint methodology,
compare them with the more conventional direct Gaussian overlays,
and discuss their potential application in PubChem shape similarity
neighboring analyses.



ref: Haigh, J. A., B. T. Pickup, et al. (2005). "Small Molecule
Shape-Fingerprints." J. Chem. Inf. Model. 45(3): 673-684.