Title: Towards Shape Clustering of PubChem

Authors: Fabien Fontaine, Evan Bolton, Yulia Borodina and Stephen H Bryant
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA.

Abstract: PubChem is an open access database containing more than 5.3 million unique chemical structures maintained at the National Center for Biotechnology Information (NCBI). Currently, the compounds in PubChem are represented using 2D information; however, we may include 3D conformer models for a subset of the chemical structures. A key motivation for adding 3D information to PubChem is to use OpenEye shape technology for shape neighbor clustering and shape similarity querying. The huge size of the PubChem data set prevents effective real-time shape comparison of billions of 3D conformers by means of a ROCS direct Gaussian overlay. A recently described shape-fingerprint methodology (Haigh, Pickup et al. 2005) has emerged as an attractive way to make shape similarity neighboring studies more tractable. We will outline extensions to the shape-fingerprint methodology, compare it with the more conventional direct Gaussian overlays, and discuss its potential application in PubChem shape similarity neighboring analyses.

ref: Haigh, J. A., B. T. Pickup, et al. (2005). "Small Molecule Shape-Fingerprints." J. Chem. Inf. Model. 45(3): 673-684.
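
To illustrate why a fingerprint can make shape neighboring more tractable than a direct overlay, here is a minimal Python sketch. It is not the OpenEye API and not the authors' implementation. It assumes a simplified reading of the shape-fingerprint idea: each conformer is compared once against a fixed set of reference shapes, and a bit is set when the shape Tanimoto to that reference meets a threshold. The function names, the threshold of 0.75, and the overlap volumes below are all hypothetical values chosen for illustration.

```python
def shape_tanimoto(o_ab: float, o_aa: float, o_bb: float) -> float:
    """Shape Tanimoto from overlap volumes: O_AB / (O_AA + O_BB - O_AB).

    The overlap volumes would come from a Gaussian overlay program;
    here they are simply passed in as numbers (assumption).
    """
    return o_ab / (o_aa + o_bb - o_ab)


def make_fingerprint(similarities_to_refs, threshold=0.75) -> int:
    """Build a bit string (stored as an int) for one conformer.

    Bit i is set when the conformer's shape Tanimoto to reference
    shape i is at least `threshold` (hypothetical cutoff).
    """
    bits = 0
    for i, similarity in enumerate(similarities_to_refs):
        if similarity >= threshold:
            bits |= 1 << i
    return bits


def fingerprint_tanimoto(fp_a: int, fp_b: int) -> float:
    """Bitwise Tanimoto: shared set bits / bits set in either fingerprint."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # neither conformer matched any reference shape
    return bin(fp_a & fp_b).count("1") / union


# Hypothetical similarities of two conformers to four reference shapes.
fp_a = make_fingerprint([0.9, 0.5, 0.8, 0.2])
fp_b = make_fingerprint([0.9, 0.8, 0.8, 0.1])
similarity = fingerprint_tanimoto(fp_a, fp_b)
```

Under these assumptions, the expensive overlay is done once per conformer against the reference set, and each later pairwise comparison reduces to a few bit operations rather than a full Gaussian overlay optimization.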