2.3 Database generation

An essential feature of these fragment methods is the generation of a database of potential fragments. While it may be tempting to generate fragments de novo, such approaches often produce unrealistic chemical fragments. Particularly for a method that is related to a common medicinal chemistry technique, we feel it is important to propose known fragments.

The databases that come with the Brood program are derived from a collection of roughly 12 million commercially available compounds. The compounds are fragmented, resulting in approximately 1 million unique molecular fragments. All of these fragments have 15 or fewer heavy atoms and 1-3 attachment points. The fragments are then filtered to remove toxic, reactive, or chemically over-complex fragments.
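The size and attachment-point limits described above can be illustrated with a minimal Python sketch. It is not the procedure used to build the Brood databases. The Fragment class, its fields, and the flagged marker are hypothetical names introduced only for illustration, and the heavy-atom counts are assumed to be precomputed and to exclude the dummy atoms.

```python
from dataclasses import dataclass

# Limits quoted in the text above.
MAX_HEAVY_ATOMS = 15
MIN_ATTACHMENTS = 1
MAX_ATTACHMENTS = 3


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one fragment (illustration only)."""
    smiles: str
    heavy_atoms: int          # assumed precomputed, dummy atoms excluded
    attachment_points: int    # number of dummy atoms / R-groups
    flagged: bool = False     # assumed marker for toxic, reactive, or over-complex


def passes_size_limits(frag: Fragment) -> bool:
    """True if the fragment is within the heavy-atom and attachment-point limits."""
    return (
        frag.heavy_atoms <= MAX_HEAVY_ATOMS
        and MIN_ATTACHMENTS <= frag.attachment_points <= MAX_ATTACHMENTS
    )


def filter_fragments(fragments):
    """Keep fragments that pass the size limits and are not flagged."""
    return [f for f in fragments if passes_size_limits(f) and not f.flagged]


# Toy input; the counts are written by hand for this example.
candidates = [
    Fragment("*c1ccccc1", heavy_atoms=6, attachment_points=1),
    Fragment("*C(=O)O*", heavy_atoms=3, attachment_points=2),
    Fragment("c1ccccc1", heavy_atoms=6, attachment_points=0),               # no attachment point
    Fragment("*C(=O)Cl", heavy_atoms=3, attachment_points=1, flagged=True),  # assumed reactive
    Fragment("*" + "C" * 16, heavy_atoms=16, attachment_points=1),          # too large
]

kept = filter_fragments(candidates)
print([f.smiles for f in kept])
```

In this sketch the size check and the flag check are kept separate, so that a different notion of an undesirable fragment can be substituted without touching the size limits.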

Approximately 550,000 fragments remain. These fragments are ranked according to the frequency with which they appear in the original database of molecules (i.e., how common they are). Two multi-conformer databases of the remaining fragments are provided with the software for bioisostere searches. The first of these databases (frag.f50.oeb.gz) contains the roughly 30,000 most common fragments, namely all fragments that occurred in 50 or more of the original molecules. The second database (frag.f5.oeb.gz) contains the roughly 140,000 most common fragments, namely all fragments that occurred in 5 or more of the original molecules, and is therefore a superset of the first database. A third database containing more than 500,000 fragments is available to customers upon request.
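The relationship between the two frequency cutoffs can be sketched in a few lines of Python. This is only an illustration of thresholding by occurrence count. The function names and the toy counts are hypothetical, and the sketch does not read or write the actual database files.

```python
def rank_by_frequency(counts):
    """Return (fragment, count) pairs, most common first; ties broken by name."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def select_tier(counts, min_count):
    """Return the set of fragments occurring in at least min_count molecules."""
    return {frag for frag, n in counts.items() if n >= min_count}


# Toy data: fragment SMILES mapped to the number of source molecules
# in which the fragment occurred (values invented for this example).
occurrences = {
    "*c1ccccc1": 120,
    "*C(=O)O*": 50,
    "*C1CC1": 7,
    "*C#C*": 5,
    "*[Si](C)(C)C": 2,
}

ranked = rank_by_frequency(occurrences)
f50 = select_tier(occurrences, 50)   # analogous to the cutoff of 50
f5 = select_tier(occurrences, 5)     # analogous to the cutoff of 5

# Lowering the cutoff can only add fragments, so f50 is contained in f5.
print(f50 <= f5)
```

Because both tiers are defined by a minimum count on the same data, the superset relationship follows directly from the cutoffs and does not need to be enforced separately.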

Users may also provide their own fragment database for searching. These fragments should be prepared with at least one ``attachment point'', represented in most file formats by a single bond to a dummy atom or ``R-group''. This distribution includes a utility program named dbhelper to aid in generating multi-conformer databases with partial charges from fragment files. To do this, dbhelper converts the dummy atoms to tagged methyl groups.
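Before preparing a custom database, it can be useful to confirm that every input fragment carries at least one attachment point. The following Python sketch assumes the fragments are written as SMILES strings, in which the dummy atom is the wildcard `*` (optionally bracketed, as in `[*:1]`). The function names are hypothetical, and the check is independent of dbhelper, which it does not call or emulate.

```python
def count_attachment_points(smiles: str) -> int:
    """Count dummy (wildcard) atoms, assuming '*' marks each attachment point."""
    return smiles.count("*")


def split_by_attachment(smiles_list):
    """Split fragments into (usable, rejected) by whether they have a dummy atom."""
    usable, rejected = [], []
    for smi in smiles_list:
        if count_attachment_points(smi) >= 1:
            usable.append(smi)
        else:
            rejected.append(smi)
    return usable, rejected


# Toy input: the last entry has no attachment point and is rejected.
fragments = ["*c1ccccc1", "[*:1]c1ccc([*:2])cc1", "CCO"]
usable, rejected = split_by_attachment(fragments)
print(usable)
print(rejected)
```

This check only counts dummy atoms. It does not verify that each one is joined to the fragment by a single bond, which would require parsing the structure with a cheminformatics toolkit.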

In contrast, the query molecule does not need to have a 3D structure. Brood will automatically take a 1D or 2D query fragment in any OpenEye-supported file format (vide infra) and generate a low-energy MMFF structure with which to carry out the bioisostere search. To make proper use of an existing 3D query structure, please consult the manual entry for the -fromCT flag.