Theory

Bioisosteric replacement has a long history (see [Chen-2003]), and most medicinal chemists are well versed in standard sets of bioisosteric fragments. Computational approaches to bioisosteres have a similarly long history (see [Verloop-1987] and [Bartlett-1994]). There have been several attempts to examine sets of known active compounds to empirically identify bioisosteric fragments (see [Ujvary-2003] and [Sheridan-2002]). While this is an interesting exercise, it has two drawbacks. First, it can only identify bioisosteric fragment pairs that are already known. While these are interesting to study, they are often familiar to experienced medicinal chemists and modelers. Second, it identifies many incidental rather than meaningful fragment pairs. These arise because two molecules that bind to the same site need not differ only by a bioisosteric replacement. For instance, chemists may make analogs of a compound by replacing an N-methyl group with an N-benzyl group in order to identify new binding pockets. However, the fact that both of these compounds are bioactive does not mean that methyl and benzyl are bioisosteres (though they would be identified as such by some methods). While one may apply various heuristics, such as size, to avoid this problem, we hope to explore methods that are more robust.

An alternative approach has been to develop an algorithm that predicts whether two fragments are bioisosteres. Several groups, including Bartlett [Bartlett-1994], Verloop [Verloop-1987] and Willett [Willett-2001], have developed methods in this area. Here we seek to capitalize on and extend the ideas developed by these workers.

Bioisosteric Searching

The software described here allows a user to enter a single query fragment and search a very large database of known molecular fragments in order to identify fragments that are similar. Each database fragment is compared to the query fragment in 3D with regard to shape, chemistry, electrostatics and geometric presentation of attachment groups. The very best fragments in each of four classes are saved as potential bioisosteres. Each of the four classes is scored with a different similarity measure, as discussed below.

The similarity functions in BROOD have been validated with known bioisosteric pairs to ensure that they behave as expected, but have been designed to be general enough to identify bioisosteric pairs that are non-obvious. Frequently, similarity search algorithms give top ranks only to those fragments that are close analogs of the query. While it is important for a program to be able to identify these fragments, they often are not interesting. To avoid this dilemma, BROOD has a class devoted specifically to query analogs. This ensures that they are correctly identified and also leaves the other classes full of novel, chemically interesting fragments that are similar to the query in ways other than simple graph properties.

Classes of Bioisosteres

We search for four types of bioisosteres:

  1. Those with the best overlap of shape, atom-types and attachment geometry
  2. Those with the best overlap of shape, electrostatics and attachment geometry
  3. Those with the best replication of the attachment geometry (ignoring chemical properties)
  4. Those that are close analogs for the query fragment

Separate scoring functions, rankings and hitlists exist for each of these classes of bioisosteres. These hitlists are titled “color”, “elect”, “struc” and “queryAnalog”, respectively. Each of these individual classes of bioisosteres may be of particular interest for a given molecular design application. The “color”, “struc” and “queryAnalog” hitlists are generated by default. The “elect” hitlist is significantly more expensive to calculate, so it is only generated upon request.

The “color” bioisostere hitlist is the most general and practical set of bioisosteres. These fragments are those deemed most similar to the query based on the sum of shape similarity and atom-type similarity. Here, the atom-type similarity includes an atom-type for the attachment points. This measure of bioisosteric similarity generates fragment pairs that closely represent the classic medicinal chemistry notion of bioisosteres. By design, the default “color” force-field treats all color interactions with equal weighting. However, we recognize that the attachment points in bioisosteric fragments are an exception to the typical atom-type interaction. Therefore, BROOD contains a control parameter for the coefficient that balances the atom-type and attachment-point scores. We have experimented with several values and believe the default value of 1.5 represents a good balance between the importance of the functional-group atom-types and that of the attachment-point atom-type.
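The way such a coefficient could enter the score can be illustrated with a minimal sketch. The text above does not give the exact functional form, so the formula below (shape plus functional-group atom-type similarity plus a weighted attachment-point term) is an assumption, and the names `OverlapTerms`, `color_score` and `attachment_weight` are hypothetical rather than part of BROOD.

```python
from dataclasses import dataclass


@dataclass
class OverlapTerms:
    """Hypothetical similarity terms for one overlay, each assumed to lie in [0, 1]."""
    shape: float       # shape similarity between query and database fragment
    atom_type: float   # functional-group ("color") atom-type similarity
    attachment: float  # attachment-point atom-type similarity


def color_score(terms: OverlapTerms, attachment_weight: float = 1.5) -> float:
    """Assumed form: sum of shape and atom-type similarity, with the
    attachment-point contribution scaled by a single coefficient."""
    return terms.shape + terms.atom_type + attachment_weight * terms.attachment


def rank_by_color(candidates: dict, attachment_weight: float = 1.5) -> list:
    """Return fragment names sorted from most to least similar."""
    return sorted(
        candidates,
        key=lambda name: color_score(candidates[name], attachment_weight),
        reverse=True,
    )
```

Under this assumed form, raising the coefficient favors fragments that reproduce the attachment points even when their functional-group match is weaker, and lowering it does the opposite.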

The “elect” or electrostatic bioisostere hitlist is analogous to the “color” method in its utility. However, there are two critical differences in how the similarity of fragments is evaluated. The first difference is that rather than comparing atom-types, we compare the electrostatic potential projected into the area around the molecules. To generate the electrostatic potential, we carry out a Poisson-Boltzmann calculation (external dielectric=80) and generate a measure of electrostatic similarity using the product of the query fragment electrostatic field and the database fragment electrostatic field. This field product is used to calculate an electrostatic overlap between two fragment conformers. The second difference is that there is no consideration of the attachment points in the electrostatic term, so an additional term taking the difference in attachment-point positions into account is added. This “elect” calculation is significantly more expensive than the “color” calculation, which has two significant implications. First, overlays are not optimized against the “elect” function; instead, it is evaluated as a single-point calculation on the overlays generated by the “color” method. Second, even with the relative speed of these single-point calculations, the “elect” method is more time-consuming than any of the other methods and is thus optional.
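The field-product idea can be sketched for two potentials that have already been sampled on the same grid for a fixed overlay. This is an illustration only: the Tanimoto normalization, the Gaussian form of the attachment-point term, and all function names are assumptions made for the example, not BROOD's actual scoring function, and the Poisson-Boltzmann calculation itself is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def field_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of pointwise products of two potentials sampled on the same grid."""
    if len(a) != len(b):
        raise ValueError("fields must be sampled on the same grid")
    return sum(x * y for x, y in zip(a, b))


def field_tanimoto(query: Sequence[float], candidate: Sequence[float]) -> float:
    """Assumed normalization: Tanimoto coefficient built from field products."""
    qq = field_product(query, query)
    cc = field_product(candidate, candidate)
    qc = field_product(query, candidate)
    denom = qq + cc - qc
    return qc / denom if denom > 0.0 else 0.0


def attachment_term(query_pts: Sequence[Point], candidate_pts: Sequence[Point]) -> float:
    """Hypothetical positional term: mean Gaussian of the distance between
    corresponding attachment points (1.0 when they coincide)."""
    if not query_pts or len(query_pts) != len(candidate_pts):
        raise ValueError("attachment points must be paired one-to-one")
    total = sum(math.exp(-math.dist(p, q) ** 2) for p, q in zip(query_pts, candidate_pts))
    return total / len(query_pts)


def elect_score(query_field, candidate_field, query_pts, candidate_pts) -> float:
    """Single-point score on a fixed overlay: field overlap plus attachment term."""
    return field_tanimoto(query_field, candidate_field) + attachment_term(query_pts, candidate_pts)
```

Because the function takes the overlay as given and performs no optimization, it mirrors the single-point evaluation described above.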

The “struc” or structure bioisostere hitlist is very specialized. In some circumstances one may be interested only in the structural aspects of a fragment. In other words, does the fragment present its attachment points in the same way, independent of shape, chemistry and electrostatics? In these cases, the “struc” hitlist will contain the fragments of interest to you. These will be the fragments that can present substituents in a similar manner to the query fragment.

The final bioisostere hitlist is the “queryAnalog” hitlist. It contains all the fragments that have similar graphs to the query. Here, a fragment has a similar graph if its uncolored graph is identical to that of the query or differs from it by only one or two atoms. Each of these fragments is oriented and evaluated by the “color” method to allow appropriate visual comparison of the fragment to the query. While this hitlist often contains many fragments that are obvious replacements for a query that one may already have in mind, it may also include fragments with dramatically different chemistry from that seen in the query.

Perhaps the most common extra criterion used in selecting bioisosteric fragments is rigidity. For this reason, BROOD has a control parameter devoted specifically to this property. While BROOD does not carry out complex conformational analysis, it allows a user to consider only bioisosteric fragments that contain ring systems. In practice, this often produces small, rigid ring systems that overlay quite nicely with linear query fragments.

Database Generation

An essential feature of these fragment methods is generation of a database of potential fragments. While it may be tempting to generate fragments de novo, such approaches often generate unrealistic chemical fragments. Particularly for a method that is related to a common medicinal chemistry technique, we feel it is important to propose known fragments.

The databases that come with the BROOD program are derived from a collection of roughly 12 million commercially available compounds. The compounds are fragmented, resulting in approximately 90 thousand simple or 600 thousand complex unique molecular fragments. The fragments are filtered to remove toxic, reactive, or extremely large fragments. Two multi-conformer databases of the remaining fragments are provided with the software for bioisostere searches.

Users may also provide their own fragment database for searching. These fragments should be prepared with at least one “attachment point”, represented by a single bond to a dummy atom or “R-group” in most file formats. This distribution includes a utility program to aid in generating multi-conformer databases with OMEGA from fragment files. This is carried out by converting the dummy atom to a methyl group, generating the conformers and then converting the methyl group back to a dummy atom. The fragments in the specified database should already have pre-generated conformer ensembles.
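The dummy-atom round trip described above can be sketched on a toy atom list. This is not the bundled utility and uses no OpenEye or OMEGA calls: the `Atom` class, the `"*"` symbol for the dummy atom, and both function names are hypothetical, and the conformer generation step that would run between the two conversions is omitted. The point of the sketch is that capped atoms must be tagged, so that genuine methyl carbons are not turned into attachment points on the way back.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Atom:
    """Toy atom record; "*" stands for the dummy (R-group) atom."""
    element: str
    was_dummy: bool = False  # set when a dummy atom has been capped as a methyl carbon


def cap_dummies(atoms: List[Atom]) -> List[Atom]:
    """Replace each dummy atom with a tagged carbon (implicit-hydrogen methyl cap)."""
    return [Atom("C", was_dummy=True) if a.element == "*" else a for a in atoms]


def restore_dummies(atoms: List[Atom]) -> List[Atom]:
    """Turn tagged cap carbons back into dummy atoms; untagged carbons are kept."""
    return [Atom("*") if a.was_dummy else a for a in atoms]
```

In a real workflow the conformers would be generated on the capped fragment, and the restoration would then be applied to every conformer of the ensemble.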

In contrast, the query molecule does not need to have a 3D structure. BROOD will automatically take a 1D or 2D query fragment in any OpenEye-supported file format (vide infra) and generate a low-energy MMFF structure with which to carry out the bioisostere search. If you want to supply your own query structure, please consult the manual entry for the -fromCT flag.