1.16 Accumulation of rules

Filter contains numerous rules that judge the quality of molecules on many different facets. Examined individually, each of these rules seems quite reasonable and even profitable. However, when each molecule is tested against hundreds of individual filters, the fraction of molecules that pass all of them can be surprisingly small. Sometimes fewer than 50% of the molecules in a vendor database pass the filters. If this is unacceptable, we recommend that you examine the predicted aggregator, solubility, and Veber filters; in our experience, these are the most common causes of failure. The best way to investigate is to write a tab-separated table file (table parameter) with failing values flagged with an asterisk (tableflag parameter), and also to examine the log file. Together, these two outputs allow quick adjustment of the filter file to generate the desired database size and properties.
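As a rough illustration of how the flagged table might be summarized, the following Python sketch counts, for each column, how many rows carry an asterisk. The exact layout of the table file is not described here, so the header row, the column names, and the placement of the asterisk next to the failing value are all assumptions made for this example; the function name count_flagged_failures is likewise hypothetical and not part of Filter.

```python
import csv
import io
from collections import Counter


def count_flagged_failures(table_text, flag="*"):
    """Count, per column, how many rows contain a flagged (failing) value.

    Assumes a tab-separated table whose first row is a header and whose
    failing values contain the flag character. This layout is an assumption.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    counts = Counter()
    for row in reader:
        for name, value in zip(header, row):
            if flag in value:
                counts[name] += 1
    return counts


# Hypothetical three-molecule table; names and values are illustrative only.
example = (
    "Name\tMW\tSolubility\tVeber\n"
    "mol1\t312.4\tpoor*\tpass\n"
    "mol2\t655.0*\tpoor*\tfail*\n"
    "mol3\t280.1\tgood\tpass\n"
)

failures = count_flagged_failures(example)
# List columns from most to least frequently failed.
for column, n in failures.most_common():
    print(column, n)
```

Sorting the columns by failure count in this way points to the rules that remove the most molecules, which are the natural first candidates for adjustment in the filter file.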

To demonstrate this principle in a tangible way, we filtered 141 of the best-selling non-antibiotic prescription drugs from 2005. We designed two filters. The first filter sets each value so that it just spans the range of that property for the 141 compounds, and thus allows all 141 compounds to pass. The second filter is very similar to the first. However, rather than spanning the entire range, each of its values is set to cover from the 2.5th percentile to the 97.5th percentile. For most properties, the limits of both filters fall in reasonable ranges. For instance, the full range of molecular weight spans 130 to 781, while the 2.5th percentile is 145 and the 97.5th percentile is 570. However, the remarkable result is that when the latter filter is used, only 75 of the 141 molecules pass! This demonstrates how slight changes to many filters can lead to a significant reduction in the number of compounds that pass all of the filters. Both example filters are available in the data directory of the distribution (filter_blockbuster.txt and filter_2.5_blockbuster.txt).
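The arithmetic behind this result can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each rule independently keeps 95% of the compounds (the 2.5th to 97.5th percentile window). Real molecular properties are correlated, and the number of properties in the example filters is not stated here, so the output is a back-of-the-envelope estimate rather than a description of the actual filters. The function names are hypothetical.

```python
import math


def expected_pass_fraction(n_rules, keep_fraction=0.95):
    """Fraction expected to pass n_rules independent rules,
    each of which keeps keep_fraction of the compounds."""
    return keep_fraction ** n_rules


def rules_needed(target_fraction, keep_fraction=0.95):
    """Number of independent rules that would bring the overall
    pass rate down to target_fraction."""
    return math.log(target_fraction) / math.log(keep_fraction)


# Pass rate reported for the percentile-based filter: 75 of 141 compounds.
observed = 75 / 141
equivalent_rules = rules_needed(observed)

print(f"observed pass fraction: {observed:.3f}")
print(f"equivalent independent 95% rules: {equivalent_rules:.1f}")

# How quickly the pass rate decays as independent rules accumulate.
for n in (1, 5, 10, 20):
    print(n, round(expected_pass_fraction(n), 3))
```

Under the independence assumption, roughly a dozen rules that each discard only 5% of the compounds are enough to reproduce the observed pass rate of 75 out of 141.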