
 
4.3 Scoring

The final stage of docking a ligand is to score the pose or poses generated by the docking steps described above.

4.3.1 Overview

The following structure-based scoring functions are available in FRED. Each of these scoring functions also has a MASC variant (see section 4.3.12).

Shapegauss
A shape-based scoring function that uses smooth Gaussian functions to represent the shapes of molecules. Details of this scoring function can be found in reference [4].

PLP
or Piecewise Linear Potential which is described in detail in reference [7].

Chemgauss2
Version 2 of the Chemgauss scoring function, which uses smooth Gaussian functions to represent the shape and chemistry of molecules. Chemgauss2 has been superseded by the newer Chemgauss3 in the new version of FRED.

Chemgauss3
Version 3 of the Chemgauss scoring function, which uses smooth Gaussian functions to represent the shape and chemistry of molecules.

Chemscore
which is described in reference [8]. This implementation of Chemscore adheres as faithfully as possible to the referenced paper.

OEChemscore
An OpenEye variant of Chemscore which is similar to the Chemscore implementation found in the original 1.2.x version of FRED.

Screenscore
which is described in reference [9].

Zapbind
A scoring function which uses PB electrostatic calculations in combination with an area contact term.

The following ligand-based scoring functions are also available in FRED. Use of these scoring functions requires that the receptor file contain a bound ligand.

CGO
or Chemical Gaussian Overlay. This scoring function represents molecular shape and chemistry with smooth Gaussian functions, as the Chemgauss functions do. However, this function scores a pose by measuring how well it overlays onto the bound ligand, rather than scoring against the protein structure.

CGT
or Chemical Gaussian Tanimoto. Similar to CGO, but calculates a Tanimoto overlay rather than a volume overlay.

The following table gives a quick overview of the interactions the various scoring functions in FRED are aware of. It is easily seen that none of the functions have intramolecular terms. This is because FRED relies on the conformer generator that provides its input conformer database to ensure that all the input conformers have reasonable geometries and do not have significant intramolecular contacts and/or high strain energies.


Table 4.1: Summary of the interactions each scoring function is aware of
Scoring Function   Shape   Hydrogen Bonds   Metal   Aromatic   Desolvation
Shapegauss         Yes     No               No      No         No
PLP                Yes     Yes              Yes     No         No
Chemgauss2         Yes     Yes              Yes     Yes        No
Chemgauss3         Yes     Yes              Yes     No         Yes
Chemscore          Yes     Yes              Yes     No         No
OEChemscore        Yes     Yes              Yes     No         No
Screenscore        Yes     Yes              Yes     No         No
Zapbind            Yes     No               No      No         Yes
CGO                Yes     Yes              Yes     No         No
CGT                Yes     Yes              Yes     No         No


The scoring functions also vary in their speed, and only the faster functions can be used in the Exhaustive Docking and Optimization stages, as detailed in the following table. All the scoring functions can be used to give the final score for a pose.


Table 4.2: Summary of speed and the stages in which each scoring function can be used
Scoring Function   Speed       Exhaustive Docking   Optimization
Shapegauss         Very Fast   Yes                  Yes
PLP                Very Fast   Yes                  Yes
Chemgauss2         Fast        Yes                  Yes
Chemgauss3         Fast        Yes                  Yes
Chemscore          Medium      No                   Yes
OEChemscore        Medium      No                   Yes
Screenscore        Medium      No                   Yes
Zapbind            Slow        No                   No
CGO                Fast        Yes                  Yes
CGT                Medium      No                   Yes
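
As a convenience, the contents of Table 4.2 can be transcribed into a small lookup that answers "which scoring functions may be used at this stage?". The sketch below is illustrative only: the names SCORING_FUNCTIONS and usable_in are hypothetical and are not part of FRED.

```python
# Illustrative only: Table 4.2 transcribed as data. The names below are
# hypothetical and are not part of FRED or any OpenEye API.
SCORING_FUNCTIONS = {
    # name: (speed, usable in exhaustive docking, usable in optimization)
    "Shapegauss": ("Very Fast", True, True),
    "PLP": ("Very Fast", True, True),
    "Chemgauss2": ("Fast", True, True),
    "Chemgauss3": ("Fast", True, True),
    "Chemscore": ("Medium", False, True),
    "OEChemscore": ("Medium", False, True),
    "Screenscore": ("Medium", False, True),
    "Zapbind": ("Slow", False, False),
    "CGO": ("Fast", True, True),
    "CGT": ("Medium", False, True),
}


def usable_in(stage):
    """Return the sorted names of scoring functions allowed in a stage.

    stage is "exhaustive", "optimization" or "final". Every scoring
    function can be used to give the final score for a pose.
    """
    if stage == "final":
        return sorted(SCORING_FUNCTIONS)
    column = {"exhaustive": 1, "optimization": 2}[stage]
    return sorted(name for name, row in SCORING_FUNCTIONS.items() if row[column])


print(usable_in("exhaustive"))
```

For example, usable_in("optimization") returns every function except Zapbind, which according to the table can only be used for final scoring.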


 
4.3.2 Shapegauss

4.3.2.1 Reference

This scoring function is described in detail in Ref [4].

4.3.2.2 Typing

All heavy atoms are typed as steric atoms. Hydrogen atoms are ignored.

4.3.2.3 Components

There is only one component of Shapegauss, the shape score.

4.3.2.4 Description

The Shapegauss scoring function represents all atoms as smooth Gaussian functions. A pairwise potential between ligand and protein atoms is applied that attempts to maximize their surface contact and minimize their volume overlap (i.e., the potential is most favorable when the atoms are touching but not overlapping). A correction term is then applied to further penalize atoms that significantly overlap the protein.

 
4.3.3 PLP

4.3.3.1 Reference

The Piecewise Linear Potential is an implementation of the scoring function described in Ref [7].

4.3.3.2 Typing

The following atom types are recognized by PLP.

Donor
Hydrogen bond donors - primary and secondary amines.

Acceptor
Hydrogen bond acceptors - oxygen and nitrogen atoms with no bound hydrogens.

Hydroxyl
Hydroxyl groups are treated as both acceptors and donors.

NonPolar
Carbon, Chlorine, Fluorine, Bromine, Iodine and Nitrogen or sulfur with more than two attached hydrogens.

Sulfur
Sulfurs with fewer than two attached hydrogen atoms.

Metal
Iron, Magnesium or Zinc.

4.3.3.3 Components

The total PLP score is a sum of the following components.

NonPolar
Interactions of all ligand non-polar atoms.

Hydrogen Bond
Interactions of all ligand acceptors and donors.

Sulfur
Interactions of all ligand sulfurs.

Metal
Interactions of all ligand metals.

4.3.3.4 Description

PLP is a heavy atom scoring function, meaning all potentials are based on distances from heavy atom centers (i.e. hydrogen position is irrelevant, although the presence or absence of hydrogen is not, as it can affect the atom typing). The PLP implementation in FRED adheres to the reference as faithfully as possible, with the caveat that the implementation in FRED has been extended to include favorable interactions between acceptor and metal atoms.

 
4.3.4 Chemgauss2

4.3.4.1 Overview

The Chemgauss2 scoring function has been deprecated and will likely be removed in the next minor (not bugfix) release of FRED. Users are encouraged to use the new Chemgauss3 scoring function in place of Chemgauss2.

4.3.4.2 Typing

The following atom types are recognized by Chemgauss2.

Strong Hydrogen Bond Acceptors
are defined as any of the following:

  1. Oxygen with two single bonded heavy atoms.

  2. Oxygens double bonded to a carbon.

  3. Oxygens single bonded to a carbon that are part of a carboxylic acid group.

  4. Triple bonded nitrogens without an attached hydrogen.

  5. Non-aromatic nitrogens with no attached hydrogens and two single bonds one of which is to a carbon and another to carbon, sulfur or nitrogen.

  6. Nitrogens with one single bond to another heavy atom and with 1 or 2 attached hydrogens.

  7. Oxygens double bonded to phosphorus.

  8. Oxygens single bonded to a metal.

Weak Hydrogen Bond Acceptors
are defined as any of the following:

  1. Aromatic nitrogens with no attached hydrogens.

  2. Oxygens single bonded to a carbon that are not part of a carboxylic acid group.

  3. Oxygens double bonded to sulfur.

  4. Sulfurs single bonded to a carbon.

  5. A sulfur with two single bonded carbons and no attached hydrogens.

Strong Hydrogen Bond Donors
are defined as any of the following:

  1. Non-aromatic nitrogens with two single bonds to heavy atoms and one attached hydrogen.

  2. Nitrogens with two attached hydrogens and one single bond to a carbon.

Weak Hydrogen Bond Donors
are defined as any of the following:

  1. Aromatic nitrogens with 1 attached hydrogen.

  2. Non-aromatic nitrogens with 3 single bonds to carbons and one attached hydrogen.

  3. Oxygens with an attached hydrogen and a single bond to a carbon.

Aromatic
Heavy atoms in an aromatic ring

Metal
Any heavy atom except He, B, C, N, O, F, Ne, Si, P, S, Cl, Ar, As, Se, Br, Kr, Te, I, Xe, At, and Rn.

Small Non Polar
Fluorines, oxygens and nitrogens that are not also hydrogen bonding atoms.

Large Non Polar
Iodines, sulfurs and phosphorus atoms that are not also hydrogen bonding atoms.

Medium Non Polar
Any heavy atom that does not fit one of the above types.

In addition to these atom centered types, Chemgauss also determines the following positions around a molecule:

Polar Hydrogens
One or more possible positions for a hydrogen involved in a hydrogen bond. Note that this can be different from the explicit position of the hydrogen atom.

Lone Pairs
One or more possible positions around a hydrogen bond acceptor for a polar hydrogen from a donor.

Pi electron positions
Pi electron positions of an aromatic atom above and below the plane of the aromatic ring.

4.3.4.3 Components

The Chemgauss2 function is the sum of the following potentials, all of which are based on smooth Gaussian functions:

Shape
Shape based interactions between all heavy atoms.

Hydrogen Bond
Hydrogen bonding interactions based on favorable interactions between polar hydrogens and lone pairs and a mild repulsion between donor heavy atoms and acceptor heavy atoms (which tends to make the hydrogen bonds linear).

Aromatic
Aromatic ring interactions based on favorable interactions between aromatic atoms and the pi-electron positions plus repulsive aromatic atom to aromatic atom and pi-electron to pi-electron interactions.

4.3.4.4 Description

The Chemgauss scoring function combines the Shapegauss scoring function with additional potentials between chemically matched positions around the ligand pose. These chemically complementary positions are generally not also atom positions, but rather are placed near specific functional groups. For instance, acceptors have "lone pair" positions around them, which denote positions where a polar hydrogen could be placed to create a hydrogen bonding interaction. Similarly, donors have "polar hydrogen" positions, which denote positions their hydrogens could be in. For simple donors without rotatable bonds these positions correspond to the actual polar hydrogen position, but for donors on rotatable bonds, such as hydroxyls, there are several positions representing the ring of possible positions for the polar hydrogen. A favorable hydrogen bond score is obtained when a "polar hydrogen" position on one molecule overlaps a "lone pair" position on another molecule.

 
4.3.5 Chemgauss3

4.3.5.1 Typing

Chemgauss3 recognizes the following heavy atom types (a single atom may have multiple types):

Steric
All heavy atoms are typed as steric.

Acceptor
Acceptors are classified as strong, moderate or weak in strength as follows:

Strong
Phosphate, NOxide, Carboxylate, Het6N, Phosphinyl and Oxyanion.

Moderate
Water, Sulfoxide, Primary Amine, Het5N, Thiocarbonyl, Sulfate, Tertiary Amine, Amide, Carbamate and Urea.

Weak
Nitrile, Ketone, Ester, Nitro, Het5O, Imine, Phenol, Hydroxyl, Sulfone, Primary Aniline, Secondary Amine and Ether.

Donor
Donors are classified as strong, moderate or weak as follows:

Strong
Primary Amine NpH, Secondary Amine NpH, Tertiary Amine NpH, Amidine NpH, Guanidine NpH, Het5NH and AcidOH.

Moderate
Water, Primary Amide, AnilineNH, AmidineNH, Secondary Amide, Aniline NH2 and Hydroxyl.

Weak
Hydrazine NH, Imine NH, Phenyl OH, Primary Amine and Secondary Amine.

Coordinating Groups
Carboxylate, Oxanion, PyridineN, SulfonamideNAnion, AromNAnion, Thianion and Hydroxamate are considered coordinating groups.

Metals
Calcium, Magnesium and Zinc.

In addition to these heavy atom types, Chemgauss3 also recognizes positions around some of the heavy atoms. The positions are not required to be located at atom centers (they generally are not) and are used to represent the directionality of certain interactions. These positions are typed as follows.

Lone Pairs
are placed around acceptor heavy atoms and represent places where a polar hydrogen from a donor could be placed to form a hydrogen bond interaction with the acceptor.

Polar Hydrogen
positions are placed around donor heavy atoms and represent possible positions of the donor's polar hydrogens. These positions will often correspond to the position of the actual polar hydrogen attached to the donor heavy atom, however this is not required. For example, in the case of a hydroxyl there are six polar hydrogen positions used to represent the ring of possible positions the attached hydrogen can be in.

Water Positions
are placed around both donor and acceptor atoms and represent positions where a solvent water can make a hydrogen bonding interaction with the donor or acceptor.

Chelator coord
positions are located around metal-binding atoms and represent the positions a metal could be placed to form a coordinating interaction.

Note that the presence or absence of hydrogen can affect how a heavy atom is typed. For example, an oxygen with an attached hydrogen may be classified as a donor, but it would not be classified as a donor if the hydrogen were removed. However, if the hydrogen is present its position is ignored (i.e. polar hydrogen positions will be created for donors, but the actual hydrogen's position is not used in that calculation).
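
The ring of six polar hydrogen positions described for a hydroxyl can be pictured with a short geometric sketch. This is not FRED's implementation: the function name, the even spacing of the positions, the 0.96 Angstrom bond length and the 109.5 degree angle are all assumptions chosen for illustration.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hydroxyl_hydrogen_ring(parent, oxygen, n_positions=6,
                           bond_length=0.96, angle_deg=109.5):
    """Evenly spaced candidate hydrogen positions around the parent-O axis.

    bond_length (Angstrom) and angle_deg (parent-O-H angle) are assumed
    values, not parameters taken from FRED.
    """
    axis = _normalize(tuple(o - p for o, p in zip(oxygen, parent)))
    # Any vector not parallel to the axis can seed the perpendicular basis.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _normalize(_cross(axis, helper))
    e2 = _cross(axis, e1)
    theta = math.radians(angle_deg)
    along = -bond_length * math.cos(theta)  # offset of the ring along the axis
    radius = bond_length * math.sin(theta)  # radius of the ring
    positions = []
    for k in range(n_positions):
        phi = 2.0 * math.pi * k / n_positions
        positions.append(tuple(
            o + along * a + radius * (math.cos(phi) * x + math.sin(phi) * y)
            for o, a, x, y in zip(oxygen, axis, e1, e2)))
    return positions


ring = hydroxyl_hydrogen_ring(parent=(0.0, 0.0, 0.0), oxygen=(1.43, 0.0, 0.0))
```

Each returned position lies at the assumed bond length from the oxygen and makes the assumed angle with the bond to the parent atom, so together the positions trace the circle swept out by the hydrogen as the hydroxyl rotates.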

4.3.5.2 Components

The final Chemgauss3 score is a sum of the following components:

Steric
This component is based on the number of protein heavy atoms that contact heavy atoms of the ligand, with a correction term. The base potential accounts for two effects. The first is the increase in VdW type interactions when the ligand docks to the active site. The second is the protein desolvation energy from displacing water from the active site into solvent, ignoring any favorable hydrogen bonding interactions water could make with the site. The correction term of this component accounts for the favorable interactions waters can make with the site, by applying a penalty to lipophilic ligand heavy atoms placed near acceptors or donors of the active site. Note that in principle this desolvation penalty should be applied to any heavy atom of the ligand, not just lipophilic ones; however, in practice we have found this correction to be ineffective when applied to polar atoms.

Acceptor
component measures the interactions acceptors on the ligand are making with donors on the protein.

Donor
component measures the interactions donors on the ligand are making with acceptors on the protein.

Metal
component measures the interactions coordinating atoms on the ligand are making with metals in the active site.

Desolvation
component is a penalty assessed when donors and acceptors on the ligand are blocked from interacting with solvent waters by the active site.

4.3.5.3 Description

All Chemgauss3 scoring function interactions are created from a base function that is smoothed by convolution with a Gaussian function. The base functions for the various interactions are described below.

Steric
is a combination of a clash step function and two hard sphere potentials (representing short and long range Van der Waals interactions). This setup is designed to roughly approximate a VdW potential.

Hydrogen bond
is a hard sphere function based on the distance between a donor "polar hydrogen position" and an acceptor "lone pair" position. The function has a constant favorable value if the distance is less than 1.0 Angstrom and zero otherwise.

Metal
is a hard sphere function based on the distance between a "chelator coordinate" of the ligand and a metal on the protein. The function has a constant favorable value if the distance is less than 1.0 Angstrom and zero otherwise.

Ligand Desolvation
is a step function based on the distance between a "water position" of the ligand and the active site surface. Any water position within the active site surface (i.e. that clashes with the protein) is assessed a constant penalty. This effectively penalizes the ligand for breaking hydrogen bonds with solvent upon binding.

Protein Desolvation
is an estimation of the chemical potential of water within the active site. Areas where water can make multiple hydrogen bonds are more favorable than those where it can form fewer hydrogen bonds.

Aromatic
is a hard sphere function based on the distance between "ring negative" and "ring positive" positions. The function has a constant favorable value if the distance is less than 1.0 Angstrom and zero otherwise.
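
The idea of smoothing a hard sphere base function by convolution with a Gaussian can be illustrated in one dimension. The sketch below is a simplification under stated assumptions: it treats the distance as a single coordinate, uses a favorable value of -1.0 and a Gaussian width of 0.5 Angstrom (neither is given in this manual), and the function names are hypothetical.

```python
import math


def hard_sphere(distance, cutoff=1.0, value=-1.0):
    """Base function: constant favorable value inside the cutoff, zero outside."""
    return value if abs(distance) < cutoff else 0.0


def smoothed_hard_sphere(distance, cutoff=1.0, value=-1.0, sigma=0.5):
    """1D convolution of hard_sphere with a normalized Gaussian of width sigma.

    The convolution of a box function with a Gaussian has a closed form in
    terms of the error function; sigma and value are assumed parameters.
    """
    s = sigma * math.sqrt(2.0)
    return 0.5 * value * (math.erf((cutoff - distance) / s)
                          + math.erf((cutoff + distance) / s))


for d in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(d, hard_sphere(d), round(smoothed_hard_sphere(d), 4))
```

The smoothed function no longer jumps at the cutoff, so a pose that is slightly too far away still receives part of the favorable score and small changes in the pose produce small changes in the score.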

 
4.3.6 Chemscore

4.3.6.1 Reference

The Chemscore scoring function is an implementation of the scoring function described in Ref [8].

4.3.6.2 Typing

See reference [8].

4.3.6.3 Components

The Chemscore score is a sum of the following components:

Lipophilic
Interaction between lipophilic atoms.

Hydrogen Bond
Interactions between donors and acceptors.

Metal
Interactions between metals and acceptors (which are considered chelators for the purposes of this term).

Clash
Penalty for clashes between ligand and protein.

Frozen Rotatable Bond
Penalty for loss of entropy due to rotatable bonds that can no longer rotate upon binding to the active site.

4.3.6.4 Description

When this scoring function is used the position of hydrogens involved in hydrogen bonding is optimized with respect to the hydrogen bond energy (this is done for both protein and ligand donors). However, no optimization of heavy atoms on either the protein or ligand is done.

 
4.3.7 OEChemscore

This scoring function is identical to the Chemscore scoring function (see section 4.3.6), except that it lacks the rotatable bond term and has slightly different atom typing rules.

 
4.3.8 Screenscore

4.3.8.1 Reference

The Screenscore scoring function is an implementation of the scoring function described in Ref [9].

4.3.8.2 Typing

See reference [9].

4.3.8.3 Components

The Screenscore score is a sum of the following components:

Lipophilic
Steric interactions of Lipophilic atoms on the ligand.

Ambiguous
Steric interaction of Ambiguous atoms on the ligand.

Clash
Penalty for clashes with the protein.

PLP
A steric contribution identical to the steric component of PLP.

Hydrogen Bond
Interactions between acceptors and donors.

Metal
Interactions between metals and acceptors (which are treated as coordinating atoms for the purposes of this term).

Aromatic
Interactions between phenyl groups and methyls, amides or hydrogens on aromatic rings.

Rotatable Bond
A penalty term proportional to the number of rotatable bonds the ligand has.

 
4.3.9 Zapbind

4.3.9.1 Reference

See reference [3] for information about the Zap PB method.

4.3.9.2 Typing

Zapbind uses the charge, radius and position of all atoms in the system. It does not type them further as the other scoring functions do.

4.3.9.3 Components

The Zapbind function is a sum of the following components.

ZAP
Electrostatic binding energy from a ZAP (PB) calculation.

AREA
Buried area contribution term.

4.3.9.4 Description

Zapbind is a combination of a surface area contact term and an electrostatic interaction calculated using the Poisson-Boltzmann solvent approximation. The surface area is calculated using a Gaussian-based method, while the PB energy is calculated using ZAP (see Ref [3]).

While not required, it is highly recommended that poses be refined against the Merck Molecular Mechanics Force Field when using this scoring function (this is done by setting "-refine lig_mmff"). Electrostatic calculations are extremely sensitive to the exact position of the ligand, and thus require highly refined structures.

 
4.3.10 CGO

CGO (short for Chemical Gaussian Overlay) is a ligand-based scoring function. It measures a pose's fitness (i.e. scores it) by testing how well the pose overlays a known bound ligand, rather than how well it complements the active site.

4.3.10.1 Typing

CGO recognizes the following heavy atom types (note that a single atom is allowed to have multiple types):

Steric
All heavy atoms are typed as steric.

Acceptor
Acceptors are classified as strong, moderate or weak in strength as follows:

Strong
Phosphate, NOxide, Carboxylate, Het6N, Phosphinyl and Oxyanion.

Moderate
Water, Sulfoxide, Primary Amine, Het5N, Thiocarbonyl, Sulfate, Tertiary Amine, Amide, Carbamate and Urea.

Weak
Nitrile, Ketone, Ester, Nitro, Het5O, Imine, Phenol, Hydroxyl, Sulfone, Primary Aniline, Secondary Amine and Ether.

Donor
Donors are classified as strong, moderate or weak as follows:

Strong
Primary Amine NpH, Secondary Amine NpH, Tertiary Amine NpH, Amidine NpH, Guanidine NpH, Het5NH and AcidOH.

Moderate
Water, Primary Amide, AnilineNH, AmidineNH, Secondary Amide, Aniline NH2 and Hydroxyl.

Weak
Hydrazine NH, Imine NH, Phenyl OH, Primary Amine and Secondary Amine.

Coordinating Atoms
Carboxylate, Oxanion, PyridineN, SulfonamideNAnion, AromNAnion, Thianion and Hydroxamate are considered coordinating atoms.

Metals
Calcium, Magnesium and Zinc.

Aromatic
Heavy atoms in 5 and 6 member aromatic rings.

In addition to the heavy atom types, CGO also recognizes positions around some of the heavy atoms. The positions are not required to be located at atom centers (they generally are not) and are used to represent the directionality of certain interactions. These positions are typed as follows:

Lone Pairs
are placed around acceptor heavy atoms and represent places where a polar hydrogen from a donor could be placed to form a hydrogen bond interaction with the acceptor.

Polar Hydrogen
positions are placed around donor heavy atoms and represent possible positions of the donor's polar hydrogens. These positions will often correspond to the position of the actual polar hydrogen attached to the donor heavy atom, however this is not required. For example in the case of a hydroxyl there are six polar hydrogen positions used to represent the ring of possible positions the attached hydrogen can be in.

Coordinating
positions are located around metal-binding atoms and represent the positions in which a metal could be placed to form a coordinating interaction.

Ring positive
are placed roughly on the position of the hydrogen attached to the aromatic heavy atom, and represent areas of slight positive charge around an aromatic ring system.

Ring negative
two ring negative positions are placed 3 Angstroms above and below each aromatic ring center, and represent areas of slight negative charge around aromatic ring systems. Unlike other extended positions ring negative positions are associated with several heavy atoms (the aromatic ring) rather than just one.

Note that the presence or absence of hydrogen can affect how a heavy atom is typed. For example, an oxygen with an attached hydrogen may be classified as a donor, but it would not be classified as a donor if the hydrogen were removed. However, if the hydrogen is present its position is ignored (i.e. polar hydrogen positions will be created for donors, but the hydrogen's actual position is not used in that calculation).

4.3.10.2 Components

The CGO score is a sum of the following components:

Shape
measures the overlay between the shape of the pose and the shape of the bound ligand.

Acceptor
measures the overlay between "lone pair" positions on the pose and those on the bound ligand. Only "lone pair" positions on acceptors that are making a hydrogen bond interaction with the protein are considered in this calculation.

Donor
measures the overlay between "polar hydrogen" positions on the pose and those on the bound ligand. Only "polar hydrogen" positions on donors that are making a hydrogen bond interaction with the protein are considered in this calculation.

Chelator
measures the overlay between the "coordinating" positions of the pose and those of the bound ligand. Only "coordinating" positions of chelators that are making a metal interaction with the protein are considered in this calculation.

Aromatic
measures the overlay between the "ring positive" and "ring negative" positions on the pose and those on the bound ligand. By default this term is disabled.

4.3.10.3 Description

CGO is essentially the ligand based design version of the Chemgauss3 scoring function. The overlay of any two positions is calculated with the following formula.


\begin{displaymath}
\mathrm{Overlay} = \int g_1(\mathbf{r}) \, g_2(\mathbf{r}) \, d\mathbf{r}
\end{displaymath} (4.1)

where g_1 and g_2 are Gaussian functions centered on positions one and two, respectively.
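
For two spherical Gaussians the overlay integral has a well-known closed form (the Gaussian product theorem), which the sketch below evaluates. The amplitudes and exponents used by CGO are not given in this manual, so the default parameter values and the function name here are assumptions for illustration only.

```python
import math


def gaussian_overlay(center1, center2, alpha1=1.0, alpha2=1.0, p1=1.0, p2=1.0):
    """Overlap integral of two spherical Gaussians p * exp(-alpha * |r - c|^2).

    Uses the closed form
        p1 * p2 * (pi / (a1 + a2))^(3/2) * exp(-a1 * a2 / (a1 + a2) * d^2)
    where d is the distance between the two centers. The default amplitudes
    and exponents are placeholders, not CGO's actual parameters.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(center1, center2))
    total = alpha1 + alpha2
    return (p1 * p2 * (math.pi / total) ** 1.5
            * math.exp(-alpha1 * alpha2 * d2 / total))


same = gaussian_overlay((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
apart = gaussian_overlay((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
print(same, apart)
```

The overlay is largest when the two positions coincide and decays smoothly as they move apart, which is why a pose scores well when its positions sit on top of the matching positions of the bound ligand.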

 
4.3.11 CGT

CGT is identical to CGO (see section 4.3.10), except that it converts the overlay information into a Tanimoto similarity.

Note that the total CGT score is not a sum of the components, but rather the components are a measure of the similarity of the respective positions alone.
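
A common way to turn overlay values into a Tanimoto similarity is shown below. This manual does not give the exact expression CGT uses, so the formula (the standard Tanimoto form built from the cross overlay and the two self overlays) and the function name are assumptions.

```python
def tanimoto(overlay_ab, overlay_aa, overlay_bb):
    """Tanimoto similarity from a cross overlay and two self overlays.

    overlay_ab is the overlay between pose and bound ligand; overlay_aa and
    overlay_bb are the overlays of each with itself. This is the standard
    Tanimoto form, assumed here rather than taken from FRED.
    """
    denominator = overlay_aa + overlay_bb - overlay_ab
    if denominator <= 0.0:
        raise ValueError("self overlays must exceed the cross overlay")
    return overlay_ab / denominator


print(tanimoto(1.0, 2.0, 2.0))
```

With this form the similarity is 1 when the pose and the bound ligand overlay perfectly and 0 when they do not overlap at all. Because it is a ratio rather than a sum, Tanimoto values computed per component do not add up to the Tanimoto value of the combined positions.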

 
4.3.12 Multiple Active Site Correction

Multiple Active Site Correction, or MASC, is not a scoring function, but rather a method for correcting for systematic errors in any scoring function by comparing the score of each ligand in the target protein to the same ligand's score in several standard protein targets (see ref [11]). The MASC score correction is


\begin{displaymath}
\mathrm{Corrected} = \frac{\mathrm{Uncorrected} - \mathrm{Average}}{\mathrm{Standard\ Deviation}}
\end{displaymath} (4.2)

where uncorrected is the uncorrected score of the ligand in the current protein target, while the average and standard deviation are calculated for each ligand based upon its score in a set of reference protein targets. Qualitatively the MASC corrected score measures how much better the ligand is scoring against the target protein than it does against a typical protein.
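
The correction in equation 4.2 amounts to a per-ligand z-score, as the following minimal sketch shows. The function name is hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the manual does not say which is used.

```python
import statistics


def masc_corrected(uncorrected, reference_scores):
    """Apply equation 4.2 to one ligand.

    uncorrected is the ligand's score in the target protein;
    reference_scores are the same ligand's scores in the reference
    proteins. The sample standard deviation is an assumed choice.
    """
    average = statistics.mean(reference_scores)
    deviation = statistics.stdev(reference_scores)
    if deviation == 0.0:
        raise ValueError("reference scores have zero spread")
    return (uncorrected - average) / deviation


# Made-up numbers for illustration, not real docking scores.
print(masc_corrected(-10.0, [-2.0, -4.0, -6.0]))
```

In this made-up example the reference scores have an average of -4.0 and a sample standard deviation of 2.0, so the target score of -10.0 lies three standard deviations from the ligand's typical score.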

This is a computationally expensive docking strategy to set up, due to the fact that the ligand must be docked into each of a set of standard protein targets (the original MASC publication used 12, see Appendix D for a listing). However dockings to the standard proteins can be done independent of any target protein for a given ligand dataset and the precalculated values stored in the ligand file. Once this is done the MASC method can be used on any protein target with a negligible decrease in docking speed.

There are MASC variants of each of the scoring functions in FRED, see section 4.3.