4.3 Scoring
The final stage of docking a ligand is to score the pose or poses generated by the
docking steps described above.
The following structure-based scoring functions are available in FRED. Each of
these scoring functions also has a MASC variant; see section 4.3.12.
Shapegauss
- A shape-based scoring function that uses
smooth Gaussian functions to represent the shapes of molecules. Details
of this scoring function can be found in reference [4].
PLP
- or Piecewise Linear Potential which is described in
detail in reference [7].
Chemgauss2
- Version 2 of the Chemgauss scoring function,
which uses smooth Gaussian functions to represent the shape and chemistry
of molecules. Chemgauss2 has been superseded by the newer Chemgauss3
in the new version of FRED.
Chemgauss3
- Version 3 of the Chemgauss scoring function,
which uses smooth Gaussian functions to represent the shape and chemistry
of molecules.
Chemscore
- which is described in reference [8].
This implementation of Chemscore adheres as faithfully as possible to the
referenced paper.
OEChemscore
- An OpenEye variant of Chemscore which is similar
to the Chemscore implementation found in the original 1.2.x version of
FRED.
Screenscore
- which is described in reference [9].
Zapbind
- A scoring function which uses PB electrostatic
calculations in combination with an area contact term.
The following ligand-based scoring functions are also available in FRED. Use
of these scoring functions requires that the receptor file contain a
bound ligand.
CGO
- or Chemical Gaussian Overlay. This scoring function
represents molecular shape and chemistry with smooth Gaussian functions,
as the Chemgauss functions do. However, this function scores a pose
by measuring how well it overlays onto the bound ligand, rather than
scoring against the protein structure.
CGT
- or Chemical Gaussian Tanimoto. Similar to CGO, but
calculates a Tanimoto overlay rather than a volume overlay.
The following table gives a quick overview of the interactions that the
various scoring functions in FRED account for. None of the functions has
intramolecular terms. This is because FRED relies
on the conformer generator that provides its input conformer database to ensure
that all the input conformers have reasonable geometries and do not have
significant intramolecular contacts and/or high strain energies.
Table 4.1: Summary of the interactions scoring functions are aware of
| | Shape | Hydrogen Bonds | Metal | Aromatic | Desolvation |
| Shapegauss | Yes | No | No | No | No |
| PLP | Yes | Yes | Yes | No | No |
| Chemgauss2 | Yes | Yes | Yes | Yes | No |
| Chemgauss3 | Yes | Yes | Yes | No | Yes |
| Chemscore | Yes | Yes | Yes | No | No |
| OEChemscore | Yes | Yes | Yes | No | No |
| Screenscore | Yes | Yes | Yes | No | No |
| Zapbind | Yes | No | No | No | Yes |
| CGO | Yes | Yes | Yes | No | No |
| CGT | Yes | Yes | Yes | No | No |
The scoring functions also vary in their speed, and only the faster
functions can be used in the Exhaustive Docking and Optimization stages,
as detailed in the following table. All the scoring functions can
be used to give the final score for a pose.
Table 4.2: Summary of speed and places scoring function can be used
| | Speed | Exhaustive Docking | Optimization |
| Shapegauss | Very Fast | Yes | Yes |
| PLP | Very Fast | Yes | Yes |
| Chemgauss2 | Fast | Yes | Yes |
| Chemgauss3 | Fast | Yes | Yes |
| Chemscore | Medium | No | Yes |
| OEChemscore | Medium | No | Yes |
| Screenscore | Medium | No | Yes |
| Zapbind | Slow | No | No |
| CGO | Fast | Yes | Yes |
| CGT | Medium | No | Yes |
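When scripting around FRED it can be handy to have Table 4.2 available as data. The sketch below simply transcribes the table and filters it by stage. The names `TABLE_4_2` and `usable_in` and the stage labels are hypothetical helpers for illustration; they are not part of FRED.

```python
# Table 4.2 transcribed as data: (speed, exhaustive docking, optimization).
TABLE_4_2 = {
    "Shapegauss":  ("Very Fast", True,  True),
    "PLP":         ("Very Fast", True,  True),
    "Chemgauss2":  ("Fast",      True,  True),
    "Chemgauss3":  ("Fast",      True,  True),
    "Chemscore":   ("Medium",    False, True),
    "OEChemscore": ("Medium",    False, True),
    "Screenscore": ("Medium",    False, True),
    "Zapbind":     ("Slow",      False, False),
    "CGO":         ("Fast",      True,  True),
    "CGT":         ("Medium",    False, True),
}


def usable_in(stage):
    """Return the scoring functions allowed in a stage (hypothetical helper).

    stage is one of "exhaustive", "optimization" or "final".
    """
    if stage == "final":
        # Every scoring function can give the final score for a pose.
        return list(TABLE_4_2)
    column = {"exhaustive": 1, "optimization": 2}[stage]
    return [name for name, row in TABLE_4_2.items() if row[column]]
```

For example, `usable_in("exhaustive")` lists only the functions marked "Yes" in the Exhaustive Docking column.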
4.3.2 Shapegauss
This scoring function is described in detail in Ref [4].
All heavy atoms are typed as steric atoms. Hydrogen atoms are
ignored.
There is only one component of Shapegauss, the shape score.
The Shapegauss scoring function represents all atoms as smooth Gaussian
functions. A pairwise potential between ligand and protein atoms is applied
that attempts to maximize their surface contact and minimize their volume
overlap (i.e., the potential is most favorable when the atoms are touching
but not overlapping). A correction term is then applied to further penalize
atoms which significantly overlap the protein.
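As a rough illustration of the behaviour described above (most favorable at contact, penalized on overlap), the following toy pair potential combines an attractive Gaussian well centered on a contact distance with a repulsive Gaussian centered on zero separation. It is only a sketch: the functional form, `contact_distance`, the widths and `overlap_penalty` are illustrative assumptions, not the published Shapegauss function or its parameters (see Ref [4] for those).

```python
import math


def toy_shape_pair_score(distance, contact_distance=3.6,
                         well_width=0.6, overlap_penalty=3.0):
    """Toy ligand-protein atom pair score (illustrative, not Shapegauss)."""
    # Attractive Gaussian well: this term reaches -1 when the atoms
    # are exactly at the assumed contact distance.
    contact = -math.exp(-((distance - contact_distance) / well_width) ** 2)
    # Repulsive Gaussian centered on zero separation: grows as the
    # atoms interpenetrate, penalizing volume overlap.
    overlap = overlap_penalty * math.exp(
        -(distance / (0.5 * contact_distance)) ** 2)
    return contact + overlap
```

With these placeholder values the score is negative near the contact distance, positive for strongly overlapping atoms, and close to zero for well-separated atoms.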
4.3.3 PLP
The Piecewise Linear Potential is an implementation of the scoring function
described in Ref [7].
The following atom types are recognized by PLP.
Donor
- Hydrogen bond donors - primary and secondary amines.
Acceptor
- Hydrogen bond acceptors - oxygen and nitrogen atoms with
no bound hydrogens.
Hydroxyl
- Hydroxyl groups are treated as both acceptors and donors.
NonPolar
- Carbon, chlorine, fluorine, bromine, iodine, and
nitrogen or sulfur with more than two attached hydrogens.
Sulfur
- Sulfurs with fewer than two attached hydrogen atoms.
Metal
- Iron, Magnesium or Zinc.
The total PLP score is a sum of the following components.
NonPolar
- Interactions of all ligand non-polar atoms.
Hydrogen Bond
- Interactions of all ligand acceptors and donors.
Sulfur
- Interactions of all ligand sulfurs.
Metal
- Interactions of all ligand metals.
PLP is a heavy atom scoring function, meaning all potentials are based on
distances from heavy atom centers (i.e., hydrogen position is irrelevant,
although the presence or absence of hydrogen is not, as it can affect the
atom typing). The PLP implementation in FRED adheres to the reference as faithfully as
possible, with the caveat that it has been extended to
include favorable interactions between acceptor and metal atoms.
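To make the term "piecewise linear" concrete, the sketch below implements one common piecewise linear form for a distance-dependent pair potential: a repulsive ramp at short range, a flat-bottomed well, and a linear return to zero. The function name, the breakpoints `a` to `d` and the values of `well` and `wall` are round illustrative numbers chosen for this example; they are not the parameters of Ref [7] or of FRED.

```python
def piecewise_linear(r, a=3.0, b=4.0, c=5.0, d=6.0, well=-1.0, wall=10.0):
    """Generic piecewise linear pair potential (illustrative parameters).

    r < a      : repulsive ramp falling from `wall` at r = 0 to 0 at r = a
    a <= r < b : ramp from 0 down to `well`
    b <= r < c : flat bottom at `well`
    c <= r < d : ramp from `well` back up to 0
    r >= d     : no interaction
    """
    if r < a:
        return wall * (a - r) / a
    if r < b:
        return well * (r - a) / (b - a)
    if r < c:
        return well
    if r < d:
        return well * (d - r) / (d - c)
    return 0.0
```

Because the potential depends only on the distance between heavy atom centers, a total score in this style is a sum of such terms over ligand-protein atom pairs, with the parameters chosen by the pair of atom types.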
4.3.4 Chemgauss2
Chemgauss2 has been deprecated and will likely be removed in the next
minor (not bugfix) release of FRED. Users are encouraged to use the new
Chemgauss3 scoring function in place of Chemgauss2.
The following atom types are recognized by Chemgauss2.
Strong Hydrogen Bond Acceptors
- are defined as any of the following:
- Oxygens with two single bonded heavy atoms.
- Oxygens double bonded to a carbon.
- Oxygens single bonded to a carbon that are part of a carboxylic acid group.
- Triple bonded nitrogens without an attached hydrogen.
- Non-aromatic nitrogens with no attached hydrogens and two single bonds
one of which is to a carbon and another to carbon, sulfur or nitrogen.
- Nitrogens with one single bond to another heavy atom and with 1 or 2
attached hydrogens.
- Oxygens double bonded to phosphorus.
- Oxygens single bonded to a metal.
Weak Hydrogen Bond Acceptors
- are defined as any of the following:
- Aromatic nitrogens with no attached hydrogens.
- Oxygens single bonded to a carbon that are not part of a carboxylic acid group.
- Oxygens double bonded to sulfur.
- Sulfurs single bonded to a carbon.
- A sulfur with two single bonded carbons and no attached hydrogens.
Strong Hydrogen Bond Donors
- are defined as any of the following:
- Non-aromatic nitrogens with two single bonds to heavy atoms and one
attached hydrogen.
- Nitrogens with two attached hydrogens and one single bond to a carbon.
Weak Hydrogen Bond Donors
- are defined as any of the following:
- Aromatic nitrogens with 1 attached hydrogen.
- Non-aromatic nitrogens with 3 single bonds to carbons and one
attached hydrogen.
- Oxygens with an attached hydrogen and a single bond to a carbon.
Aromatic
- Heavy atoms in an aromatic ring
Metal
- Any heavy atom except He, B, C, N, O, F, Ne, Si, P,
S, Cl, Ar, As, Se, Br, Kr, Te, I, Xe, At, and Rn.
Small Non Polar
- Fluorines, oxygens and nitrogens that are
not also hydrogen bonding atoms.
Large Non Polar
- Iodines, sulfurs and phosphorus atoms that are
not also hydrogen bonding atoms.
Medium Non Polar
- Any heavy atom that does not fit one of the
above types.
In addition to these atom centered types, Chemgauss also determines the
following positions around a molecule:
Polar Hydrogens
- One or more possible positions for a hydrogen
involved in a hydrogen bond. Note that this can be different from the
explicit position of the hydrogen atom.
Lone Pairs
- One or more possible positions around a hydrogen
bond acceptor for a polar hydrogen from a donor.
Pi electron positions
- Pi electron positions of an aromatic
atom above and below the plane of the aromatic ring.
The Chemgauss2 function is the sum of the following potentials, all of
which are based on smooth Gaussian functions:
Shape
- Shape based interactions between all heavy atoms.
Hydrogen Bond
- Hydrogen bonding interactions based on favorable
interactions between polar hydrogens and lone pairs and a mild repulsion between
donor heavy atoms and acceptor heavy atoms (which tends to make the hydrogen bonds linear).
Aromatic
- Aromatic ring interactions based on favorable interactions
between aromatic atoms and the pi-electron positions plus repulsive aromatic atom
to aromatic atom and pi-electron to pi-electron interactions.
The Chemgauss scoring function combines the Shapegauss scoring function
with additional potentials between chemically matched positions around the
ligand pose. These chemically complementary positions are generally not
also atom positions, but rather are placed near specific functional groups.
For instance, acceptors have "lone pair" positions around them, which denote
positions where a polar hydrogen could be placed to create a hydrogen bonding
interaction. Similarly, donors have "polar hydrogen" positions, which denote
positions the donor's hydrogen could occupy. For simple donors without rotatable bonds
these positions correspond to the actual polar hydrogen position, but for
donors on rotatable bonds, such as hydroxyls, there are several positions representing
the ring of possible positions for the polar hydrogen. A favorable hydrogen
bond score is obtained when a "polar hydrogen" position on one molecule
overlaps a "lone pair" position on another molecule.
4.3.5 Chemgauss3
Chemgauss3 recognizes the following heavy atom types (a single atom
may have multiple types):
Steric
- All heavy atoms are typed as steric.
Acceptor
- Acceptors are classified as strong, moderate or weak
in strength as follows:
Strong
- Phosphate, NOxide, Carboxylate, Het6N, Phosphinyl and Oxyanion.
Moderate
- Water, Sulfoxide, Primary Amine, Het5N, Thiocarbonyl, Sulfate, Tertiary Amine, Amide, Carbamate and Urea.
Weak
- Nitrile, Ketone, Ester, Nitro, Het5O, Imine, Phenol, Hydroxyl, Sulfone, Primary Aniline, Secondary Amine and Ether.
Donor
- Donors are classified as strong, moderate or weak as follows:
Strong
- Primary Amine NpH, Secondary Amine NpH, Tertiary Amine NpH, Amidine NpH, Guanidine NpH, Het5NH and AcidOH.
Moderate
- Water, Primary Amide, AnilineNH, AmidineNH, Secondary Amide, Aniline NH2 and Hydroxyl.
Weak
- Hydrazine NH, Imine NH, Phenyl OH, Primary Amine and Secondary Amine.
Coordinating Groups
- Carboxylate, Oxanion, PyridineN, SulfonamideNAnion, AromNAnion, Thianion and Hydroxamate are considered coordinating groups.
Metals
- Calcium, Magnesium and Zinc.
In addition to these heavy atom types, Chemgauss3 also recognizes positions
around some of the heavy atoms. The positions are not required to be located at
atom centers (they generally are not) and are used to represent the
directionality of certain interactions. These positions are typed as follows.
Lone Pairs
- are placed around acceptor heavy atoms and represent
places where a polar hydrogen from a donor could be placed to form a hydrogen
bond interaction with the acceptor.
Polar Hydrogen
- positions are placed around donor heavy atoms and
represent possible positions of the donor's polar hydrogens. These positions will
often correspond to the position of the actual polar hydrogen attached to the
donor heavy atom; however, this is not required. For example, in the case of a
hydroxyl there are six polar hydrogen positions used to represent the ring of
possible positions the attached hydrogen can be in.
Water Positions
- are placed around both donor and acceptor atoms
and represent positions where a solvent water can make a hydrogen bonding
interaction with the donor or acceptor.
Chelator coord
- positions are located around metal-binding atoms and
represent the positions a metal could be placed to form a coordinating interaction.
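The hydroxyl case mentioned above (six polar hydrogen positions on a ring) can be sketched geometrically: the positions lie on a circle around the C-O bond axis, all at the same O-H distance and C-O-H angle. The code below is a minimal illustration only. The function name, the 0.96 Angstrom bond length, the 109.5 degree angle, the even spacing and the starting phase of the ring are assumptions made for this example, not values documented for FRED.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def _unit(p):
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(c / norm for c in p)


def hydroxyl_hydrogen_ring(oxygen, carbon, n_positions=6,
                           bond_length=0.96, angle_deg=109.5):
    """Candidate polar hydrogen positions on a ring around the C-O axis."""
    axis = _unit(_sub(oxygen, carbon))           # unit vector from C to O
    # Any vector not parallel to the axis lets us build a perpendicular frame.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _unit(_cross(axis, helper))
    e2 = _cross(axis, e1)
    theta = math.radians(angle_deg)              # assumed C-O-H angle
    along = -math.cos(theta) * bond_length       # offset along the C-O axis
    radius = math.sin(theta) * bond_length       # radius of the ring
    ring = []
    for k in range(n_positions):
        phi = 2.0 * math.pi * k / n_positions    # evenly spaced around the axis
        ring.append(tuple(
            oxygen[i] + along * axis[i]
            + radius * (math.cos(phi) * e1[i] + math.sin(phi) * e2[i])
            for i in range(3)))
    return ring
```

Calling `hydroxyl_hydrogen_ring((0.0, 0.0, 1.43), (0.0, 0.0, 0.0))` returns six points, each 0.96 Angstrom from the oxygen.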
Note that the presence or absence of hydrogen can affect how a heavy atom is
typed. For example, an oxygen with an attached hydrogen may be classified as a donor,
but it would not be classified as a donor if the hydrogen were
removed. However, if the hydrogen is present its position is ignored (i.e.,
polar hydrogen positions will be created for donors, but the actual hydrogen's
position is not used in that calculation).
The final Chemgauss3 score is a sum of the following components:
Steric
- This component is based on the number of protein heavy
atoms that contact heavy atoms of the ligand, with a correction term. The
base potential accounts for two effects, the first is the increase in VdW
type interactions when the ligand docks to the active site. The second is
protein desolvation energy from displacing water from the active site into
solvent,
ignoring any favorable hydrogen bonding interactions water could make with the site.
The correction term of this component accounts for the favorable interaction
waters can make with the site, by applying a penalty to lipophilic ligand
heavy atoms placed near acceptors or donors of the active site. Note that
in principle this desolvation penalty should be applied to any heavy atom
of the ligand, not just lipophilic ones; however, in practice we have found
this correction to be ineffective when applied to polar atoms.
Acceptor
- component measures the interactions acceptors on the
ligand are making with donors on the protein.
Donor
- component measures the interactions donors on the
ligand are making with acceptors on the protein.
Metal
- component measures the interactions coordinating atoms on the ligand
are making with metals in the active site.
Desolvation
- component is a penalty assessed when donors and
acceptors on the ligand are blocked from interacting with solvent waters by the
active site.
All Chemgauss3 scoring function interactions are created from a base function
that is smoothed by convolution with a Gaussian function. The base functions
for the various interactions are described below.
Steric
- is a combination of a clash step function and
two hard sphere potentials (representing short and long range Van der Waals
interactions). This setup is designed to roughly approximate a VdW potential.
Hydrogen bond
- is a hard sphere function based on the distance
between a donor "polar hydrogen position" and an acceptor "lone pair" position.
The function has a constant favorable value if the distance is less than 1.0
Angstrom and zero otherwise.
Metal
- is a hard sphere function based on the distance between a
"chelator coordinate" of the ligand and a metal on the protein. The function
has a constant favorable value if the distance is less than 1.0 Angstrom and
zero otherwise.
Ligand Desolvation
- is a step function based on the distance
between a "water position" of the ligand and the active site surface. Any
water position within the active site surface (i.e., that clashes with the
protein) is assessed a constant penalty. This effectively penalizes the ligand
for breaking hydrogen bonds with solvent upon binding.
Protein Desolvation
- is an estimation of the chemical potential
of water within the active site. Areas where water can make multiple
hydrogen bonds are more favorable than those where it can form fewer
hydrogen bonds.
Aromatic
- is a hard sphere function based on the distance between
"ring negative" and "ring positive" positions. The function has a constant
favorable value if the distance is less than 1.0 Angstrom and zero otherwise.
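The effect of smoothing a hard sphere base function by convolution with a Gaussian can be illustrated in one dimension, where the convolution has a closed form in terms of the error function. This is a simplified sketch: FRED's smoothing width and the magnitude of the favorable value are not stated in this section, so `sigma` and `value` below are placeholders, and the function names are hypothetical.

```python
import math


def hard_sphere(distance, radius=1.0, value=-1.0):
    """Base function: constant favorable value inside the radius, else zero."""
    return value if distance < radius else 0.0


def smoothed_hard_sphere(distance, sigma, radius=1.0, value=-1.0):
    """1D convolution of the hard sphere function with a normalized Gaussian.

    The integral of a Gaussian of width sigma over [-radius, radius],
    evaluated at offset `distance`, reduces to a difference of error
    functions.
    """
    s = sigma * math.sqrt(2.0)
    return value * 0.5 * (math.erf((radius - distance) / s)
                          + math.erf((radius + distance) / s))
```

The smoothed function keeps nearly the full favorable value well inside the radius, drops to about half of it at the radius (for small `sigma`), and decays continuously to zero beyond it, which removes the discontinuity of the base function.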
4.3.6 Chemscore
The Chemscore scoring function is an implementation of the scoring function
described in Ref [8].
The Chemscore score is a sum of the following components:
Lipophilic
- Interaction between lipophilic atoms.
Hydrogen Bond
- Interactions between donors and acceptors.
Metal
- Interactions between metals and acceptors (which
are considered chelators for the purposes of this term).
Clash
- Penalty for clashes between ligand and protein.
Frozen Rotatable Bond
- Penalty for loss of entropy due to rotatable bonds that
can no longer rotate upon binding to the active site.
When this scoring function is used the position of hydrogens involved in
hydrogen bonding is optimized with respect to the hydrogen bond energy (this
is done for both protein and ligand donors). However, no optimization of heavy
atoms on either the protein or ligand is done.
4.3.7 OEChemscore
This scoring function is identical to the Chemscore scoring function (see
section 4.3.6), except that it lacks the rotatable bond term
and has slightly different atom typing rules.
4.3.8 Screenscore
The Screenscore scoring function is an implementation of the scoring function
described in Ref [9].
The Screenscore score is a sum of the following components:
Lipophilic
- Steric interactions of Lipophilic atoms on the ligand.
Ambiguous
- Steric interaction of Ambiguous atoms on the ligand.
Clash
- Penalty for clashes with the protein.
PLP
- A steric contribution identical to the steric component of
PLP.
Hydrogen Bond
- Interactions between acceptors and donors.
Metal
- Interactions between metals and acceptors (which are treated
as coordinating atoms for the purposes of this term).
Aromatic
- Interactions between phenyl groups and methyls, amides
or hydrogens on aromatic rings.
Rotatable Bond
- A penalty term proportional to the number of
rotatable bonds the ligand has.
4.3.9 Zapbind
See reference [3] for information about the Zap PB method.
Zapbind uses the charge, radius and position of all atoms in the system. It
does not type them further as the other scoring functions do.
The Zapbind function is a sum of the following components.
ZAP
- Electrostatic binding energy from a ZAP (PB) calculation.
AREA
- Buried area contribution term.
Zapbind is a combination of a surface area contact term and an electrostatic
interaction calculated using the Poisson-Boltzmann solvent approximation.
The surface area is calculated using a Gaussian-based method, while the PB
energy is calculated using ZAP (see Ref [3]).
While not required, it is highly recommended that refinement against the
Merck Molecular Mechanics Force Field be done when using this scoring
function (this is done by setting "-refine lig_mmff"). Electrostatic
calculations are extremely sensitive to the exact position of the ligand,
and thus require highly refined structures.
4.3.10 CGO
CGO (short for Chemical Gaussian Overlay) is a ligand-based scoring function.
It measures a pose's fitness (i.e. scores it), by testing how well the pose
overlays a known bound ligand, rather than how well it complements the active
site.
CGO recognizes the following heavy atom types (note that a single atom
is allowed to have multiple types):
Steric
- All heavy atoms are typed as steric.
Acceptor
- Acceptors are classified as strong, moderate or weak
in strength as follows:
Strong
- Phosphate, NOxide, Carboxylate, Het6N, Phosphinyl and Oxyanion.
Moderate
- Water, Sulfoxide, Primary Amine, Het5N, Thiocarbonyl, Sulfate, Tertiary Amine, Amide, Carbamate and Urea.
Weak
- Nitrile, Ketone, Ester, Nitro, Het5O, Imine, Phenol, Hydroxyl, Sulfone, Primary Aniline, Secondary Amine and Ether.
Donor
- Donors are classified as strong, moderate or weak as follows:
Strong
- Primary Amine NpH, Secondary Amine NpH, Tertiary Amine NpH, Amidine NpH, Guanidine NpH, Het5NH and AcidOH.
Moderate
- Water, Primary Amide, AnilineNH, AmidineNH, Secondary Amide, Aniline NH2 and Hydroxyl.
Weak
- Hydrazine NH, Imine NH, Phenyl OH, Primary Amine and Secondary Amine.
Coordinating Atoms
- Carboxylate, Oxanion, PyridineN, SulfonamideNAnion, AromNAnion, Thianion and Hydroxamate are considered coordinating atoms.
Metals
- Calcium, Magnesium and Zinc.
Aromatic
- Heavy atoms in 5 and 6 member aromatic rings.
In addition to the heavy atom types, CGO also recognizes positions
around some of the heavy atoms. The positions are not required to be located at
atom centers (they generally are not) and are used to represent the
directionality of certain interactions. These positions are typed as follows:
Lone Pairs
- are placed around acceptor heavy atoms and represent
places where a polar hydrogen from a donor could be placed to form a hydrogen
bond interaction with the acceptor.
Polar Hydrogen
- positions are placed around donor heavy atoms and
represent possible positions of the donor's polar hydrogens. These positions will
often correspond to the position of the actual polar hydrogen attached to the
donor heavy atom, however this is not required. For example in the case of a
hydroxyl there are six polar hydrogen positions used to represent the ring of
possible positions the attached hydrogen can be in.
Coordinating
- positions are located around metal-binding atoms and
represent the positions in which a metal could be placed to form a coordinating interaction.
Ring positive
- are placed roughly on the position of the hydrogen
attached to the aromatic heavy atom, and represent areas of slight positive charge
around an aromatic ring system.
Ring negative
- two ring negative positions are placed 3 Angstroms
above and below each aromatic ring center, and represent areas of slight
negative charge around aromatic ring systems. Unlike other extended positions
ring negative positions are associated with several heavy atoms (the aromatic
ring) rather than just one.
Note that the presence or absence of hydrogen can affect how a heavy atom is
typed. For example, an oxygen with an attached hydrogen may be classified as a donor,
but it would not be classified as a donor if the hydrogen were
removed. However, if the hydrogen is present its position is ignored (i.e.,
polar hydrogen positions will be created for donors, but the hydrogen's actual
position is not used in that calculation).
The CGO score is a sum of the following components:
Shape
- measures the overlay between the shape of the pose and
the shape of the bound ligand.
Acceptor
- measures the overlay between "lone pair" positions on
the pose and those on the bound ligand. Only "lone pair" positions on acceptors that
are making a hydrogen bond interaction with the protein are considered in
this calculation.
Donor
- measures the overlay between "polar hydrogen" positions on
the pose and those on the bound ligand. Only "polar hydrogen" positions on
donors that are making a hydrogen bond interaction with the protein are
considered in this calculation.
Chelator
- measures the overlay between the "coordinating"
positions of the pose and those of the bound ligand. Only "coordinating"
positions of chelators that are making a metal interaction with the protein
are considered in this calculation.
Aromatic
- measures the overlay between the "ring positive" and
"ring negative" positions on the pose and those on the bound ligand. By
default this term is disabled.
CGO is essentially the ligand based design version of the Chemgauss3 scoring
function. The overlay of any two positions is calculated with the following
formula.
$$\mathrm{Overlay} = \int g_1(\mathbf{r})\, g_2(\mathbf{r})\, d\mathbf{r} \tag{4.1}$$
where $g_1$ and $g_2$ are Gaussian functions centered on positions
one and two, respectively.
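Assuming the overlay is the volume integral of the product of the two Gaussians, and that each Gaussian is spherical with the form p * exp(-alpha * |r - center|^2), the integral has a well-known closed form. The sketch below evaluates it; the function name, the prefactors `p1` and `p2` and the exponents `alpha1` and `alpha2` are illustrative assumptions, since this section does not give the Gaussian parameters FRED uses.

```python
import math


def gaussian_overlap(center1, center2, alpha1, alpha2, p1=1.0, p2=1.0):
    """Overlap integral of two spherical Gaussians p*exp(-alpha*|r - c|^2).

    Closed form: p1*p2 * (pi/(a1+a2))^(3/2) * exp(-a1*a2/(a1+a2) * R^2),
    where R is the distance between the two centers.
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(center1, center2))
    total = alpha1 + alpha2
    return (p1 * p2 * (math.pi / total) ** 1.5
            * math.exp(-alpha1 * alpha2 / total * dist_sq))
```

The overlap is largest when the two positions coincide and decays smoothly as they move apart, which is what makes it usable as a measure of how well a position on the pose overlays the corresponding position on the bound ligand.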
4.3.11 CGT
CGT is identical to CGO (see section 4.3.10), except that it converts
the overlay information into a Tanimoto similarity.
Note that the total CGT score is not a sum of the components, but rather
the components are a measure of the similarity of the respective positions
alone.
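The exact conversion used by CGT is not given here. As a hedged illustration, the usual Tanimoto form for overlap-based similarity divides the overlap between pose and bound ligand by the sum of their self-overlaps minus that overlap. The function name and argument names below are hypothetical.

```python
def overlay_tanimoto(overlap_ab, overlap_aa, overlap_bb):
    """Tanimoto similarity from overlaps (standard form, assumed for CGT).

    overlap_ab : overlap between pose (A) and bound ligand (B)
    overlap_aa : self-overlap of the pose
    overlap_bb : self-overlap of the bound ligand
    """
    return overlap_ab / (overlap_aa + overlap_bb - overlap_ab)
```

In this form the similarity is 1 when the pose overlays the bound ligand perfectly and 0 when there is no overlap at all. Because the ratio is not linear in the overlaps, applying it to each component separately does not yield values that add up to the ratio computed on the combined overlaps, which is consistent with the note above that the total is not a sum of the components.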
4.3.12 Multiple Active Site Correction
Multiple Active Site Correction, or MASC, is not a scoring function, but rather
a method for correcting for systematic errors in any scoring function by
comparing the score of each ligand in the target protein to the same ligand's
score in several standard protein targets (see ref [11]). The MASC
score correction is
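$$\mathrm{corrected} = \frac{\mathrm{uncorrected} - \mathrm{average}}{\mathrm{standard\ deviation}} \tag{4.2}$$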
where uncorrected is the uncorrected score of the ligand in the current protein
target, while the average and standard deviation are calculated for each ligand
based upon its score in a set of reference protein targets. Qualitatively the
MASC corrected score measures how much better the ligand is scoring against the target
protein than it does against a typical protein.
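A minimal sketch of the correction, assuming it is the uncorrected score minus the ligand's average reference score, divided by the standard deviation of its reference scores, as the definitions above suggest. The function name is hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, since the text does not say which is used.

```python
import statistics


def masc_correct(uncorrected, reference_scores):
    """Return a MASC-style corrected score for one ligand.

    uncorrected      : score of the ligand in the target protein
    reference_scores : scores of the same ligand in the reference proteins
    """
    average = statistics.mean(reference_scores)
    # Assumption: population standard deviation over the reference set.
    deviation = statistics.pstdev(reference_scores)
    if deviation == 0.0:
        raise ValueError("reference scores have zero spread")
    return (uncorrected - average) / deviation
```

For example, a ligand scoring -8.0 in the target and -4.0, -4.0, -6.0 and -6.0 in four reference proteins has an average of -5.0 and a standard deviation of 1.0, giving a corrected score of -3.0.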
This is a computationally expensive docking strategy to set up, due to the
fact that the ligand must be docked into each of a set of standard protein targets
(the original MASC publication used 12, see Appendix D for a listing).
However, dockings to the standard proteins can be done independently of any target
protein for a given ligand dataset and the precalculated values stored in the
ligand file. Once this is done, the MASC method can be used on any protein
target with a negligible decrease in docking speed.
There are MASC variants of each of the scoring functions in FRED; see
section 4.3.