Measuring molecular similarity and diversity plays an important role in various steps of the drug design cycle. Molecular similarity calculations are used extensively in applications such as virtual screening, property prediction, synthesis design and chemical database clustering.
Fingerprinting provides an elementary encoding of molecular graphs. Even though fingerprints can only represent local structural features and not their relative positions in molecules, they have proven to be very successful in a range of similarity and diversity studies.
GraphSim TK provides five different fingerprint types to perform 2D molecular similarity measurements:
A path fingerprint is generated by exhaustively enumerating all linear fragments of a molecular graph up to a given size and then hashing these fragments into a fixed-length bitvector.
The tree fingerprint provides a novel approach that encodes molecular motifs that cannot be captured by the path or the circular methods. This involves exhaustively enumerating all unique trees of a molecular graph.
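The enumerate-then-hash idea described above can be illustrated with a small, library-free sketch. This is not GraphSim TK code: the function names `enumerate_paths` and `path_fingerprint`, the use of CRC32 as the hash, the 64-bit width, and typing atoms by element symbol only (bond properties are ignored) are all assumptions made for illustration.

```python
import zlib


def enumerate_paths(atoms, bonds, max_bonds):
    """Return all linear paths (tuples of atom indices) with 0..max_bonds bonds."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    paths = set()

    def extend(path):
        # Store one canonical direction so A-B-C and C-B-A are counted once.
        paths.add(min(path, path[::-1]))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # a linear path never revisits an atom
                extend(path + (nxt,))

    for start in range(len(atoms)):
        extend((start,))
    return paths


def path_fingerprint(atoms, bonds, max_bonds=3, num_bits=64):
    """Hash each path's element sequence into a num_bits-wide bitvector (a Python int)."""
    bits = 0
    for path in enumerate_paths(atoms, bonds, max_bonds):
        symbols = [atoms[i] for i in path]
        # Direction-independent label, e.g. "C-C-O" (assumed atom typing: element only).
        label = "-".join(min(symbols, symbols[::-1]))
        bits |= 1 << (zlib.crc32(label.encode()) % num_bits)
    return bits


# Heavy atoms of ethanol as a toy graph: C-C-O
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_fp = path_fingerprint(ethanol_atoms, ethanol_bonds)
```

Because different fragments can hash to the same bit, a set bit does not uniquely identify a fragment. This is a general property of hashed fingerprints rather than something specific to this sketch.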
The novelty of the GraphSim TK fingerprint generation is that it gives full control to the user to specify the size of the enumerated paths, along with the atom and bond properties that are used when path, circular or tree fragments are encoded into bitvectors.
The fingerprint methods were parameterized and validated using two published benchmarks:
Measuring molecular similarity or dissimilarity has two basic components:
GraphSim TK supports several built-in similarity coefficients (Cosine, Dice, Euclidean, Manhattan, Tanimoto, Tversky), while user-defined similarity measures are also available.
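As a rough illustration of how such coefficients are computed, the sketch below implements the textbook bit-count forms of Tanimoto, Dice, Cosine and Tversky on bitvectors stored as Python integers. It does not call GraphSim TK. The function names, the default Tversky weights, and the choice to return 0.0 when the denominator is zero are assumptions; the toolkit's own definitions, in particular for Euclidean and Manhattan, should be taken from its documentation.

```python
import math


def bit_counts(fp_a, fp_b):
    """Return (only_a, only_b, both): bits set only in A, only in B, and in both."""
    both = bin(fp_a & fp_b).count("1")
    only_a = bin(fp_a).count("1") - both
    only_b = bin(fp_b).count("1") - both
    return only_a, only_b, both


def tanimoto(fp_a, fp_b):
    only_a, only_b, both = bit_counts(fp_a, fp_b)
    denom = only_a + only_b + both
    return both / denom if denom else 0.0  # zero handling is an assumption


def dice(fp_a, fp_b):
    only_a, only_b, both = bit_counts(fp_a, fp_b)
    denom = only_a + only_b + 2 * both
    return 2 * both / denom if denom else 0.0


def cosine(fp_a, fp_b):
    only_a, only_b, both = bit_counts(fp_a, fp_b)
    denom = math.sqrt((only_a + both) * (only_b + both))
    return both / denom if denom else 0.0


def tversky(fp_a, fp_b, alpha=0.5, beta=0.5):
    # alpha and beta weight the bits unique to A and to B, respectively.
    only_a, only_b, both = bit_counts(fp_a, fp_b)
    denom = alpha * only_a + beta * only_b + both
    return both / denom if denom else 0.0


fp_a = 0b1101
fp_b = 0b0111
scores = {
    "tanimoto": tanimoto(fp_a, fp_b),
    "dice": dice(fp_a, fp_b),
    "cosine": cosine(fp_a, fp_b),
    "tversky": tversky(fp_a, fp_b, alpha=1.0, beta=1.0),
}
```

With these definitions, Tversky reduces to Tanimoto when both weights are 1 and to Dice when both are 0.5, which is a convenient sanity check for any user-defined measure built the same way.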
Apart from generating and storing fingerprints, GraphSim TK also provides a fingerprint database that is designed to perform rapid in-memory fingerprint searches using any of the built-in or user-defined similarity measures.
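The general shape of an in-memory similarity search with a pluggable measure can be sketched as follows. The class `ToyFingerprintIndex` and its methods are hypothetical and are not the GraphSim TK database API; the sketch does a plain linear scan and makes no claim about how the toolkit achieves its speed.

```python
import heapq


def tanimoto(fp_a, fp_b):
    """Textbook Tanimoto on bitvectors stored as Python ints."""
    both = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return both / union if union else 0.0


class ToyFingerprintIndex:
    """Hypothetical in-memory store of (name, fingerprint) pairs."""

    def __init__(self):
        self._entries = []

    def add(self, name, fp):
        self._entries.append((name, fp))

    def search(self, query_fp, similarity=tanimoto, limit=3):
        """Return up to `limit` (score, name) pairs, highest score first."""
        scored = ((similarity(query_fp, fp), name) for name, fp in self._entries)
        return heapq.nlargest(limit, scored)


index = ToyFingerprintIndex()
index.add("a", 0b1111)
index.add("b", 0b1100)
index.add("c", 0b0001)

hits = index.search(0b1110, limit=2)
```

Any callable that takes two fingerprints and returns a number can be passed as `similarity`, which mirrors the idea of searching with either a built-in or a user-defined measure.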
GraphSim TK also provides access to the common fragments found between two molecules based on the given fingerprint type.
This data can be used to visualize the similarity of two molecules based on their fingerprints. The fingerprint overlap, depicted in 2D, provides insight into molecule similarity beyond a single numerical score and reveals information about the underlying fingerprint method.
For more detailed information on GraphSim TK, check out the links below:
Enumerating path fragments
Enumerating circular fragments
Enumerating tree fragments
Depiction of 2D molecule similarity
The Cheminformatics suite of toolkits provides the core foundation upon which all of the OpenEye applications and remaining toolkits are built. The Cheminformatics suite is a collection of seven individual yet interdependent toolkits that are described in the table below.
The Modeling suite of toolkits provides the core functionality underlying OpenEye's defining principle that shape & electrostatics are the two fundamental descriptors determining intermolecular interactions. Many of the toolkits in the Modeling suite are directly associated with specific OpenEye applications and can therefore be used to create new or extend existing functionality associated with those applications.