To speed up the matching of individual atom and bond expressions, the terms and factors can be permuted such that the evaluation of an expression completes earlier. For AND operators, if the first operand is false, the result will be false without testing the second operand. Hence for AND expressions, the subexpression that is least likely to match should be ordered first. Similarly, for OR operators, if the first operand is true, the result will be true without having to test the second operand. Hence for OR expressions, the most likely subexpression should appear first.
SMACK's SMARTS optimizers base their ``a priori'' probabilities upon an analysis of the National Cancer Institute's August 2000 compound database, NCI00, which contains 250,251 molecules. The probabilities for SMARTS bond primitives are based upon the statistics in the table below:
| Expression | Description | Prob |
!@- |
Acyclic single bond | 35.20% |
!@= |
Acyclic double bond | 7.37% |
!@# |
Acyclic triple bond | 0.31% |
@- |
Cyclic single bond | 17.84% |
@= |
Cyclic double bond | 2.21% |
@# |
Cyclic triple bond | 0.00% |
: |
Aromatic bond | 37.07% |
From which the following probability values are derived:
| Expression | Description | Prob |
- |
Single bond | 53.04% |
= |
Double bond | 9.58% |
# |
Triple bond | 0.31% |
@ |
Cyclic (ring) bond | 57.12% |
!@ |
Acyclic bond | 42.88% |
The probabilities for SMARTS atom primitives are based upon the statistics in the table below:
| Expression | Description | Prob |
#5 |
Boron atom | 0.03% |
#6 |
Carbon atom | 73.04% |
#7 |
Nitrogen atom | 9.07% |
#8 |
Oxygen atom | 13.51% |
#9 |
Fluorine atom | 0.65% |
#11 |
Sodium atom | 0.01% |
#14 |
Silicon atom | 0.04% |
#15 |
Phosphorus atom | 0.20% |
#16 |
Sulfur atom | 1.45% |
#17 |
Chlorine atom | 1.38% |
#26 |
Iron atom | 0.02% |
#27 |
Cobalt atom | 0.02% |
#28 |
Nickel atom | 0.01% |
#29 |
Copper atom | 0.02% |
#30 |
Zinc atom | 0.01% |
#32 |
Germanium atom | 0.01% |
#33 |
Arsenic atom | 0.02% |
#34 |
Selenium atom | 0.01% |
#35 |
Bromine atom | 0.30% |
#46 |
Palladium atom | 0.01% |
#50 |
Tin atom | 0.03% |
#53 |
Iodine atom | 0.06% |
#74 |
Tungsten atom | 0.11% |
#78 |
Platinum atom | 0.03% |
#80 |
Mercury atom | 0.01% |
| Expression | Description | Prob |
* |
Any atom (wildcard) | 100.00% |
!* |
No atom | 0.00% |
a |
Aromatic atom | 38.91% |
A |
Aliphatic atom | 61.09% |
R |
Cyclic atom | 57.25% |
R0 |
Acyclic atom | 42.75% |
-1 |
Formal Charge = -1 | 0.72% |
+0 |
Formal Charge = 0 | 98.15% |
+1 |
Formal Charge = +1 | 1.09% |
+2 |
Formal Charge = +2 | 0.02% |
+3 |
Formal Charge = +3 | 0.01% |
H0 |
Total HCount = 0 | 43.78% |
H1 |
Total HCount = 1 | 34.56% |
H2 |
Total HCount = 2 | 14.23% |
H3 |
Total HCount = 3 | 7.42% |
h0 |
Implicit HCount = 0 | 43.78% |
h1 |
Implicit HCount = 1 | 34.56% |
h2 |
Implicit HCount = 2 | 14.23% |
h3 |
Implicit HCount = 3 | 7.42% |
X0 |
Total Bonds = 0 | 0.02% |
X1 |
Total Bonds = 1 | 9.76% |
X2 |
Total Bonds = 2 | 10.51% |
X3 |
Total Bonds = 3 | 52.08% |
X4 |
Total Bonds = 4 | 27.49% |
X5 |
Total Bonds = 5 | 0.05% |
X6 |
Total Bonds = 6 | 0.06% |
D0 |
Explicit Bonds = 0 | 0.07% |
D1 |
Explicit Bonds = 1 | 20.83% |
D2 |
Explicit Bonds = 2 | 48.28% |
D3 |
Explicit Bonds = 3 | 28.38% |
D4 |
Explicit Bonds = 4 | 2.33% |
D5 |
Explicit Bonds = 5 | 0.02% |
D6 |
Explicit Bonds = 6 | 0.06% |
v0 |
Valence = 0 | 0.02% |
v1 |
Valence = 1 | 2.97% |
v2 |
Valence = 2 | 13.78% |
v3 |
Valence = 3 | 8.43% |
v4 |
Valence = 4 | 74.04% |
v5 |
Valence = 5 | 0.22% |
v6 |
Valence = 6 | 0.50% |