A New Look at Clustering Large Datasets
BOOST: Performance
Number of Comparisons v. Original Dataset Size
| Dataset Size | Number of Comparisons |
| 2,000 | 7.38E+05 |
| 5,000 | 2.32E+06 |
| 10,000 | 5.00E+06 |
| 40,000 | 2.93E+07 |
| 70,000 | 4.19E+07 |
| 92,087 | 5.07E+07 |
| 183,912 | 8.58E+07 |
| 275,838 | 1.28E+08 |
(Best fit: Linear)
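
As a rough check on the linear trend, the table values can be run through an ordinary least-squares fit. This is only a sketch: the function name `linear_fit` is hypothetical, and the fitting procedure is an assumption, since the source does not say how its best-fit line was computed.

```python
# Data copied from the table above: (dataset size, number of comparisons).
data = [
    (2_000, 7.38e5),
    (5_000, 2.32e6),
    (10_000, 5.00e6),
    (40_000, 2.93e7),
    (70_000, 4.19e7),
    (92_087, 5.07e7),
    (183_912, 8.58e7),
    (275_838, 1.28e8),
]


def linear_fit(points):
    """Least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Hypothetical helper; not
    necessarily the procedure used to produce the original chart.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(data)
print(f"comparisons ~ {slope:.1f} * size + {intercept:.3g} (R^2 = {r_squared:.4f})")
```

The slope can be read as the approximate number of comparisons added per additional record, under the assumption that a straight line is the right model.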
Percent Remaining After BOOST v. Original Dataset Size
| Dataset Size | Reduced To | As Percent of Original |
| 2,000 | 1,086 | 54.30% |
| 5,000 | 2,707 | 54.14% |
| 10,000 | 5,458 | 54.58% |
| 40,000 | 9,397 | 23.49% |
| 70,000 | 11,639 | 16.63% |
| 92,087 | 12,411 | 13.48% |
| 183,912 | 17,705 | 9.63% |
| 275,838 | 20,180 | 7.32% |
(Best fit: Power decay)
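
One way to examine the power-decay trend is to recompute the percentages from the table and fit percent = a * size^b by linear regression in log-log space. This is a sketch under stated assumptions: the helper name `power_fit` is hypothetical, all eight rows are included in the fit, and the source does not say how its own curve was fitted.

```python
import math

# Data copied from the table above: (dataset size, size after BOOST).
rows = [
    (2_000, 1_086),
    (5_000, 2_707),
    (10_000, 5_458),
    (40_000, 9_397),
    (70_000, 11_639),
    (92_087, 12_411),
    (183_912, 17_705),
    (275_838, 20_180),
]

sizes = [size for size, _ in rows]
# Percent of the original dataset that remains after reduction.
percents = [100.0 * reduced / size for size, reduced in rows]


def power_fit(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b). Hypothetical helper; not necessarily the
    procedure used to produce the original chart.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


coefficient, exponent = power_fit(sizes, percents)
print(f"percent remaining ~ {coefficient:.3g} * size^{exponent:.3f}")
```

A negative exponent corresponds to decay: the larger the original dataset, the smaller the fraction that remains. The three smallest datasets all sit near 54%, so a fit restricted to the larger datasets would give a somewhat different curve.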