A New Look at Clustering Large Datasets

BOOST: Performance

Number of Comparisons vs. Original Dataset Size

Dataset Size    Number of Comparisons
       2,000    7.38E+05
       5,000    2.32E+06
      10,000    5.00E+06
      40,000    2.93E+07
      70,000    4.19E+07
      92,087    5.07E+07
     183,912    8.58E+07
     275,838    1.28E+08

(Best fit: Linear)
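
The "best fit: linear" note can be checked against the numbers in the table. The sketch below is only an illustration: the slide does not say how the fit was computed, so the ordinary least-squares method and the helper name linear_fit are assumptions, not the author's procedure.

```python
# Data copied from the comparisons table above.
sizes = [2000, 5000, 10000, 40000, 70000, 92087, 183912, 275838]
comparisons = [7.38e5, 2.32e6, 5.00e6, 2.93e7, 4.19e7, 5.07e7, 8.58e7, 1.28e8]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2.

    Hypothetical helper for illustration; not taken from the original slides.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r2 = linear_fit(sizes, comparisons)
print(f"comparisons ~ {slope:.1f} * size + {intercept:.3g}  (R^2 = {r2:.4f})")
```

An R^2 close to 1 here would be consistent with the slide's claim that the number of comparisons grows roughly linearly with dataset size over the measured range.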
 
Percent Remaining After BOOST vs. Original Dataset Size

Dataset Size    Reduced To    As Percent of Original
       2,000         1,086    54.30%
       5,000         2,707    54.14%
      10,000         5,458    54.58%
      40,000         9,397    23.49%
      70,000        11,639    16.63%
      92,087        12,411    13.48%
     183,912        17,705     9.63%
     275,838        20,180     7.32%

(Best fit: Power decay)
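
The percentage column and the "power decay" note can both be reproduced from the table. The following is a minimal sketch, assuming a model of the form percent = scale * size^exponent fitted by least squares on log-log axes; the slide does not state the fitting method, and the helper name power_fit is hypothetical.

```python
import math

# Rows copied from the table above:
# (original size, size after BOOST, percent of original as printed).
rows = [
    (2000, 1086, 54.30),
    (5000, 2707, 54.14),
    (10000, 5458, 54.58),
    (40000, 9397, 23.49),
    (70000, 11639, 16.63),
    (92087, 12411, 13.48),
    (183912, 17705, 9.63),
    (275838, 20180, 7.32),
]

# Recompute "As Percent of Original" = 100 * reduced / original, to two decimals.
recomputed = [round(100.0 * reduced / size, 2) for size, reduced, _ in rows]


def power_fit(xs, ys):
    """Fit y = scale * x**exponent by least squares on (ln x, ln y).

    Hypothetical helper for illustration; not taken from the original slides.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    exponent = sxy / sxx
    scale = math.exp(mean_y - exponent * mean_x)
    return scale, exponent


scale, exponent = power_fit([r[0] for r in rows], [r[2] for r in rows])
print("recomputed percentages:", recomputed)
print(f"percent remaining ~ {scale:.3g} * size^{exponent:.3f}")
```

A negative exponent corresponds to power decay. Note that the three smallest datasets all sit near 54%, so a single power law over all eight rows is only an approximation, and the author's fit may have been computed over a different range or by a different method.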


Robin Hewitt (rhewitt@acm.org), Feb 2003