

13.5 计算上的考虑¶

原文	The Elements of Statistical Learning
翻译	szcf-weiya
发布	2017-08-29
更新	2019-01-06
状态	Done

最近邻分类规则的一个缺点通常是它的计算负荷量，无论是寻找最近邻还是存储整个训练集合．当有 $N$ 个观测和 $p$ 个变量时，对每个点最近邻分类需要 $Np$ 次操作来寻找邻域．有一些快速算法 (Friedman et al., 1975¹; Friedman et al., 1977²) 能在某种程度上降低计算负荷．Hastie and Simard (1998)³ 通过在不变度量框架下提出类似 $K$-means 聚类的方法，降低了 切向距离 (tangent distance) 的计算量．

降低存储的要求更加困难，研究者们提出了各种 编辑 (editing) 和 压缩 (condensing) 的过程．想法是单独拿出能进行最近邻预测的训练集的子集，并且把剩下的训练数据丢掉．直观上看，保留靠近判别边界处和在边界正确一侧的训练点是很重要的，而那些离边界很远的点可以舍弃．

Devijver and Kittler (1982) ⁴ 的多重编辑算法将数据周期性地划分为训练集和测试集，计算测试集上的最近邻，并且删掉误分类的测试点．想法是保留训练观测值的同种类别．

Hart (1968)⁵ 的 压缩 (condensing) 过程更进一步，试图仅仅保留这些类别的 外点 (exterior point)．以随机选择单个观测作为训练集开始，每次处理一个额外的数据点，仅当它被当前训练集误分类时才加入到训练集中．

这是过程在 Dasarathy (1991)⁶ 和 Ripley (1996)⁷ 中均有综述．它们也可以应用到除最近邻以外的学习过程．尽管这些方法有时是有用的，但是我们并没有太多的实际经验，也没有在文献中找到关于它们表现的系统性的比较．

Friedman, J., Baskett, F. and Shustek, L. (1975). An algorithm for finding nearest neighbors, IEEE Transactions on Computers 24: 1000–1006. ↩
Friedman, J., Bentley, J. and Finkel, R. (1977). An algorthm for finding best matches in logarithmic expected time, ACM Transactions on Mathematical Software 3: 209–226. ↩
Hastie, T. and Simard, P. (1998). Models and metrics for handwritten digit recognition, Statistical Science 13: 54–65. ↩
Devijver, P. and Kittler, J. (1982). Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, N.J. ↩
Hart, P. (1968). The condensed nearest-neighbor rule, IEEE Transactions on Information Theory 14: 515–516. ↩
Dasarathy, B. (1991). Nearest Neighbor Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA. ↩
Ripley, B. D. (1996). Pattern Recognition and Neural Networks, Cambridge University Press. ↩

13.5 计算上的考虑¶

💬 讨论区