

15.5 Bibliographic Notes

Source	The Elements of Statistical Learning
Translator	szcf-weiya
Published	2017-09-04
Updated	2025-09-23

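Random forests as discussed here were introduced by Breiman (2001)¹, although many of the ideas had appeared much earlier in different forms. Notably, Ho (1995)² introduced the term "random forest" and used a consensus of trees grown in random subspaces of the features. The idea of using stochastic perturbation and averaging to avoid overfitting was introduced by Kleinberg (1990)³ and developed further in Kleinberg (1996)⁴. Amit and Geman (1997)⁵ used randomized trees grown on image features for image classification problems. Breiman (1996a)⁶ introduced bagging, a precursor to his version of random forests. Dietterich (2000b)⁷ also proposed using additional randomization to improve on bagging. His approach was to rank the top 20 candidate splits at each node and then select one from that list at random. He showed through simulations and real examples that this additional randomization improved on the performance of bagging. Friedman and Hall (2007)⁸ showed that subsampling (without replacement) is an effective alternative to bagging. They showed that growing and averaging trees on samples of size $N/2$ is approximately equivalent to bagging (in terms of bias and variance considerations), while using smaller fractions of $N$ reduces the variance even further (through decorrelation).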
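The split-selection rule attributed to Dietterich (2000b) above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the procedure from the paper: the ranking criterion (decrease in Gini impurity), the midpoint thresholds, and the function names `gini`, `score_splits`, and `choose_randomized_split` are all hypothetical choices made here. The only part taken from the text is the rule itself: rank the candidate splits at a node, keep the top 20, and pick one of them at random.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def score_splits(X, y):
    """Return (impurity_decrease, feature_index, threshold) for every candidate split.

    Assumption: splits are scored by the decrease in Gini impurity.
    """
    n = len(y)
    parent = gini(y)
    scored = []
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            scored.append((parent - child, j, t))
    return scored

def choose_randomized_split(scored, k=20, rng=random):
    """Rank candidate splits by score and pick one of the top k uniformly at random."""
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    return rng.choice(top)
```

With `k=1` the rule reduces to the usual greedy choice of the best split; larger `k` injects more randomness into each tree, which is the extra randomization the paragraph describes.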

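There are several free software implementations of random forests. In this chapter the book uses the randomForest package in R, maintained by Andy Liaw and available from the CRAN website. It allows both split-variable selection and subsampling. Adele Cutler maintains a random forest website, http://www.math.usu.edu/~adele/forests/, where the software written by Leo Breiman and Adele Cutler is freely available. Their code, and the name "random forests", is exclusively licensed to Salford Systems for commercial release. The Weka machine learning archive, http://www.cs.waikato.ac.nz/ml/weka/, at Waikato University, New Zealand, offers a free Java implementation of random forests.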

1. Breiman, L. (2001). Random forests, Machine Learning 45: 5–32.
2. Ho, T. K. (1995). Random decision forests, in M. Kavavaugh and P. Storms (eds), Proc. Third International Conference on Document Analysis and Recognition, Vol. 1, IEEE Computer Society Press, New York, pp. 278–282.
3. Kleinberg, E. M. (1990). Stochastic discrimination, Annals of Mathematical Artificial Intelligence 1: 207–239.
4. Kleinberg, E. M. (1996). An overtraining-resistant stochastic modeling method for pattern recognition, Annals of Statistics 24: 2319–2349.
5. Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees, Neural Computation 9: 1545–1588.
6. Breiman, L. (1996a). Bagging predictors, Machine Learning 26: 123–140.
7. Dietterich, T. (2000b). An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learning 40(2): 139–157.
8. Friedman, J. and Hall, P. (2007). On bagging and nonlinear estimation, Journal of Statistical Planning and Inference 137: 669–683.
