Bagging and Random Forests
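Friday, midday session. A CN private fund with 5 billion under management drops a CSI 300 alpha dataset on your desk: 30 features, with daily next-day excess return as the label. The depth-15 CART tree from the previous lesson had 100% in-sample directional accuracy but only 51% out of sample, barely better than a coin flip, with a Sharpe ratio near zero. You replace it with the average of 500 deep trees trained independently on bootstrap samples, and out-of-sample accuracy jumps to 57%. This jump,...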
Why does bagging usually help deep trees much more than it helps already-stable learners?
Why should you not expect bagging alone to rescue a learner whose individual trees are systematically misspecified?
Why is bagging usually described as a variance-reduction tool rather than a bias-reduction tool?
Assume each tree has the same squared bias b^2 and prediction noise floor nu, while bagging only changes the variance term according to the equicorrelated-tree formula. Derive the bagged test MSE with B trees.
machine-learning · tree-based-methods · decision-tree · cart · impurity · pruning · bagging · random-forest
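A factor researcher at a Shanghai private fund has finished training the 500-tree random forest from the previous section. Out-of-sample accuracy on CSI 300 + CSI 500 is 57%, 6 points above the single deep tree's 51%. She changes max_features from sqrt(p) to p/3 and raises the tree count to 2000, and accuracy does not budge, sitting at 57.2%: bagging's variance dividend has been fully collected. At the factor review meeting the PM sums it up in one sentence: "Variance has hit its floor, so take...
Friday morning. You are at a quant private fund in Shanghai, a multi-factor shop in the style of 明汯, 幻方, 九坤, and 灵均. L3 finished orthogonalizing four signals: mom 12 1, book to market, gross profitability, and pead sue have all been residualized and have cleared the IC break-even threshold. There is still no production composite signal on the desk. The investment committee...
Monday, 9:20 in the morning session. You take over the alpha model left behind by a departed colleague: a depth-15 CART (Classification and Regression Tree) tree trained on a three-year daily panel of CSI 300 constituents. The features are 12 variables, including momentum, value, quality, low volatility, 5-day return, 20-day volatility, and turnover, and the target is the direction (up/down) of the next day's excess return. In-sample training accuracy 1...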
Define B_eff by matching the correlated-forest variance sigma^2 [rho + (1-rho)/B] to the variance sigma^2 / B_eff of averaging independent trees. Derive B_eff.
A single tree has variance 6, while an extremely large forest appears to level off at variance 1.8. What pairwise tree correlation rho is implied?
Using the equicorrelated-tree variance formula, derive the prediction variance as the number of trees B tends to infinity.
Under the equicorrelated-tree model, derive how much the ensemble variance falls when you move from B trees to B+1 trees.
Each tree has variance 9, pairwise correlation 0.2, and the forest has 25 trees. What is the variance of the forest average?
Suppose each tree has variance sigma^2 and pairwise correlation rho. Derive the minimum B needed to make the ensemble variance at most V, assuming V > rho sigma^2.
Suppose B trees each have variance sigma^2 and every pair has correlation rho. Derive the variance of their simple average.
Why can a larger forest fail to repair performance when the training labels themselves are systematically corrupted?
Why can random feature subsampling improve a forest when one very strong predictor would otherwise appear at the top of almost every tree?
Why does adding more trees to a random forest typically plateau rather than create the kind of explosive overfit seen in some single-model families?
Why can out-of-bag error fluctuate a lot on a small dataset even when the forest itself is reasonably stable?
Why can out-of-bag error be misleading when rows are linked by entity or time rather than being exchangeable?
Why does random-forest regression usually fail to extrapolate a trend far beyond the training range?
Why can making max_features too small hurt a random forest even though it lowers correlation?
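One hour before Monday's open, you are sitting in the research room of a mid-sized private fund in Shanghai. The research manager pushes a CSV across the desk: the 300 CSI 300 constituents, each with a 15-dimensional factor vector (PE, PB, 12-month momentum, 20-day volatility, turnover, analyst upgrade ratio), which is essentially an input table for a lightweight factor model. The label [formula] indicates next-month performance relative to the index: outperform /...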
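Several of the questions above refer to the equicorrelated-tree variance formula sigma^2 [rho + (1-rho)/B]. The following is a minimal Python sketch of that formula and a few quantities that follow from it by algebra, intended only as a way to check hand calculations. The function names are hypothetical and chosen for illustration; the model assumes every tree has the same variance sigma^2 and every pair of trees has the same correlation rho, exactly as the questions state.

```python
import math


def forest_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the simple average of n_trees equicorrelated trees:
    sigma^2 * [rho + (1 - rho) / B]."""
    return sigma2 * (rho + (1.0 - rho) / n_trees)


def effective_trees(rho: float, n_trees: int) -> float:
    """B_eff such that sigma^2 / B_eff equals the correlated-forest variance.
    Solving sigma^2 [rho + (1-rho)/B] = sigma^2 / B_eff gives B / (1 + (B-1) rho)."""
    return n_trees / (1.0 + (n_trees - 1) * rho)


def implied_rho(single_tree_var: float, plateau_var: float) -> float:
    """As B grows, the variance tends to rho * sigma^2,
    so rho is the plateau variance divided by the single-tree variance."""
    return plateau_var / single_tree_var


def min_trees(sigma2: float, rho: float, target_var: float) -> int:
    """Smallest integer B with forest variance <= target_var.
    Requires target_var > rho * sigma^2, the floor no forest size can beat."""
    floor = rho * sigma2
    if target_var <= floor:
        raise ValueError("target variance is at or below the rho * sigma^2 floor")
    return math.ceil((1.0 - rho) * sigma2 / (target_var - floor))


if __name__ == "__main__":
    # Numbers taken from the questions above.
    print(forest_variance(9.0, 0.2, 25))   # variance 9, rho 0.2, 25 trees
    print(implied_rho(6.0, 1.8))           # single tree 6, plateau 1.8
    print(effective_trees(0.2, 25))        # independent-tree equivalent of 25 trees
```

Under these assumptions the rho * sigma^2 term does not shrink with B, which is one way to see why adding trees plateaus and why lowering correlation (for example through feature subsampling) is the remaining lever.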