Tree Models and Kernel Methods
machine-learning · tree-based-methods · decision-tree · cart · impurity · pruning · bagging · random-forest
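GLOBAL SEARCH
Search runs on the server; question explanations and answers are never included in search results. Log in to search your own saved problem sets.
25 results found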
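Chinese-language question · machine-learning · tree-based-methods · decision-tree · cart · impurity · pruning · bagging · random-forest
Friday, midday session. A CN private fund with 5 billion under management drops a CSI 300 alpha dataset on your desk: 30 features, with the daily next-day excess return as the label. The depth-15 CART tree from the previous lesson reached 100% directional accuracy in-sample but only 51% out-of-sample, barely better than a coin flip, with a Sharpe ratio close to zero. You replace it with the average of 500 deep trees trained independently on bootstrap samples, and out-of-sample accuracy jumps to 57%. This jump,...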
Three candidate splits on the same node have Gini gains 0.18, 0.16, and 0.11, with smaller-child sizes 3, 4, and 7 respectively. If the minimum allowed leaf size is 4, which split is actually chosen?
Node A would have leaf error 12 if pruned, while its current subtree has error 7 and 3 leaves. Node B would have leaf error 9 if pruned, while its current subtree has error 6 and 2 leaves. Which node is the weaker link and should be pruned first under cost-complexity pruning?
A parent node left uncut has SSE 70. A 2-leaf split gives total SSE 44. A 3-leaf subtree gives total SSE 36. If the complexity penalty is 10 per extra leaf relative to the uncut node, which option has the lowest penalized objective?
A leaf contains 7 positives and 13 negatives. Predicting negative incurs false-negative cost 4 on each hidden positive, while predicting positive incurs false-positive cost 1 on each hidden negative. Which class should the leaf predict?
A surrogate split agrees with the primary split on 34 of 40 training cases where both features are present. If 12 production cases are missing the primary split feature and are routed by the surrogate, what is the expected number of misroutes?
If every sample weight in a node is multiplied by the same constant c>0, how does each candidate split's weighted impurity decrease change?
A sorted feature has five distinct-value blocks of sizes [3, 5, 2, 4, 6], and splits are allowed only between distinct-value blocks. If each child leaf must contain at least 6 observations, how many legal thresholds exist?
A sorted feature has 31 observations, and each child leaf must contain at least 6 observations. How many legal split positions are there?
A tree starts with 96 observations at the root and every split is perfectly balanced. If each leaf must contain at least 12 observations, what is the maximum possible depth?
A node has leaf error 18 if pruned into a single leaf. Its current subtree has training error 10 and 3 leaves. What is the weakest-link alpha for pruning this subtree?
A classification leaf contains 6 positive cases and 14 negative cases. Predicting positive costs 1 per false positive, while predicting negative costs 4 per false negative. Which class should the leaf predict to minimize expected leaf loss?
A regression leaf has SSE 260. Splitting it would reduce child SSE to 230. If the complexity penalty is 12 per extra leaf, should you keep the split?
A primary split is missing for some rows, so a surrogate split is trained on the M rows where the primary feature is observed. If it sends A of those rows to the same side as the primary split, what is its agreement rate?
A stump has validation loss 30. Splitting it into two leaves lowers validation loss to 22 but adds an instability penalty lambda per extra leaf. For what largest lambda is the split still preferred?
Replacing a single leaf by a 3-leaf subtree reduces validation loss by 4.5. If the complexity charge is alpha = 1.2 per extra leaf, should you keep the subtree?
Split A originally has gain 1.20 and split B has gain 1.05. After one row is corrected, A loses 0.10 gain while B gains 0.08. Which split is now best?
Why can a decision tree need many small rectangles to approximate a simple diagonal boundary?
Why can an aggressive pre-pruning rule reject a first split that looks weak locally even though it would unlock a much better second-level structure?
Why are deep decision trees often called unstable learners?
Why can two root splits with almost identical immediate gain still lead to very different final trees?
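Monday morning session, 9:20. You have taken over the alpha model left behind by a colleague who resigned: a depth-15 CART (Classification and Regression Tree) tree trained on a three-year daily panel of CSI 300 constituents. The features are 12 variables including momentum, value, quality, low volatility, 5-day return, 20-day volatility, and turnover, and the target is the direction (up/down) of the next day's excess return. In-sample training accuracy 1...
One hour before Monday's open, you are sitting in the research room of a mid-sized private fund in Shanghai. The research manager slides a CSV across the desk: the 300 CSI 300 constituents, each with a 15-dimensional factor vector (PE, PB, 12-month momentum, 20-day volatility, turnover, share of analyst upgrades), which is essentially a lightweight factor model input table; the label [formula] indicates next month's outperform /... relative to the index
A factor researcher at a Shanghai private fund has finished training the 500-tree random forest from the previous section. Out-of-sample accuracy on CSI 300 + CSI 500 is 57%, 6 points above the single deep tree's 51%. She changes max features from sqrt(p) to p/3 and raises the tree count to 2000, and accuracy does not budge, sitting at 57.2%: bagging's variance dividend has been fully collected. At the factor review meeting the PM sums it up in one line: "Variance has hit its floor, so...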
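The plateau described in the last result (accuracy barely moving when the forest grows from 500 to 2000 trees) is consistent with the standard variance formula for an average of B identically distributed predictors with pairwise correlation rho and individual variance sigma^2: rho * sigma^2 + (1 - rho) * sigma^2 / B. The sketch below only illustrates that formula. The function name `bagged_variance` and the values of `SIGMA2` and `RHO` are hypothetical and are not taken from the question.

```python
def bagged_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the average of n_trees identically distributed predictors.

    Each predictor has variance sigma2 and every pair has correlation rho.
    The first term is a floor that adding more trees cannot remove.
    """
    if n_trees < 1:
        raise ValueError("n_trees must be at least 1")
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


# Hypothetical illustration values (assumptions, not from the source).
SIGMA2 = 1.0  # variance of a single deep tree's prediction
RHO = 0.3     # average pairwise correlation between trees

for n_trees in (1, 10, 500, 2000):
    print(n_trees, bagged_variance(SIGMA2, RHO, n_trees))
```

Under these assumed values, the second term is already tiny at 500 trees, so quadrupling the forest changes the total very little; only lowering the correlation between trees, or reducing bias, moves the floor.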