Site Search | 锐望实验室

All 4546 · Courses 299 · Modules 72 · Problems 4169 · Help 6 · Saved problem sets 0

25 results found


Module 2.6.2 · Mathematical and Statistical Skills · Machine Learning Theory

Tree Models and Kernel Methods

machine-learning · tree-based-methods · decision-tree · cart · impurity · pruning · bagging · random-forest


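Course · Tree Models and Kernel Methods · Machine Learning Theory

Bagging and Random Forests

Friday, midday session. A CN private fund with 5 billion under management tosses a CSI 300 alpha dataset onto your desk: 30 features, with the daily next-day excess return as the label. The depth-15 CART tree from the previous lesson scored 100% directional accuracy in sample but only 51% out of sample, barely better than a coin flip, with a Sharpe ratio near zero. You replace it with the average of 500 deep trees, each trained independently on its own bootstrap sample, and out-of-sample accuracy jumps to 57%. This jump, ...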


Problem 2555 · Machine Learning

Best Valid Split Under a Minimum-Leaf Constraint

Three candidate splits on the same node have Gini gains 0.18, 0.16, and 0.11, with smaller-child sizes 3, 4, and 7 respectively. If the minimum allowed leaf size is 4, which split is actually chosen?

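A minimal sketch of the selection rule, assuming each candidate is summarized as a (Gini gain, smaller-child size) pair: drop every candidate whose smaller child falls below the leaf-size floor, then take the largest remaining gain. `best_valid_split` is a hypothetical helper, not a library function.

```python
def best_valid_split(candidates, min_leaf):
    """Hypothetical helper: return the (gain, smaller_child) candidate with the
    highest gain among those meeting the minimum leaf size, or None."""
    # Both children must hold >= min_leaf samples; checking the smaller one suffices.
    valid = [c for c in candidates if c[1] >= min_leaf]
    if not valid:
        return None
    return max(valid, key=lambda c: c[0])

# Problem 2555: (Gini gain, smaller-child size)
candidates = [(0.18, 3), (0.16, 4), (0.11, 7)]
print(best_valid_split(candidates, min_leaf=4))  # (0.16, 4)
```

The split with the largest raw gain (0.18) is rejected because its smaller child holds only 3 observations, so the 0.16 split is chosen.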

Problem 2566 · Machine Learning

Choose the Weakest-Link Node to Prune 24

Node A would have leaf error 12 if pruned, while its current subtree has error 7 and 3 leaves. Node B would have leaf error 9 if pruned, while its current subtree has error 6 and 2 leaves. Which node is the weaker link and should be pruned first under cost-complexity pruning?

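In cost-complexity pruning, the weakest-link value of an internal node t is alpha(t) = (R(t) - R(T_t)) / (|T_t| - 1), where R(t) is the error if t is collapsed to a leaf, R(T_t) is the error of its current subtree, and |T_t| is that subtree's leaf count. The node with the smallest alpha is pruned first. A minimal sketch using a hypothetical helper `weakest_link_alpha`, which also covers Problem 2547:

```python
from fractions import Fraction

def weakest_link_alpha(leaf_error, subtree_error, n_leaves):
    """Hypothetical helper: the alpha at which collapsing the subtree into a
    single leaf breaks even, (R(t) - R(T_t)) / (|T_t| - 1)."""
    if n_leaves < 2:
        raise ValueError("a subtree must have at least 2 leaves")
    return Fraction(leaf_error - subtree_error, n_leaves - 1)

# Problem 2566: the node with the smaller alpha is the weakest link.
alphas = {
    "A": weakest_link_alpha(12, 7, 3),  # (12 - 7) / 2
    "B": weakest_link_alpha(9, 6, 2),   # (9 - 6) / 1
}
weakest = min(alphas, key=alphas.get)
print(weakest, alphas[weakest])  # A 5/2

# Problem 2547
print(weakest_link_alpha(18, 10, 3))  # 4
```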

Problem 2568 · Machine Learning

Compare Penalized Tree Options 25

A parent node left uncut has SSE 70. A 2-leaf split gives total SSE 44. A 3-leaf subtree gives total SSE 36. If the complexity penalty is 10 per extra leaf relative to the uncut node, which option has the lowest penalized objective?

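A minimal sketch of the penalized comparison, assuming the objective is the loss plus a fixed charge for each leaf beyond the uncut node. `penalized` is a hypothetical helper; the same comparison settles Problems 2549 and 2565.

```python
def penalized(loss, extra_leaves, penalty):
    """Hypothetical helper: loss plus a complexity charge for each leaf
    added relative to the uncut node."""
    return loss + penalty * extra_leaves

# Problem 2568: (option, total SSE, extra leaves vs. the uncut node); penalty 10
options = [("uncut", 70, 0), ("2-leaf", 44, 1), ("3-leaf", 36, 2)]
scores = {name: penalized(sse, k, 10) for name, sse, k in options}
print(scores)                       # {'uncut': 70, '2-leaf': 54, '3-leaf': 56}
print(min(scores, key=scores.get))  # 2-leaf

# Problem 2549: one extra leaf at penalty 12; keep the split if it scores lower.
print(penalized(230, 1, 12) < penalized(260, 0, 12))  # True (242 < 260)

# Problem 2565: a 3-leaf subtree adds 2 leaves at alpha = 1.2 each; keep it
# if the validation-loss reduction exceeds the total charge.
print(4.5 > penalized(0, 2, 1.2))  # True (4.5 > 2.4)
```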

Problem 2554 · Machine Learning

Cost-Sensitive Leaf Label 21

A leaf contains 7 positives and 13 negatives. Predicting negative incurs a false-negative cost of 4 for each positive in the leaf, while predicting positive incurs a false-positive cost of 1 for each negative in the leaf. Which class should the leaf predict?

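A minimal sketch of the cost-sensitive rule: predicting negative pays the false-negative cost on every positive in the leaf, predicting positive pays the false-positive cost on every negative, and the leaf takes the cheaper label. `cost_sensitive_label` is a hypothetical helper; Problem 2550 applies the same rule to 6 positives and 14 negatives.

```python
def cost_sensitive_label(n_pos, n_neg, cost_fn, cost_fp):
    """Hypothetical helper: return (label, total cost) minimizing the
    misclassification cost of a single leaf."""
    cost_if_neg = n_pos * cost_fn  # every positive becomes a false negative
    cost_if_pos = n_neg * cost_fp  # every negative becomes a false positive
    # Ties go to "positive" (an arbitrary choice for this sketch).
    if cost_if_pos <= cost_if_neg:
        return "positive", cost_if_pos
    return "negative", cost_if_neg

print(cost_sensitive_label(7, 13, cost_fn=4, cost_fp=1))  # ('positive', 13)
print(cost_sensitive_label(6, 14, cost_fn=4, cost_fp=1))  # ('positive', 14)
```

Both leaves have a negative majority, but the 4:1 cost ratio flips the label: positive wins whenever 1 x n_neg is below 4 x n_pos.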

Problem 2559 · Machine Learning

Expected Misroutes From a Surrogate Split

A surrogate split agrees with the primary split on 34 of 40 training cases where both features are present. If 12 production cases are missing the primary split feature and are routed by the surrogate, what is the expected number of misroutes?

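A minimal sketch, assuming the surrogate's disagreement rate on training rows where both features are observed carries over to production rows that are missing the primary feature. The helper names are hypothetical. The agreement rate in Problem 2570 is the same ratio, A / M.

```python
from fractions import Fraction

def agreement_rate(n_agree, n_observed):
    """Hypothetical helper: share of rows the surrogate sends to the same
    side as the primary split."""
    return Fraction(n_agree, n_observed)

def expected_misroutes(n_agree, n_observed, n_missing):
    """Hypothetical helper: expected number of missing-feature rows routed
    differently from the primary split, assuming the training
    disagreement rate generalizes."""
    return n_missing * (1 - agreement_rate(n_agree, n_observed))

rate = agreement_rate(34, 40)
print(rate, float(rate))                      # 17/20 0.85
print(float(expected_misroutes(34, 40, 12)))  # 1.8
```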

Problem 2560 · Machine Learning

Global Weight Rescaling Leaves Split Ranking Unchanged 5

If every sample weight in a node is multiplied by the same constant c>0, how does each candidate split's weighted impurity decrease change?

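A minimal sketch, assuming the decrease is measured in absolute (weight-summed) form: W_parent * G(parent) - W_left * G(left) - W_right * G(right), where G is the Gini impurity computed from weighted class proportions. Scaling every weight by c leaves each proportion unchanged and multiplies each weight total by c, so every candidate's decrease is multiplied by c and the split ranking is preserved. If the decrease is instead normalized by the parent's total weight, it does not change at all. The data and helper names below are hypothetical.

```python
def weighted_gini(labels, weights):
    """Gini impurity from weighted class proportions."""
    total = sum(weights)
    if total == 0:
        return 0.0
    class_weight = {}
    for y, w in zip(labels, weights):
        class_weight[y] = class_weight.get(y, 0.0) + w
    return 1.0 - sum((cw / total) ** 2 for cw in class_weight.values())

def weighted_decrease(labels, weights, left_mask):
    """Assumption: absolute (weight-summed) Gini decrease of a left/right split."""

    def mass_times_gini(pairs):
        ys = [y for y, _ in pairs]
        ws = [w for _, w in pairs]
        return sum(ws) * weighted_gini(ys, ws)

    parent = list(zip(labels, weights))
    left = [p for p, m in zip(parent, left_mask) if m]
    right = [p for p, m in zip(parent, left_mask) if not m]
    return mass_times_gini(parent) - mass_times_gini(left) - mass_times_gini(right)

# Hypothetical node: labels, sample weights, and one candidate split.
labels = [0, 0, 1, 1, 1, 0]
weights = [1.0, 2.0, 0.5, 1.5, 1.0, 3.0]
split = [True, True, True, False, False, False]
c = 7.0

before = weighted_decrease(labels, weights, split)
after = weighted_decrease(labels, [c * w for w in weights], split)
print(round(after / before, 9))  # 7.0
```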

Problem 2556 · Machine Learning

Grouped Values and Feasible Thresholds 22

A sorted feature has five distinct-value blocks of sizes [3, 5, 2, 4, 6], and splits are allowed only between distinct-value blocks. If each child leaf must contain at least 6 observations, how many legal thresholds exist?

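A minimal sketch of the count: a threshold can only sit on a boundary between distinct-value blocks, and it is legal when the cumulative count on the left and the remainder on the right both meet the floor. `legal_thresholds` is a hypothetical helper. Problem 2546 is the special case in which every value is distinct, i.e. 31 blocks of size 1.

```python
from itertools import accumulate

def legal_thresholds(block_sizes, min_leaf):
    """Hypothetical helper: count block boundaries where both children
    contain at least min_leaf observations."""
    n = sum(block_sizes)
    # Left-child size at each internal boundary; the final cumulative sum
    # is dropped because it would put every observation on the left.
    left_sizes = list(accumulate(block_sizes))[:-1]
    return sum(1 for k in left_sizes if k >= min_leaf and n - k >= min_leaf)

print(legal_thresholds([3, 5, 2, 4, 6], min_leaf=6))  # 3
print(legal_thresholds([1] * 31, min_leaf=6))         # 20
```

For Problem 2556 the left sizes at the four boundaries are 3, 8, 10, and 14 (right sizes 17, 12, 10, 6); only the first boundary fails the floor.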

Problem 2546 · Machine Learning

Legal Threshold Count With a Leaf-Size Floor 15

A sorted feature has 31 observations, and each child leaf must contain at least 6 observations. How many legal split positions are there?


Problem 2553 · Machine Learning

Maximum Balanced Depth Numerically 20

A tree starts with 96 observations at the root and every split is perfectly balanced. If each leaf must contain at least 12 observations, what is the maximum possible depth?

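A minimal sketch, assuming perfectly balanced splits as the problem states, so a node at depth d holds n_root / 2^d observations: keep splitting while each child would still meet the leaf floor. `max_balanced_depth` is a hypothetical helper.

```python
def max_balanced_depth(n_root, min_leaf):
    """Hypothetical helper: deepest level reachable when every split halves
    the node and each leaf keeps at least min_leaf observations."""
    depth, size = 0, n_root
    # Split again only if each child (size / 2) would still satisfy the floor.
    while size / 2 >= min_leaf:
        size /= 2
        depth += 1
    return depth

print(max_balanced_depth(96, 12))  # 3  (96 -> 48 -> 24 -> 12)
```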

Problem 2547 · Machine Learning

Numeric Weakest-Link Alpha 16

A node has leaf error 18 if pruned into a single leaf. Its current subtree has training error 10 and 3 leaves. What is the weakest-link alpha for pruning this subtree?


Problem 2550 · Machine Learning

Optimal Leaf Label Under Asymmetric Trading Costs

A classification leaf contains 6 positive cases and 14 negative cases. Predicting positive costs 1 per false positive, while predicting negative costs 4 per false negative. Which class should the leaf predict to minimize expected leaf loss?


Problem 2549 · Machine Learning

Penalized Split Decision on a Regression Node 18

A regression leaf has SSE 260. Splitting it into two children would reduce the total SSE to 230. If the complexity penalty is 12 per extra leaf, should you keep the split?


Problem 2570 · Machine Learning

Surrogate Split Agreement Rate 8

The primary split feature is missing for some rows, so a surrogate split is trained on the M rows where the primary feature is observed. If the surrogate sends A of those rows to the same side as the primary split does, what is its agreement rate?


Problem 2564 · Machine Learning

Validation Penalty Threshold for Keeping a Split

A stump has validation loss 30. Splitting it into two leaves lowers validation loss to 22 but adds an instability penalty lambda per extra leaf. For what largest lambda is the split still preferred?

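A minimal sketch of the break-even point: with one extra leaf, the split is preferred while 22 + lambda < 30, so the threshold is the validation-loss reduction divided by the number of extra leaves. `breakeven_penalty` is a hypothetical helper.

```python
def breakeven_penalty(loss_before, loss_after, extra_leaves):
    """Hypothetical helper: per-leaf penalty at which the split and the
    unsplit node tie; any smaller penalty makes the split strictly better."""
    return (loss_before - loss_after) / extra_leaves

print(breakeven_penalty(30, 22, 1))  # 8.0
```

At lambda = 8 the two options tie; for any lambda below 8 the split is strictly preferred.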

Problem 2565 · Machine Learning

Validation Pruning With an Alpha Charge 23

Replacing a single leaf by a 3-leaf subtree reduces validation loss by 4.5. If the complexity charge is alpha = 1.2 per extra leaf, should you keep the subtree?


Problem 2552 · Machine Learning

Which Split Becomes Best After a Perturbation 19

Split A originally has gain 1.20 and split B has gain 1.05. After one row is corrected, A loses 0.10 gain while B gains 0.08. Which split is now best?

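A tiny sketch of the re-ranking: the 0.15 gap between the two candidates is erased by changes of 0.10 and 0.08 caused by a single corrected row, the kind of near-tie that lets small data changes alter which split a greedy tree picks. The variable names are hypothetical.

```python
# Problem 2552: gains before the correction, and how much each split changes.
gains = {"A": 1.20, "B": 1.05}
deltas = {"A": -0.10, "B": +0.08}

updated = {k: gains[k] + deltas[k] for k in gains}
best = max(updated, key=updated.get)
print(best, round(updated[best], 2))  # B 1.13
```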

Problem 2569 · Machine Learning

Why Axis-Aligned Trees Struggle on Rotated Boundaries 14

Why can a decision tree need many small rectangles to approximate a simple diagonal boundary?


Problem 2551 · Machine Learning

Why Pre-Pruning Can Miss a Good Two-Step Split 9

Why can an aggressive pre-pruning rule reject a first split that looks weak locally even though it would unlock a much better second-level structure?

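A minimal sketch of the classic failure case, assuming an XOR-style target on two binary features (hypothetical data, not taken from the lesson). Every single root split has zero Gini gain, so a pre-pruning rule that demands a positive (or minimum) gain stops at the root. Yet splitting on x1 and then on x2 inside each child yields pure leaves.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of class 1
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def split_gain(rows, feature):
    """Size-weighted Gini decrease from splitting rows on a binary feature."""
    labels = [y for _, y in rows]
    left = [y for x, y in rows if x[feature] == 0]
    right = [y for x, y in rows if x[feature] == 1]
    n = len(rows)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Hypothetical XOR data: y = 1 when x1 != x2, two copies of each corner.
rows = [((a, b), int(a != b)) for a in (0, 1) for b in (0, 1)] * 2

print([split_gain(rows, f) for f in (0, 1)])  # [0.0, 0.0]

# After splitting on x1, a second split on x2 makes every grandchild pure.
for a in (0, 1):
    child = [(x, y) for x, y in rows if x[0] == a]
    print(a, split_gain(child, 1))  # prints "0 0.5", then "1 0.5"
```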

Problem 2557 · Machine Learning

Why Small Data Perturbations Can Rewrite the Whole Tree 10

Why are deep decision trees often called unstable learners?


Problem 2567 · Machine Learning

Why Two Nearly-Tied First Splits Can Diverge Later 13

Why can two root splits with almost identical immediate gain still lead to very different final trees?


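Course · Tree Models and Kernel Methods · Machine Learning Theory

Decision Trees: CART, Impurity Criteria, and Pruning

Monday, 9:20 a.m., early session. You take over an alpha model left behind by a colleague who has since left the firm: a depth-15 CART (Classification and Regression Tree) trained on a three-year daily panel of CSI 300 constituents. It uses 12 features, including momentum, value, quality, low volatility, 5-day return, 20-day volatility, and turnover, and its target is the direction (up/down) of the next-day excess return. In-sample training accuracy 1...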


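Course · Tree Models and Kernel Methods · Machine Learning Theory

Kernel Methods and Support Vector Machines

Monday, one hour before the open. You are sitting in the research room of a mid-sized private fund in Shanghai. The research manager pushes a CSV across the desk: the 300 CSI 300 constituents, each with a 15-dimensional factor vector (including PE, PB, 12-month momentum, 20-day volatility, turnover, and the share of analyst upgrades). It is essentially the input table of a lightweight factor model; the label indicates next-month performance relative to the index, outperform /...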


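Course · Tree Models and Kernel Methods · Machine Learning Theory

Gradient Boosting and XGBoost / LightGBM

A factor researcher at a Shanghai private fund has finished training the 500-tree random forest from the previous section: out-of-sample accuracy on CSI 300 + CSI 500 is 57%, 6 points above the single deep tree's 51%. She changes max features from sqrt(p) to p/3 and raises the number of trees to 2000, yet accuracy barely moves, stalling at 57.2%: bagging's variance dividend has been fully used up. At the factor review meeting, the PM sums it up in one line: "Variance has already bottomed out, so ...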
