Site-wide search - 锐望实验室

All: 4546 · Courses: 299 · Modules: 72 · Problems: 4169 · Help: 6 · Saved problem lists: 0

Found 14 results

Problem 4216 · Machine Learning

Normalized MDI Share 1

A random forest reports total mean-decrease-in-impurity contributions spread=0.42, imbalance=0.21, id_bucket=0.07. What are the normalized importance shares, and which feature ranks first?
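
The normalization asked for here can be sketched in a few lines. This is a minimal illustration, assuming "normalized share" means each raw contribution divided by the sum of all contributions; the helper name `normalized_shares` is hypothetical.

```python
def normalized_shares(raw):
    """Divide each raw importance by the total so the shares sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    return {name: value / total for name, value in raw.items()}


# Raw MDI contributions quoted in the problem statement.
raw_mdi = {"spread": 0.42, "imbalance": 0.21, "id_bucket": 0.07}

shares = normalized_shares(raw_mdi)
top_feature = max(shares, key=shares.get)

for name, share in shares.items():
    print(f"{name}: {share:.2f}")
print("top:", top_feature)
```

Under that assumption the shares come out at roughly 0.60, 0.30 and 0.10, with spread ranked first.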

Problem 4218 · Machine Learning

A sector feature is represented by three one-hot columns with impurity-gain importances 0.04, 0.03, and 0.01. Two other features have importances 0.05 and 0.07. If you aggregate the one-hot block into a single group, what are the normalized group shares and which group ranks first?
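
A rough sketch of the grouping step, assuming the group importance is the plain sum of its one-hot columns. The column names (`sector_0`, `feature_a`, and so on) and the helper `grouped_shares` are hypothetical; the problem gives only the numbers.

```python
# Importances from the problem; all column names are illustrative placeholders.
raw = {
    "sector_0": 0.04,
    "sector_1": 0.03,
    "sector_2": 0.01,
    "feature_a": 0.05,
    "feature_b": 0.07,
}

# Map each one-hot column to its group; unmapped columns form their own group.
group_of = {"sector_0": "sector", "sector_1": "sector", "sector_2": "sector"}


def grouped_shares(raw, group_of):
    """Sum importances within each group, then normalize across groups."""
    totals = {}
    for column, importance in raw.items():
        group = group_of.get(column, column)
        totals[group] = totals.get(group, 0.0) + importance
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}


group_shares = grouped_shares(raw, group_of)
top_group = max(group_shares, key=group_shares.get)
print(group_shares, top_group)
```

With summation as the aggregation rule, the sector block (0.08 of 0.20) overtakes both single features even though no individual one-hot column does.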

Module 2.6.2 · Mathematical and Statistical Skills · Machine Learning Theory

Tree Models and Kernel Methods

machine-learning · tree-based-methods · decision-tree · cart · impurity · pruning · bagging · random-forest

Course: Tree Models and Kernel Methods · Machine Learning Theory

Bagging and Random Forests

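Friday, midday session. A Chinese private fund with 5 billion under management drops a CSI 300 alpha dataset on your desk: 30 features, with the daily next-day excess return as the label. The depth-15 CART tree from the previous lesson reached 100% in-sample directional accuracy but only 51% out of sample, barely better than a coin flip, with a Sharpe ratio near zero. You replace it with the average of 500 deep trees trained independently on bootstrap samples, and out-of-sample accuracy jumps to 57%. That jump, ...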

Problem 4217 · Machine Learning

Normalized MDI Share 2

A model has baseline validation AUC 0.62. After permuting three features separately, AUC becomes 0.57 for value_signal, 0.60 for momentum, and 0.61 for zip_code. What permutation-importance drops do these imply, and which feature ranks first?
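
For a higher-is-better metric such as AUC, the permutation-importance drop is commonly taken as the baseline score minus the score after permuting the feature. The sketch below applies that convention to the numbers in the question; the variable names are illustrative.

```python
baseline_auc = 0.62

# Validation AUC after permuting each feature on its own (from the problem).
permuted_auc = {"value_signal": 0.57, "momentum": 0.60, "zip_code": 0.61}

# Drop = baseline - permuted; a larger drop means the model relied on it more.
drops = {name: baseline_auc - auc for name, auc in permuted_auc.items()}

# Rank features from largest to smallest drop.
ranking = sorted(drops, key=drops.get, reverse=True)

for name in ranking:
    print(f"{name}: drop = {drops[name]:.2f}")
```

The drops are about 0.05, 0.02 and 0.01, which puts value_signal first.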

Problem 4219 · Machine Learning

Normalized MDI Share 4

Two trees contribute split gains to features A and B. Tree 1 contributes A=12, B=5. Tree 2 contributes A=8, B=10. What are the total normalized gain importances for A and B?
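
One reading of "total normalized gain" is to sum the raw gains over trees first and normalize once at the end. The sketch below follows that reading. It is an assumption: some implementations normalize within each tree before averaging, which would give slightly different numbers.

```python
# Raw split gains per tree, as given in the problem.
per_tree_gain = [
    {"A": 12, "B": 5},   # tree 1
    {"A": 8, "B": 10},   # tree 2
]

# Step 1: accumulate raw gain per feature across all trees.
total_gain = {}
for tree in per_tree_gain:
    for feature, gain in tree.items():
        total_gain[feature] = total_gain.get(feature, 0) + gain

# Step 2: normalize once, using the grand total over features.
grand_total = sum(total_gain.values())
gain_shares = {feature: gain / grand_total for feature, gain in total_gain.items()}

print(total_gain)
print({feature: round(share, 4) for feature, share in gain_shares.items()})
```

Under the sum-then-normalize reading, A totals 20 and B totals 15, giving shares of 20/35 and 15/35.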

Problem 4220 · Machine Learning

Normalized MDI Share 5

A model has baseline log loss 0.400. After permuting feature X, log loss rises to 0.455; after permuting feature Y, it rises to 0.420. What are the permutation importances under a log-loss metric, and which feature is more important?
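
Because log loss is a lower-is-better metric, the importance is usually taken as the increase in loss, that is, permuted minus baseline. A worked version under that convention, where the symbols $I$ and $L$ are notation introduced here for illustration:

```latex
\begin{aligned}
I_X &= L_X^{\text{perm}} - L^{\text{base}} = 0.455 - 0.400 = 0.055, \\
I_Y &= L_Y^{\text{perm}} - L^{\text{base}} = 0.420 - 0.400 = 0.020, \\
I_X &> I_Y \quad\Rightarrow\quad \text{X is the more important feature.}
\end{aligned}
```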

Course: Tree Models and Kernel Methods · Machine Learning Theory

Decision Trees: CART, Impurity Criteria, and Pruning

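Monday, 9:20 in the morning session. You take over an alpha model left behind by a departed colleague: a depth-15 CART (Classification and Regression Tree) trained on a three-year daily panel of CSI 300 constituents. Its 12 features include momentum, value, quality, low volatility, 5-day return, 20-day volatility, and turnover, and the target is the direction (up/down) of the next day's excess return. In-sample training accuracy 1...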

Course: Tree Models and Kernel Methods · Machine Learning Theory

Kernel Methods and Support Vector Machines

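Monday, one hour before the open. You are sitting in the research room of a mid-sized private fund in Shanghai. The research manager slides a CSV across the desk: the 300 constituents of the CSI 300, each with a 15-dimensional factor vector (PE, PB, 12-month momentum, 20-day volatility, turnover, analyst upgrade ratio), essentially a lightweight factor model input table; the label formula indicates next month's performance relative to the index, outperform /...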

Course: Tree Models and Kernel Methods · Machine Learning Theory

Gradient Boosting and XGBoost / LightGBM

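A factor researcher at a Shanghai private fund finishes training the 500-tree random forest from the previous section. Out-of-sample accuracy on the CSI 300 + CSI 500 is 57%, 6 points above the single deep tree's 51%. She changes max features from sqrt(p) to p/3 and raises the tree count to 2000, and accuracy stays put at 57.2%: bagging's variance dividend has been fully collected. At the factor review meeting, the PM has one line: "Variance has hit its floor, ...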

Problem 2560 · Machine Learning

Global Weight Rescaling Leaves Split Ranking Unchanged 5

If every sample weight in a node is multiplied by the same constant c>0, how does each candidate split's weighted impurity decrease change?
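
A small numerical check can make the scaling behaviour concrete. The sketch assumes the unnormalized definition of weighted impurity decrease, W·I(parent) - W_L·I(left) - W_R·I(right), with Gini impurity computed from weight shares. The data, the two candidate splits and the helper names are all made up for illustration. If the problem's definition divides by total weight instead, the decrease would be unchanged rather than scaled by c.

```python
def gini(weights, labels):
    """Weighted Gini impurity: 1 - sum_k p_k^2, with p_k taken from weight shares."""
    total = sum(weights)
    mass_by_class = {}
    for w, y in zip(weights, labels):
        mass_by_class[y] = mass_by_class.get(y, 0.0) + w
    return 1.0 - sum((m / total) ** 2 for m in mass_by_class.values())


def weighted_decrease(weights, labels, go_left):
    """Unnormalized decrease: W*I(parent) - W_L*I(left) - W_R*I(right)."""
    def mass_times_impurity(indices):
        ws = [weights[i] for i in indices]
        ys = [labels[i] for i in indices]
        return sum(ws) * gini(ws, ys)

    everyone = list(range(len(weights)))
    left = [i for i in everyone if go_left[i]]
    right = [i for i in everyone if not go_left[i]]
    return (mass_times_impurity(everyone)
            - mass_times_impurity(left)
            - mass_times_impurity(right))


# Toy node: sample weights, class labels, and two hypothetical candidate splits.
weights = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0]
labels = [0, 0, 1, 1, 0, 1]
split_a = [True, True, True, False, False, False]
split_b = [True, False, True, False, True, False]

c = 2.5
scaled = [c * w for w in weights]

base_a = weighted_decrease(weights, labels, split_a)
base_b = weighted_decrease(weights, labels, split_b)
scaled_a = weighted_decrease(scaled, labels, split_a)
scaled_b = weighted_decrease(scaled, labels, split_b)

print(base_a, scaled_a / base_a)  # the ratio should be close to c
print(base_b, scaled_b / base_b)
```

The class proportions inside each node do not change when every weight is multiplied by c, so the impurities stay fixed while the node masses scale by c. Every candidate's decrease is multiplied by the same factor and the ordering of splits is preserved.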

Problem 4224 · Machine Learning

Grouped Permutation Drop Pattern 4

An impurity-based feature ranking is id_hash=0.40, signal_1=0.35, signal_2=0.25. After limiting max depth, id_hash gain is cut in half while the other raw gains are unchanged. What are the new normalized shares?
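
A worked version of the renormalization, assuming the three quoted values are treated as raw gains, so halving id_hash gives 0.20 and the shares are recomputed over the new total. The symbol $s$ for the normalized share is notation introduced here.

```latex
\begin{aligned}
\text{total} &= 0.20 + 0.35 + 0.25 = 0.80, \\
s_{\text{id\_hash}} &= 0.20 / 0.80 = 0.2500, \\
s_{\text{signal\_1}} &= 0.35 / 0.80 = 0.4375, \\
s_{\text{signal\_2}} &= 0.25 / 0.80 = 0.3125 .
\end{aligned}
```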

Problem 4226 · Machine Learning

High-Cardinality ID Trap

A random forest says a hashed customer ID is the most important feature by impurity decrease, even though the validation permutation drop is almost zero. What is the most likely trap?

Problem 4237 · Machine Learning