Non-Coding Interview Questions
Showing 20 of 4169 matching questions
ID | Question | Domain | Difficulty | Type | Progress | Access
2604 | Why Label Noise Is Especially Toxic (13): Why does boosting often suffer badly when labels are noisy? | Machine Learning | Medium | Essay | Not attempted | Free
2607 | Why Overly Deep Base Trees Can Cancel Shrinkage Discipline (15): Why can a very deep base tree undermine the regularizing effect of a small learning rate? | Machine Learning | Easy | Essay | Not attempted | Free
2608 | Residual After Two Shrunken Updates (24): A point currently has residual 6. Two boosting rounds hit its region with leaf updates 1.5 and 0.8, using learning rate $\eta = 0.2$ in both rounds. What residual remains after the two rounds? | Machine Learning | Medium | Numerical | Not attempted | Free
2610 | Scale-Update Invariance Between Eta and Gamma (6): Why does multiplying every leaf update $\gamma_m$ by $c$ and dividing the learning rate $\eta$ by $c$ leave the final additive score unchanged? | Machine Learning | Hard | Derivation | Not attempted | Interview subscription
2611 | Why Boosting Parallelizes Worse Than Random Forests (16): Why is boosting fundamentally harder to parallelize across rounds than random forests? | Machine Learning | Easy | Essay | Not attempted | Free
2613 | L2-Regularized Region Update (7): In one boosting region, choose a constant update $\gamma$ to minimize $\sum_{i \in R} (r_i - \gamma)^2 + \lambda \gamma^2$. Let $S = \sum_{i \in R} r_i$ and $n = \lvert R \rvert$. Derive $\gamma$. | Machine Learning | Hard | Derivation | Not attempted | Interview subscription
2614 | Why the Initial Prediction Matters (18): Why can the choice of the initial prediction $F_0$ matter for the early trajectory of boosting? | Machine Learning | Medium | Essay | Not attempted | Interview subscription
2615 | Why Calibration Can Degrade Before Ranking (19): Why can late-stage boosting sometimes keep ranking examples well while making the predicted scores less well calibrated? | Machine Learning | Hard | Essay | Not attempted | Interview subscription
2616 | Why Leaf-Wise Growth Can Be Higher Variance (20): Why can leaf-wise tree growth be more variance-prone than level-wise growth inside a boosting system? | Machine Learning | Easy | Essay | Not attempted | Free
2617 | Two-Region Two-Round Boosting Calculation (25): A boosting model starts from $F_0 = 0$ with learning rate $\eta = 0.1$. In round 1, region A gets update $+2$ and region B gets update $-1$. In round 2, region A gets update $-0.5$ and region B gets update $+0.25$. What are the final predictions for a point that always stays in region A and a point that always stays in region B? | Machine Learning | Easy | Numerical | Not attempted | Free
2618 | Why Many Small Corrections Can Beat One Big Tree (21): Why can an additive sequence of small boosting steps outperform a single large tree with similar in-sample flexibility? | Machine Learning | Medium | Essay | Not attempted | Interview subscription
2619 | Why Flat Late-Round Validation Gains Still Suggest Stopping (22): If the validation gain per boosting round becomes tiny and erratic late in training, why is that often a strong argument for stopping? | Machine Learning | Medium | Essay | Not attempted | Interview subscription
2620 | A Bound on Total Function Movement (8): Suppose every boosting round changes any one point's prediction by at most $\eta A$ in absolute value. What upper bound does this imply on the total prediction movement after $M$ rounds? | Machine Learning | Hard | Derivation | Not attempted | Interview subscription
2621 | Residual Block Gradient (1): A scalar residual block outputs $y = x + f(x)$. Derive $dy/dx$. | Machine Learning | Easy | Derivation | Not attempted | Free
2622 | Global-Norm Clipping Formula (2): A gradient vector $g$ has norm $\lVert g \rVert$ greater than the clip threshold $c$. Derive the clipped gradient under standard global-norm clipping. | Machine Learning | Easy | Derivation | Not attempted | Free
2623 | One Momentum Update (15): Suppose momentum uses $v_t = \beta v_{t-1} + g_t$ with $\beta = 0.9$, previous velocity $v_{t-1} = 0.5$, and current gradient $g_t = 2$. What is $v_t$? | Machine Learning | Medium | Numerical | Not attempted | Free
2624 | Momentum as an Unrolled Geometric Sum (3): If momentum obeys $v_t = \beta v_{t-1} + g_t$, derive $v_t$ in terms of $v_0$ and the past gradients $g_1, \dots, g_t$. | Machine Learning | Medium | Derivation | Not attempted | Free
2625 | Decoupled Weight Decay Update (4): Under decoupled weight decay with learning rate $\eta$, decay $\lambda$, parameters $w_t$, and gradient $g_t$, derive $w_{t+1}$. | Machine Learning | Hard | Derivation | Not attempted | Free
2626 | Global-Norm Clipping Numerically (16): A gradient vector is $g = (6, 8)$, whose norm is 10. If the clip threshold is 5, what clipped gradient is produced? | Machine Learning | Easy | Numerical | Not attempted | Free
2627 | Linear Warmup Schedule (5): A learning rate warms up linearly from 0 to $\eta_{\max}$ over $T$ steps. Derive $\eta_t$ for step $t$ in the warmup phase. | Machine Learning | Medium | Derivation | Not attempted | Free
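The numerical questions in this list (2608, 2617, 2623, 2626) can be checked with a few lines of Python. The following is a minimal sketch, not the site's reference solution. The helper names are hypothetical. For 2608 it assumes squared-error boosting, where the residual is target minus prediction, so each round reduces the residual by the learning rate times the leaf update.

```python
import math


def residual_after_updates(residual, leaf_updates, eta):
    """Assumes residual = target - prediction, so each shrunken update
    eta * gamma moves the prediction up and the residual down."""
    for gamma in leaf_updates:
        residual -= eta * gamma
    return residual


def additive_prediction(f0, leaf_updates, eta):
    """Final additive score F_M = F_0 + eta * (sum of leaf updates hit)."""
    return f0 + eta * sum(leaf_updates)


def momentum_step(v_prev, grad, beta):
    """One step of v_t = beta * v_{t-1} + g_t."""
    return beta * v_prev + grad


def clip_by_global_norm(grad, clip):
    """Rescale grad to have norm `clip` when its norm exceeds `clip`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip:
        return list(grad)
    scale = clip / norm
    return [g * scale for g in grad]


# Question 2608: 6 - 0.2*1.5 - 0.2*0.8, approximately 5.54
print(residual_after_updates(6.0, [1.5, 0.8], 0.2))
# Question 2617: region A approximately 0.15, region B approximately -0.075
print(additive_prediction(0.0, [2.0, -0.5], 0.1))
print(additive_prediction(0.0, [-1.0, 0.25], 0.1))
# Question 2623: 0.9*0.5 + 2, approximately 2.45
print(momentum_step(0.5, 2.0, 0.9))
# Question 2626: (6, 8) scaled by 5/10 gives [3.0, 4.0]
print(clip_by_global_norm([6.0, 8.0], 5.0))
```

The printed floats may differ from the stated values in the last decimal places because of binary floating-point rounding, so comparisons are best made with a tolerance such as `math.isclose`.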