Non-coding interview questions
Showing 20 of 171 matching questions
| ID | Title | Question | Domain | Difficulty | Type | Progress | Access |
|---|---|---|---|---|---|---|---|
| 2602 | Why Early Stopping Matters Even if Train Loss Falls 12 | Why can validation performance start to deteriorate even while the training objective of boosting keeps improving? | Machine Learning | Medium | essay | Not attempted | Free |
| 2604 | Why Label Noise Is Especially Toxic 13 | Why does boosting often suffer badly when labels are noisy? | Machine Learning | Medium | essay | Not attempted | Free |
| 2607 | Why Overly Deep Base Trees Can Cancel Shrinkage Discipline 15 | Why can a very deep base tree undermine the regularizing effect of a small learning rate? | Machine Learning | Easy | essay | Not attempted | Free |
| 2611 | Why Boosting Parallelizes Worse Than Random Forests 16 | Why is boosting fundamentally harder to parallelize across rounds than random forests? | Machine Learning | Easy | essay | Not attempted | Free |
| 2614 | Why the Initial Prediction Matters 18 | Why can the choice of the initial prediction $F_0$ matter for the early trajectory of boosting? | Machine Learning | Medium | essay | Not attempted | Interview subscription |
| 2615 | Why Calibration Can Degrade Before Ranking 19 | Why can late-stage boosting sometimes keep ranking examples well while making the predicted scores less well calibrated? | Machine Learning | Hard | essay | Not attempted | Interview subscription |
| 2616 | Why Leaf-Wise Growth Can Be Higher Variance 20 | Why can leaf-wise tree growth be more variance-prone than level-wise growth inside a boosting system? | Machine Learning | Easy | essay | Not attempted | Free |
| 2618 | Why Many Small Corrections Can Beat One Big Tree 21 | Why can an additive sequence of small boosting steps outperform a single large tree with similar in-sample flexibility? | Machine Learning | Medium | essay | Not attempted | Interview subscription |
| 2619 | Why Flat Late-Round Validation Gains Still Suggest Stopping 22 | If the validation gain per boosting round becomes tiny and erratic late in training, why is that often a strong argument for stopping? | Machine Learning | Medium | essay | Not attempted | Interview subscription |
| 2622 | Global-Norm Clipping Formula 2 | A gradient vector $g$ has norm $\lVert g \rVert$ greater than the clip threshold $c$. Derive the clipped gradient under standard global-norm clipping. | Machine Learning | Easy | derivation | Not attempted | Free |
| 2628 | Why Residual Connections Help Train Deep Nets 20 | Why do residual connections often make very deep networks easier to optimize? | Machine Learning | Medium | essay | Not attempted | Free |
| 2629 | EMA From Zero Initialization 6 | Let $m_t = \beta m_{t-1} + (1-\beta) x_t$ with $m_0 = 0$. Derive $m_t$ as an explicit weighted sum of $x_1, \dots, x_t$. | Machine Learning | Medium | derivation | Not attempted | Free |
| 2633 | Layer-Norm Shift Invariance 8 | Ignoring learned affine parameters, why does adding the same constant $a$ to every coordinate of a vector leave layer-normalized activations unchanged? | Machine Learning | Medium | derivation | Not attempted | Free |
| 2645 | Why Global-Norm Clipping Preserves Direction 14 | Why does global-norm clipping change the magnitude of a gradient vector but not its direction whenever clipping is active? | Machine Learning | Hard | derivation | Not attempted | Interview subscription |
| 2655 | Why Expanding Windows Can Beat Rolling Windows Under Sparse Data | Why might an expanding-window CV design be preferable to a rolling-window design when the series is short and drift is present but not violent? | Machine Learning | Hard | essay | Not attempted | Interview subscription |
| 2665 | Why Tiny Folds Can Exaggerate Regularization | Why can a very small training fold make heavily regularized models look better than they would on the full training set? | Machine Learning | Hard | essay | Not attempted | Interview subscription |
| 2666 | Why Outer-Fold Disagreement Is Informative | If different outer folds in nested CV keep selecting different hyperparameters, what does that usually say about the learning problem? | Machine Learning | Easy | essay | Not attempted | Free |
| 2680 | Why Low R-Squared Can Still Be Valuable Yet Hard to Verify | Why can a signal with tiny explanatory power still be economically useful, while also being unusually hard to validate convincingly? | Machine Learning | Hard | essay | Not attempted | Interview subscription |
| 2683 | Why Long Training Windows Can Learn the Wrong World | Why can adding more historical years lower estimation variance and yet make a finance model worse? | Machine Learning | Medium | essay | Not attempted | Interview subscription |
| 2684 | Why Short Windows Adapt but Also Whipsaw | Why does a short rolling window often react faster to new regimes while simultaneously making parameter estimates much less stable? | Machine Learning | Hard | essay | Not attempted | Interview subscription |
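The derivation items in the list (2622 and 2645 on global-norm clipping, 2629 on the zero-initialized EMA, 2633 on layer-norm shift invariance) can be checked numerically. The following is a minimal Python sketch under the standard textbook definitions, not an official answer key. All function names are illustrative assumptions and do not come from the question bank.

```python
import math


def clip_by_global_norm(g, c):
    """Global-norm clipping: if ||g|| > c, return g * (c / ||g||), else g.

    The rescaling factor is a positive scalar, so the direction is unchanged.
    """
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= c:
        return list(g)
    scale = c / norm
    return [scale * x for x in g]


def ema_recursive(xs, beta):
    """m_t = beta * m_{t-1} + (1 - beta) * x_t, starting from m_0 = 0."""
    m = 0.0
    for x in xs:
        m = beta * m + (1.0 - beta) * x
    return m


def ema_closed_form(xs, beta):
    """Unrolled recursion: m_t = (1 - beta) * sum_{i=1..t} beta^(t-i) * x_i."""
    t = len(xs)
    return (1.0 - beta) * sum(
        beta ** (t - i) * x for i, x in enumerate(xs, start=1)
    )


def layer_norm(v, eps=1e-5):
    """Layer norm without learned affine parameters (eps is an assumed default)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


if __name__ == "__main__":
    g = [3.0, 4.0]                      # ||g|| = 5
    print(clip_by_global_norm(g, 1.0))  # rescaled to norm 1, same direction

    xs = [1.0, 2.0, 3.0]
    print(ema_recursive(xs, 0.9), ema_closed_form(xs, 0.9))

    v = [1.0, 2.0, 3.0]
    shifted = [x + 10.0 for x in v]     # add the same constant a = 10
    print(layer_norm(v), layer_norm(shifted))
```

The shift invariance holds because adding a constant moves the mean by the same constant and leaves the variance untouched, so both the centered values and the normalizer are unchanged.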