INTERVIEW PREP

Math and Non-Coding Interview Questions

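Covers math, probability, statistics, brainteasers, machine learning, and finance. Use this page to filter questions and open individual ones; coding questions live in a separate LeetCode-style coding lab.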


Questions: 4169
Domains: 8
Matching current filter: 171


Non-Coding Interview Questions

Showing 20 of 171 matching questions

Answer status: Not attempted · Incorrect · Correct

ID    Question
      Prompt
      Domain · Difficulty · Type · Progress · Access

2602  Why Early Stopping Matters Even if Train Loss Falls
      Why can validation performance start to deteriorate even while the training objective of boosting keeps improving?
      Machine learning · Medium · Essay · Not attempted · Free

2604  Why Label Noise Is Especially Toxic
      Why does boosting often suffer badly when labels are noisy?
      Machine learning · Medium · Essay · Not attempted · Free

2607  Why Overly Deep Base Trees Can Cancel Shrinkage Discipline
      Why can a very deep base tree undermine the regularizing effect of a small learning rate?
      Machine learning · Easy · Essay · Not attempted · Free

2611  Why Boosting Parallelizes Worse Than Random Forests
      Why is boosting fundamentally harder to parallelize across rounds than random forests?
      Machine learning · Easy · Essay · Not attempted · Free

2614  Why the Initial Prediction Matters
      Why can the choice of the initial prediction $F_0$ matter for the early trajectory of boosting?
      Machine learning · Medium · Essay · Not attempted · Interview subscription

2615  Why Calibration Can Degrade Before Ranking
      Why can late-stage boosting sometimes keep ranking examples well while making the predicted scores less well calibrated?
      Machine learning · Hard · Essay · Not attempted · Interview subscription

2616  Why Leaf-Wise Growth Can Be Higher Variance
      Why can leaf-wise tree growth be more variance-prone than level-wise growth inside a boosting system?
      Machine learning · Easy · Essay · Not attempted · Free

2618  Why Many Small Corrections Can Beat One Big Tree
      Why can an additive sequence of small boosting steps outperform a single large tree with similar in-sample flexibility?
      Machine learning · Medium · Essay · Not attempted · Interview subscription

2619  Why Flat Late-Round Validation Gains Still Suggest Stopping
      If the validation gain per boosting round becomes tiny and erratic late in training, why is that often a strong argument for stopping?
      Machine learning · Medium · Essay · Not attempted · Interview subscription

2622  Global-Norm Clipping Formula
      A gradient vector $g$ has norm $\|g\|$ greater than the clipping threshold $c$. Derive the clipped gradient under standard global-norm clipping.
      Machine learning · Easy · Derivation · Not attempted · Free

2628  Why Residual Connections Help Train Deep Nets
      Why do residual connections often make very deep networks easier to optimize?
      Machine learning · Medium · Essay · Not attempted · Free

2629  EMA From Zero Initialization
      Let $m_t = \beta m_{t-1} + (1-\beta) x_t$ with $m_0 = 0$. Derive $m_t$ as an explicit weighted sum of $x_1, \dots, x_t$.
      Machine learning · Medium · Derivation · Not attempted · Free

2633  Layer-Norm Shift Invariance
      Ignoring learned affine parameters, why does adding the same constant $a$ to every coordinate of a vector leave layer-normalized activations unchanged?
      Machine learning · Medium · Derivation · Not attempted · Free

2645  Why Global-Norm Clipping Preserves Direction
      Why does global-norm clipping change the magnitude of a gradient vector but not its direction whenever clipping is active?
      Machine learning · Hard · Derivation · Not attempted · Interview subscription

2655  Why Expanding Windows Can Beat Rolling Windows Under Sparse Data
      Why might an expanding-window CV design be preferable to a rolling-window design when the series is short and drift is present but not violent?
      Machine learning · Hard · Essay · Not attempted · Interview subscription

2665  Why Tiny Folds Can Exaggerate Regularization
      Why can a very small training fold make heavily regularized models look better than they would on the full training set?
      Machine learning · Hard · Essay · Not attempted · Interview subscription

2666  Why Outer-Fold Disagreement Is Informative
      If different outer folds in nested CV keep selecting different hyperparameters, what does that usually say about the learning problem?
      Machine learning · Easy · Essay · Not attempted · Free

2680  Why Low R-Squared Can Still Be Valuable Yet Hard to Verify
      Why can a signal with tiny explanatory power still be economically useful, while also being unusually hard to validate convincingly?
      Machine learning · Hard · Essay · Not attempted · Interview subscription

2683  Why Long Training Windows Can Learn the Wrong World
      Why can adding more historical years lower estimation variance and yet make a finance model worse?
      Machine learning · Medium · Essay · Not attempted · Interview subscription

2684  Why Short Windows Adapt but Also Whipsaw
      Why does a short rolling window often react faster to new regimes while simultaneously making parameter estimates much less stable?
      Machine learning · Hard · Essay · Not attempted · Interview subscription
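The derivation prompts on this page (2622, 2629, 2633, 2645) each involve a formula that can be sanity-checked numerically. Below is a minimal pure-Python sketch, assuming the standard definitions: global-norm clipping rescales by $c / \|g\|$ only when $\|g\| > c$; the EMA starts from $m_0 = 0$; and layer normalization has no learned gain or bias, with a small `eps` term added for numerical safety. All function names (`clip_by_global_norm`, `ema_closed_form`, `layer_norm`, and so on) are hypothetical helpers written for illustration. They are one way to check an answer, not the site's model solutions.

```python
import math


def global_norm(grads):
    # L2 norm taken over every entry of the (flattened) gradient
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, c):
    # Rescale by c / ||g|| only when ||g|| > c; otherwise return g unchanged.
    # Multiplying by a single positive scalar keeps the direction (2645).
    norm = global_norm(grads)
    if norm <= c:
        return list(grads)
    scale = c / norm
    return [scale * g for g in grads]


def ema_recursive(xs, beta):
    # m_t = beta * m_{t-1} + (1 - beta) * x_t, starting from m_0 = 0
    m = 0.0
    history = []
    for x in xs:
        m = beta * m + (1 - beta) * x
        history.append(m)
    return history


def ema_closed_form(xs, beta):
    # Unrolled form: m_t = (1 - beta) * sum_{i=1}^{t} beta^(t - i) * x_i.
    # The weights sum to 1 - beta^t, which is why zero init biases m_t toward 0.
    t = len(xs)
    return (1 - beta) * sum(beta ** (t - i) * x for i, x in enumerate(xs, start=1))


def layer_norm(v, eps=1e-5):
    # No learned affine parameters: subtract the mean, divide by the std.
    # Adding a constant a to every entry shifts the mean by a too, so the
    # centered values (and hence the variance) are unchanged (2633).
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


if __name__ == "__main__":
    g = [3.0, 4.0]  # ||g|| = 5, which exceeds c = 1
    print(clip_by_global_norm(g, 1.0))  # roughly [0.6, 0.8]: norm 1, same direction
    xs = [1.0, 2.0, 3.0]
    print(ema_recursive(xs, 0.9)[-1], ema_closed_form(xs, 0.9))
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))
    print(layer_norm([11.0, 12.0, 13.0, 14.0]))  # same output after shifting by 10
```

The closed-form EMA check compares the recursion against the unrolled sum for one input. With $\beta = 0.9$ and inputs $1, 2, 3$, both give $0.1 \cdot (0.81 + 1.8 + 3) = 0.561$.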