Page 8 of 12
Non-Coding Interview Questions
Showing 20 of 235 matching questions
| ID | Question | Domain | Difficulty | Type | Progress | Access |
|---|---|---|---|---|---|---|
| 2597 | Weighted Region Update 2: If observations in a boosting region $R$ carry positive weights $w_i$, derive the constant update $\gamma$ that minimizes $\sum_{i \in R} w_i (r_i - \gamma)^2$. | Machine Learning | Easy | derivation | Not attempted | Free |
| 2598 | Final Prediction After Three Boosting Rounds 23: A boosting model starts from $F_0(x) = 10$. For one observation, the leaf updates along its path are +1.2, -0.5, and +0.8 across three rounds, with learning rate $\eta = 0.1$ each round. What is the final prediction? | Machine Learning | Medium | Numerical | Not attempted | Free |
| 2608 | Residual After Two Shrunken Updates 24: A point currently has residual 6. Two boosting rounds hit its region with leaf updates 1.5 and 0.8, using learning rate $\eta = 0.2$ in both rounds. What residual remains after the two rounds? | Machine Learning | Medium | Numerical | Not attempted | Free |
| 2610 | Scale-Update Invariance Between Eta and Gamma 6: Why does multiplying every leaf update $\gamma_m$ by $c$ and dividing the learning rate $\eta$ by $c$ leave the final additive score unchanged? | Machine Learning | Hard | derivation | Not attempted | Interview subscription |
| 2621 | Residual Block Gradient 1: A scalar residual block outputs $y = x + f(x)$. Derive $dy/dx$. | Machine Learning | Easy | derivation | Not attempted | Free |
| 2622 | Global-Norm Clipping Formula 2: A gradient vector $g$ has norm $\lVert g \rVert$ greater than clip threshold $c$. Derive the clipped gradient under standard global-norm clipping. | Machine Learning | Easy | derivation | Not attempted | Free |
| 2625 | Decoupled Weight Decay Update 4: Under decoupled weight decay with learning rate $\eta$, decay $\lambda$, parameters $w_t$, and gradient $g_t$, derive $w_{t+1}$. | Machine Learning | Hard | derivation | Not attempted | Free |
| 2627 | Linear Warmup Schedule 5: A learning rate warms up linearly from 0 to $\eta_{\max}$ over $T$ steps. Derive $\eta_t$ for step $t$ in the warmup phase. | Machine Learning | Medium | derivation | Not attempted | Free |
| 2629 | EMA From Zero Initialization 6: Let $m_t = \beta m_{t-1} + (1 - \beta) x_t$ with $m_0 = 0$. Derive $m_t$ as an explicit weighted sum of $x_1, \dots, x_t$. | Machine Learning | Medium | derivation | Not attempted | Free |
| 2630 | Why BatchNorm Can Break Under Distribution Shift 21: Why can a network that trains well with BatchNorm behave strangely at inference when the deployment distribution shifts? | Machine Learning | Hard | essay | Not attempted | Free |
| 2631 | Shared-Parameter Gradient Adds Across Paths 7: A parameter $w$ is used in two separate branches whose losses contribute $L_1(w)$ and $L_2(w)$. What is $d(L_1 + L_2)/dw$? | Machine Learning | Easy | derivation | Not attempted | Free |
| 2632 | Warmup Learning Rate Numerically 17: A linear warmup goes from 0 to 0.001 over 10 steps. What learning rate is used at step $t = 3$ of the warmup? | Machine Learning | Easy | Numerical | Not attempted | Free |
| 2634 | Batch-Average Gradient 9: If the minibatch loss is the average $L = \frac{1}{B} \sum_{i=1}^{B} L_i$, derive $dL/dw$ in terms of the per-example gradients. | Machine Learning | Hard | derivation | Not attempted | Free |
| 2635 | Why Warmup Helps Large-Batch Training 22: Why is learning-rate warmup often helpful when training with very large batches? | Machine Learning | Hard | essay | Not attempted | Free |
| 2637 | ReLU Local Derivative 10: For $\mathrm{ReLU}(z) = \max(0, z)$, what derivative does backprop use when $z > 0$ and when $z < 0$? | Machine Learning | Medium | derivation | Not attempted | Free |
| 2639 | Steady-State Momentum Under a Constant Gradient 11: If $v_t = \beta v_{t-1} + g$ with constant gradient $g$ and $\lvert \beta \rvert < 1$, what constant value does $v_t$ converge to? | Machine Learning | Hard | derivation | Not attempted | Free |
| 2640 | Cosine Decay Schedule 12: A learning rate decays from $\eta_{\max}$ to $\eta_{\min}$ over $T$ steps using cosine annealing. What is $\eta_t$ at step $t$? | Machine Learning | Hard | derivation | Not attempted | Free |
| 2641 | Why Clipping Helps Exploding but Not Vanishing Gradients 23: Why is gradient clipping a natural remedy for exploding gradients but not for vanishing gradients? | Machine Learning | Easy | essay | Not attempted | Free |
| 2642 | BatchNorm Running Mean Update 13: A BatchNorm layer updates its running mean by $\mu_{\text{new}} = m \mu_{\text{old}} + (1 - m) \mu_{\text{batch}}$. What does this formula mean operationally? | Machine Learning | Easy | derivation | Not attempted | Free |
| 2643 | Clipping Plus Weight Decay on a Vector 25: A parameter vector is $w_t = (3, 4)$. Its gradient is $g = (6, 8)$, whose norm is 10. Apply global-norm clipping with threshold 5, then a decoupled weight-decay step with learning rate $\eta = 0.1$ and $\lambda = 0.1$. What is the new parameter vector? | Machine Learning | Medium | Numerical | Not attempted | Free |
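
The listing gives no answers, so the following is only a rough sketch of how the four numerical questions (IDs 2598, 2608, 2632, 2643) might be checked. All function names are hypothetical, and several conventions are assumptions rather than anything the questions state: the residual is taken as $y - F$ under squared error, warmup is taken as $\eta_t = \eta_{\max} \, t / T$, and the decoupled weight-decay step is taken as $w_{t+1} = w_t - \eta g - \eta \lambda w_t$ on top of plain SGD. Other conventions (for example, decay not scaled by $\eta$) would give different numbers.

```python
import math


def boosted_prediction(f0, leaf_updates, eta):
    """Additive boosting score: F_M = F_0 + eta * (sum of leaf updates)."""
    return f0 + eta * sum(leaf_updates)


def remaining_residual(residual, leaf_updates, eta):
    """Assumes residual r = y - F, so each round removes eta * gamma from it."""
    return residual - eta * sum(leaf_updates)


def linear_warmup(t, total_steps, eta_max):
    """Assumed schedule: eta_t = eta_max * t / T for 0 <= t <= T."""
    return eta_max * t / total_steps


def clip_by_global_norm(grad, threshold):
    """Rescale grad to norm `threshold` only when its norm exceeds it."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = threshold / norm if norm > threshold else 1.0
    return [g * scale for g in grad]


def sgd_step_decoupled_decay(w, grad, eta, lam):
    """Assumed convention: w_next = w - eta * grad - eta * lam * w."""
    return [wi - eta * gi - eta * lam * wi for wi, gi in zip(w, grad)]


# ID 2598: 10 + 0.1 * (1.2 - 0.5 + 0.8), about 10.15
print(boosted_prediction(10.0, [1.2, -0.5, 0.8], eta=0.1))

# ID 2608: 6 - 0.2 * (1.5 + 0.8), about 5.54
print(remaining_residual(6.0, [1.5, 0.8], eta=0.2))

# ID 2632: 0.001 * 3 / 10, about 0.0003
print(linear_warmup(3, total_steps=10, eta_max=0.001))

# ID 2643: clipped gradient is about (3, 4); new w is about (2.67, 3.56)
clipped = clip_by_global_norm([6.0, 8.0], threshold=5.0)
print(sgd_step_decoupled_decay([3.0, 4.0], clipped, eta=0.1, lam=0.1))
```

Under the alternative convention $w_{t+1} = (1 - \lambda) w_t - \eta g$, the last question would instead give roughly $(2.4, 3.2)$, so the intended convention is worth confirming before relying on either value.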