INTERVIEW PREP

Math and Non-Coding Interview Questions

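Covers math, probability, statistics, brain teasers, machine learning, and finance. This page is for filtering questions and opening individual ones; programming problems use a separate LeetCode-style coding lab.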

Questions: 4169
Domains: 8
Current filter: 73


Non-Coding Interview Questions

Showing 20 of 73 matching questions

2621. Residual Block Gradient 1
A scalar residual block outputs $y = x + f(x)$. Derive $dy/dx$.
Machine Learning | Easy | Derivation | Not attempted | Free

2622. Global-Norm Clipping Formula 2
A gradient vector $g$ has norm $\|g\|$ greater than the clip threshold $c$. Derive the clipped gradient under standard global-norm clipping.
Machine Learning | Easy | Derivation | Not attempted | Free

2623. One Momentum Update 15
Suppose momentum uses $v_t = \beta v_{t-1} + g_t$ with $\beta = 0.9$, previous velocity $v_{t-1} = 0.5$, and current gradient $g_t = 2$. What is $v_t$?
Machine Learning | Medium | Numerical | Not attempted | Free

2624. Momentum as an Unrolled Geometric Sum 3
If momentum obeys $v_t = \beta v_{t-1} + g_t$, derive $v_t$ in terms of $v_0$ and the past gradients $g_1, \ldots, g_t$.
Machine Learning | Medium | Derivation | Not attempted | Free

2625. Decoupled Weight Decay Update 4
Under decoupled weight decay with learning rate $\eta$, decay $\lambda$, parameters $w_t$, and gradient $g_t$, derive $w_{t+1}$.
Machine Learning | Hard | Derivation | Not attempted | Free

2626. Global-Norm Clipping Numerically 16
A gradient vector is $g = (6, 8)$, whose norm is 10. If the clip threshold is 5, what clipped gradient is produced?
Machine Learning | Easy | Numerical | Not attempted | Free

2627. Linear Warmup Schedule 5
A learning rate warms up linearly from 0 to $\eta_{\max}$ over $T$ steps. Derive $\eta_t$ for step $t$ in the warmup phase.
Machine Learning | Medium | Derivation | Not attempted | Free

2628. Why Residual Connections Help Train Deep Nets 20
Why do residual connections often make very deep networks easier to optimize?
Machine Learning | Medium | Essay | Not attempted | Free

2629. EMA From Zero Initialization 6
Let $m_t = \beta m_{t-1} + (1 - \beta) x_t$ with $m_0 = 0$. Derive $m_t$ as an explicit weighted sum of $x_1, \ldots, x_t$.
Machine Learning | Medium | Derivation | Not attempted | Free

2630. Why BatchNorm Can Break Under Distribution Shift 21
Why can a network that trains well with BatchNorm behave strangely at inference when the deployment distribution shifts?
Machine Learning | Hard | Essay | Not attempted | Free

2631. Shared-Parameter Gradient Adds Across Paths 7
A parameter $w$ is used in two separate branches whose losses contribute $L_1(w)$ and $L_2(w)$. What is $d(L_1 + L_2)/dw$?
Machine Learning | Easy | Derivation | Not attempted | Free

2632. Warmup Learning Rate Numerically 17
A linear warmup goes from 0 to 0.001 over 10 steps. What learning rate is used at step $t = 3$ of the warmup?
Machine Learning | Easy | Numerical | Not attempted | Free

2633. Layer-Norm Shift Invariance 8
Ignoring learned affine parameters, why does adding the same constant $a$ to every coordinate of a vector leave layer-normalized activations unchanged?
Machine Learning | Medium | Derivation | Not attempted | Free

2634. Batch-Average Gradient 9
If the minibatch loss is the average $L = \frac{1}{B} \sum_{i=1}^{B} L_i$, derive $dL/dw$ in terms of the per-example gradients.
Machine Learning | Hard | Derivation | Not attempted | Free

2635. Why Warmup Helps Large-Batch Training 22
Why is learning-rate warmup often helpful when training with very large batches?
Machine Learning | Hard | Essay | Not attempted | Free

2636. Decoupled Weight Decay Numerically 18
A scalar parameter has value $w_t = 2$, gradient $g_t = 0.5$, learning rate $\eta = 0.1$, and decoupled weight decay $\lambda = 0.05$. What is $w_{t+1}$?
Machine Learning | Easy | Numerical | Not attempted | Free

2637. ReLU Local Derivative 10
For $\mathrm{ReLU}(z) = \max(0, z)$, what derivative does backprop use when $z > 0$ and when $z < 0$?
Machine Learning | Medium | Derivation | Not attempted | Free

2638. Residual Gradient Numerically 19
A scalar residual block has $y = x + f(x)$ with $f(x) = 3x^2$. What is $dy/dx$ at $x = 1$?
Machine Learning | Medium | Numerical | Not attempted | Free

2639. Steady-State Momentum Under a Constant Gradient 11
If $v_t = \beta v_{t-1} + g$ with constant gradient $g$ and $|\beta| < 1$, what constant value does $v_t$ converge to?
Machine Learning | Hard | Derivation | Not attempted | Free

2640. Cosine Decay Schedule 12
A learning rate decays from $\eta_{\max}$ to $\eta_{\min}$ over $T$ steps using cosine annealing. What is $\eta_t$ at step $t$?
Machine Learning | Hard | Derivation | Not attempted | Free
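
For the numerical questions above (2623, 2626, 2632, 2636, 2638), a short script can be a handy way to check an answer after working it by hand. The following is only a sketch under stated assumptions, not the site's reference solution: all function names are hypothetical, the warmup is assumed to follow $\eta_t = \eta_{\max} \, t / T$, and decoupled weight decay is assumed to follow the AdamW-style rule $w_{t+1} = w_t - \eta g_t - \eta \lambda w_t$ with a plain gradient step. Other conventions exist and would give different numbers.

```python
import math


def momentum_step(v_prev, grad, beta):
    """One momentum update: v_t = beta * v_{t-1} + g_t."""
    return beta * v_prev + grad


def clip_by_global_norm(grad, clip):
    """Rescale grad so its L2 norm is at most clip; leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip:
        return list(grad)
    scale = clip / norm
    return [g * scale for g in grad]


def linear_warmup(step, total_steps, eta_max):
    """Assumed convention: eta_t = eta_max * t / T during warmup."""
    return eta_max * step / total_steps


def decoupled_weight_decay_step(w, grad, eta, lam):
    """Assumed AdamW-style convention: w - eta * grad - eta * lam * w."""
    return w - eta * grad - eta * lam * w


def residual_grad(df_dx, x):
    """dy/dx for y = x + f(x): the identity path contributes 1, plus f'(x)."""
    return 1.0 + df_dx(x)


if __name__ == "__main__":
    print(momentum_step(0.5, 2.0, 0.9))                    # question 2623
    print(clip_by_global_norm([6.0, 8.0], 5.0))            # question 2626
    print(linear_warmup(3, 10, 0.001))                     # question 2632
    print(decoupled_weight_decay_step(2.0, 0.5, 0.1, 0.05))  # question 2636
    print(residual_grad(lambda x: 6.0 * x, 1.0))           # question 2638, f'(x) = 6x
```

Because the values are floating point, comparing against a hand-computed answer with a small tolerance is safer than testing for exact equality.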