INTERVIEW PREP

Math and Non-Coding Interview Questions

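Covers math, probability, statistics, brain teasers, machine learning, and finance. This page is for filtering questions and opening individual ones; coding problems live in a separate LeetCode-style coding lab.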


Questions: 4169
Domains: 8
Matching current filter: 235

Page 8 of 12

Non-Coding Interview Questions

Showing 20 of 235 matching questions

Answer status: Not attempted, Not yet correct, Correct

| ID | Question | Domain | Difficulty | Type | Progress | Access |
| --- | --- | --- | --- | --- | --- | --- |
| 2597 | Weighted Region Update 2: If observations in a boosting region $R$ carry positive weights $w_i$, derive the constant update $\gamma$ that minimizes $\sum_{i \in R} w_i (r_i - \gamma)^2$. | Machine learning | Easy | Derivation | Not attempted | Free |
| 2598 | Final Prediction After Three Boosting Rounds 23: A boosting model starts from $F_0(x) = 10$. For one observation, the leaf updates along its path are +1.2, -0.5, and +0.8 across three rounds, with learning rate $\eta = 0.1$ each round. What is the final prediction? | Machine learning | Medium | Numerical | Not attempted | Free |
| 2608 | Residual After Two Shrunken Updates 24: A point currently has residual 6. Two boosting rounds hit its region with leaf updates 1.5 and 0.8, using learning rate $\eta = 0.2$ in both rounds. What residual remains after the two rounds? | Machine learning | Medium | Numerical | Not attempted | Free |
| 2610 | Scale-Update Invariance Between Eta and Gamma 6: Why does multiplying every leaf update $\gamma_m$ by $c$ and dividing the learning rate $\eta$ by $c$ leave the final additive score unchanged? | Machine learning | Hard | Derivation | Not attempted | Interview subscription |
| 2621 | Residual Block Gradient 1: A scalar residual block outputs $y = x + f(x)$. Derive $dy/dx$. | Machine learning | Easy | Derivation | Not attempted | Free |
| 2622 | Global-Norm Clipping Formula 2: A gradient vector $g$ has norm $\lVert g \rVert$ greater than clip threshold $c$. Derive the clipped gradient under standard global-norm clipping. | Machine learning | Easy | Derivation | Not attempted | Free |
| 2625 | Decoupled Weight Decay Update 4: Under decoupled weight decay with learning rate $\eta$, decay $\lambda$, parameters $w_t$, and gradient $g_t$, derive $w_{t+1}$. | Machine learning | Hard | Derivation | Not attempted | Free |
| 2627 | Linear Warmup Schedule 5: A learning rate warms up linearly from 0 to $\eta_{\max}$ over $T$ steps. Derive $\eta_t$ for step $t$ in the warmup phase. | Machine learning | Medium | Derivation | Not attempted | Free |
| 2629 | EMA From Zero Initialization 6: Let $m_t = \beta m_{t-1} + (1-\beta) x_t$ with $m_0 = 0$. Derive $m_t$ as an explicit weighted sum of $x_1, \dots, x_t$. | Machine learning | Medium | Derivation | Not attempted | Free |
| 2630 | Why BatchNorm Can Break Under Distribution Shift 21: Why can a network that trains well with BatchNorm behave strangely at inference when the deployment distribution shifts? | Machine learning | Hard | Essay | Not attempted | Free |
| 2631 | Shared-Parameter Gradient Adds Across Paths 7: A parameter $w$ is used in two separate branches whose losses contribute $L_1(w)$ and $L_2(w)$. What is $d(L_1 + L_2)/dw$? | Machine learning | Easy | Derivation | Not attempted | Free |
| 2632 | Warmup Learning Rate Numerically 17: A linear warmup goes from 0 to 0.001 over 10 steps. What learning rate is used at step $t = 3$ of the warmup? | Machine learning | Easy | Numerical | Not attempted | Free |
| 2634 | Batch-Average Gradient 9: If the minibatch loss is the average $L = \frac{1}{B} \sum_{i=1}^{B} L_i$, derive $dL/dw$ in terms of the per-example gradients. | Machine learning | Hard | Derivation | Not attempted | Free |
| 2635 | Why Warmup Helps Large-Batch Training 22: Why is learning-rate warmup often helpful when training with very large batches? | Machine learning | Hard | Essay | Not attempted | Free |
| 2637 | ReLU Local Derivative 10: For $\mathrm{ReLU}(z) = \max(0, z)$, what derivative does backprop use when $z > 0$ and when $z < 0$? | Machine learning | Medium | Derivation | Not attempted | Free |
| 2639 | Steady-State Momentum Under a Constant Gradient 11: If $v_t = \beta v_{t-1} + g$ with constant gradient $g$ and $\lvert \beta \rvert < 1$, what constant value does $v_t$ converge to? | Machine learning | Hard | Derivation | Not attempted | Free |
| 2640 | Cosine Decay Schedule 12: A learning rate decays from $\eta_{\max}$ to $\eta_{\min}$ over $T$ steps using cosine annealing. What is $\eta_t$ at step $t$? | Machine learning | Hard | Derivation | Not attempted | Free |
| 2641 | Why Clipping Helps Exploding but Not Vanishing Gradients 23: Why is gradient clipping a natural remedy for exploding gradients but not for vanishing gradients? | Machine learning | Easy | Essay | Not attempted | Free |
| 2642 | BatchNorm Running Mean Update 13: A BatchNorm layer updates its running mean by $\mu_{\text{new}} = m \mu_{\text{old}} + (1-m) \mu_{\text{batch}}$. What does this formula mean operationally? | Machine learning | Easy | Derivation | Not attempted | Free |
| 2643 | Clipping Plus Weight Decay on a Vector 25: A parameter vector is $w_t = (3, 4)$. Its gradient is $g = (6, 8)$, whose norm is 10. Apply global-norm clipping with threshold 5, then a decoupled weight-decay step with learning rate $\eta = 0.1$ and $\lambda = 0.1$. What is the new parameter vector? | Machine learning | Medium | Numerical | Not attempted | Interview subscription |
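
The four numerical questions on this page (IDs 2598, 2608, 2632, and 2643) can be sanity-checked with a short script. What follows is a minimal Python sketch, not an official answer key. All function names are hypothetical. It assumes the residual is the target minus the current prediction, that the warmup follows eta_t = eta_max * t / T, and that the decoupled weight-decay step is a plain SGD step with the decay term scaled by the learning rate (w - eta*g - eta*lambda*w). Other conventions, such as an unscaled decay term, would give different numbers.

```python
import math


def boosted_prediction(f0, leaf_updates, eta):
    """Additive boosting score: F = F0 + eta * (sum of leaf updates)."""
    return f0 + eta * sum(leaf_updates)


def remaining_residual(residual, leaf_updates, eta):
    """Assumes residual = target - prediction, so each shrunken update reduces it."""
    return residual - eta * sum(leaf_updates)


def linear_warmup(t, total_steps, eta_max):
    """Assumed schedule: learning rate grows linearly from 0 to eta_max over total_steps."""
    return eta_max * t / total_steps


def clip_by_global_norm(grad, threshold):
    """Rescale grad so its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def decoupled_decay_step(w, grad, eta, lam):
    """Assumed convention: w <- w - eta * grad - eta * lam * w (plain SGD, no momentum)."""
    return [wi - eta * gi - eta * lam * wi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    # ID 2598: F0 = 10, leaf updates +1.2, -0.5, +0.8, eta = 0.1
    print(boosted_prediction(10.0, [1.2, -0.5, 0.8], 0.1))
    # ID 2608: residual 6, leaf updates 1.5 and 0.8, eta = 0.2
    print(remaining_residual(6.0, [1.5, 0.8], 0.2))
    # ID 2632: warmup from 0 to 0.001 over 10 steps, evaluated at t = 3
    print(linear_warmup(3, 10, 0.001))
    # ID 2643: clip g = (6, 8) to norm 5, then take the decayed step from w = (3, 4)
    clipped = clip_by_global_norm([6.0, 8.0], 5.0)
    print(decoupled_decay_step([3.0, 4.0], clipped, 0.1, 0.1))
```

Under these assumptions the script prints approximately 10.15, 5.54, 0.0003, and [2.67, 3.56], up to floating-point rounding.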