Non-Coding Interview Questions
Showing 20 of 26 matching questions
| ID | Title | Question | Domain | Difficulty | Type | Progress | Access |
|---|---|---|---|---|---|---|---|
| 2492 | Why Feature Scaling Helps Gradient Descent More Than Closed Form 22 | Why is feature scaling often crucial for gradient-descent training of OLS even though the closed-form solution itself is scale-equivariant? | Machine Learning | Easy | essay | Not attempted | Free |
| 2515 | Why Small Lambda Means Weak Regularization 20 | Why does a very small $\lambda$ leave the regularized solution close to OLS? | Machine Learning | Hard | derivation | Not attempted | Interview subscription |
| 2522 | Intercept From the Positive Rate 2 | In an intercept-only logistic model, if the fitted probability is $\hat p$, what intercept $b$ solves $\sigma(b) = \hat p$? | Machine Learning | Easy | derivation | Not attempted | Free |
| 2523 | Gradient of Logistic Negative Log-Likelihood 3 | For one observation $(x, y)$ with $y \in \{0, 1\}$ and score $z = w^\top x$, what is the gradient of the negative log-likelihood with respect to $w$? | Machine Learning | Medium | derivation | Not attempted | Free |
| 2524 | Why No Closed Form in Logistic Regression 5 | Why does logistic regression usually require iterative optimization rather than a normal-equation-style closed form? | Machine Learning | Medium | essay | Not attempted | Free |
| 2526 | Why Separable Data Pushes Coefficients Outward 7 | Why do logistic-regression coefficients tend to diverge on perfectly linearly separable data if no regularization is used? | Machine Learning | Easy | essay | Not attempted | Free |
| 2534 | One Gradient Step on a Tiny Logistic Problem | A one-feature logistic model without intercept uses $\beta = 0$ initially, learning rate 0.2, data $x = [-1, 0, 1]$, and labels $y = [0, 0, 1]$. What is $\beta$ after one gradient step on the negative log-likelihood? | Machine Learning | Hard | numerical | Not attempted | Interview subscription |
| 2541 | One Gradient Step on a Single Logistic Observation 22 | For one observation with $x = 2$, $y = 1$, current weight $w = 0$, and learning rate $\eta = 0.4$, what is one gradient-descent update on the negative log-likelihood? | Machine Learning | Easy | numerical | Not attempted | Free |
| 2596 | Optimal Leaf Update Under Squared Loss 1 | In gradient boosting for squared error, a terminal region $R$ is assigned one constant update $\gamma$. Derive the $\gamma$ that minimizes $\sum_{i \in R} (r_i - \gamma)^2$, where $r_i$ are the current residuals. | Machine Learning | Easy | derivation | Not attempted | Free |
| 2597 | Weighted Region Update 2 | If observations in a boosting region $R$ carry positive weights $w_i$, derive the constant update $\gamma$ that minimizes $\sum_{i \in R} w_i (r_i - \gamma)^2$. | Machine Learning | Easy | derivation | Not attempted | Free |
| 2608 | Residual After Two Shrunken Updates 24 | A point currently has residual 6. Two boosting rounds hit its region with leaf updates 1.5 and 0.8, using learning rate $\eta = 0.2$ in both rounds. What residual remains after the two rounds? | Machine Learning | Medium | numerical | Not attempted | Free |
| 2623 | One Momentum Update 15 | Suppose momentum uses $v_t = \beta v_{t-1} + g_t$ with $\beta = 0.9$, previous velocity $v_{t-1} = 0.5$, and current gradient $g_t = 2$. What is $v_t$? | Machine Learning | Medium | numerical | Not attempted | Free |
| 2624 | Momentum as an Unrolled Geometric Sum 3 | If momentum obeys $v_t = \beta v_{t-1} + g_t$, derive $v_t$ in terms of $v_0$ and the past gradients $g_1, \dots, g_t$. | Machine Learning | Medium | derivation | Not attempted | Free |
| 2626 | Global-Norm Clipping Numerically 16 | A gradient vector is $g = (6, 8)$, whose norm is 10. If the clip threshold is 5, what clipped gradient is produced? | Machine Learning | Easy | numerical | Not attempted | Free |
| 2628 | Why Residual Connections Help Train Deep Nets 20 | Why do residual connections often make very deep networks easier to optimize? | Machine Learning | Medium | essay | Not attempted | Free |
| 2630 | Why BatchNorm Can Break Under Distribution Shift 21 | Why can a network that trains well with BatchNorm behave strangely at inference when the deployment distribution shifts? | Machine Learning | Hard | essay | Not attempted | Free |
| 2633 | Layer-Norm Shift Invariance 8 | Ignoring learned affine parameters, why does adding the same constant $a$ to every coordinate of a vector leave layer-normalized activations unchanged? | Machine Learning | Medium | derivation | Not attempted | Free |
| 2634 | Batch-Average Gradient 9 | If the minibatch loss is the average $L = \frac{1}{B} \sum_{i=1}^{B} L_i$, derive $dL/dw$ in terms of the per-example gradients. | Machine Learning | Hard | derivation | Not attempted | Free |
| 2635 | Why Warmup Helps Large-Batch Training 22 | Why is learning-rate warmup often helpful when training with very large batches? | Machine Learning | Hard | essay | Not attempted | Free |
| 2636 | Decoupled Weight Decay Numerically 18 | A scalar parameter has value $w_t = 2$, gradient $g_t = 0.5$, learning rate $\eta = 0.1$, and decoupled weight decay $\lambda = 0.05$. What is $w_{t+1}$? | Machine Learning | Easy | numerical | Not attempted | Free |
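Row 2515 can be illustrated in one dimension. Assume an L2 (ridge) penalty, a single feature and no intercept. The regularized objective is then $\sum_i (y_i - \beta x_i)^2 + \lambda \beta^2$. Setting its derivative to zero gives $\hat\beta_\lambda = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$, which tends to the OLS slope $\sum_i x_i y_i / \sum_i x_i^2$ as $\lambda \to 0$. The following is a minimal sketch under those assumptions. The helper name `ridge_1d` is hypothetical, and the toy data are chosen only for illustration.

```python
def ridge_1d(xs, ys, lam):
    """Ridge estimate for y ~ beta * x (one feature, no intercept).

    Minimizes sum_i (y_i - beta * x_i)^2 + lam * beta^2; lam = 0 gives OLS.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]  # illustrative toy data, not from the source
    ys = [1.1, 1.9, 3.2]
    ols = ridge_1d(xs, ys, 0.0)
    for lam in [10.0, 1.0, 0.1, 0.001]:
        estimate = ridge_1d(xs, ys, lam)
        # The gap to the OLS slope shrinks as lam shrinks.
        print(lam, estimate, abs(estimate - ols))
```

The penalty only adds $\lambda$ to the denominator, so once $\lambda$ is small relative to $\sum_i x_i^2$ the estimate is nearly the unregularized one.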
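Rows 2522, 2523, 2534 and 2541 rest on two identities:

- The intercept-only fit satisfies $b = \log\frac{\hat p}{1 - \hat p}$, the logit of $\hat p$.
- One observation contributes $(\sigma(z) - y)\,x$ to the gradient of the negative log-likelihood.

The following is a minimal sketch. It assumes plain gradient descent on the summed (not averaged) negative log-likelihood, and the helper names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of sigmoid: the intercept b with sigmoid(b) == p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def nll_gradient(w: float, xs: list, ys: list) -> float:
    """Gradient of the summed logistic NLL for one feature, no intercept.

    Each observation contributes (sigmoid(w * x) - y) * x.
    """
    return sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys))


def gradient_step(w: float, lr: float, xs: list, ys: list) -> float:
    """One plain gradient-descent update: w <- w - lr * gradient."""
    return w - lr * nll_gradient(w, xs, ys)


if __name__ == "__main__":
    # Intercept-only model: the intercept is the log-odds of p_hat.
    print(logit(0.25))
    # Three-point problem: beta = 0, learning rate 0.2 (summed NLL assumed).
    print(gradient_step(0.0, 0.2, [-1.0, 0.0, 1.0], [0.0, 0.0, 1.0]))
    # Single observation: x = 2, y = 1, w = 0, eta = 0.4.
    print(gradient_step(0.0, 0.4, [2.0], [1.0]))
```

Under the summed-loss assumption, the two steps give $\beta = 0.2$ for row 2534 and $w = 0.4$ for row 2541. If the loss in row 2534 were averaged over its three points, the step would instead give $\beta = 0.2/3 \approx 0.067$.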
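For row 2526, consider data that are separable at $x = 0$. Scaling $w$ up moves every predicted probability closer to its label. The unregularized negative log-likelihood therefore keeps decreasing, and its gradient never vanishes at any finite $w$. The following sketch runs plain gradient descent on hypothetical toy data separable at $x = 0$ and records $w$ after every step. The function name, learning rate and step count are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_no_intercept(xs, ys, lr=0.5, steps=1000):
    """Unregularized gradient descent on the summed logistic NLL (one feature).

    Returns the weight after every step so its trajectory can be inspected.
    """
    w = 0.0
    history = [w]
    for _ in range(steps):
        grad = sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
        history.append(w)
    return history


if __name__ == "__main__":
    # Negatives all at x < 0 and positives all at x > 0: perfectly separable.
    history = fit_logistic_no_intercept([-2.0, -1.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0])
    print(history[10], history[100], history[1000])
```

Every recorded value exceeds the previous one. On this data the gradient $\sum_i (\sigma(w x_i) - y_i)\,x_i$ is negative for every finite $w$. The growth is slow, but nothing stops it without a penalty term.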
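Rows 2596, 2597 and 2608 reduce to two facts:

- Setting the derivative of $\sum_{i \in R} w_i (r_i - \gamma)^2$ to zero gives the weighted mean $\gamma^* = \sum_{i \in R} w_i r_i / \sum_{i \in R} w_i$, which is the plain mean of the residuals when every $w_i = 1$.
- Under squared loss, a shrunken update $F \leftarrow F + \eta\gamma$ lowers the point's residual $y - F$ by $\eta\gamma$.

The following is a minimal sketch with hypothetical helper names.

```python
def leaf_update(residuals, weights=None):
    """Constant gamma minimizing sum_i w_i * (r_i - gamma)^2 over one region.

    With weights omitted (all w_i = 1) this is the plain mean of the residuals.
    """
    if weights is None:
        weights = [1.0] * len(residuals)
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must be positive")
    return sum(w * r for w, r in zip(weights, residuals)) / total_weight


def residual_after_rounds(residual, leaf_values, lr):
    """Squared-loss residual y - F after shrunken updates F <- F + lr * gamma."""
    for gamma in leaf_values:
        residual -= lr * gamma
    return residual


if __name__ == "__main__":
    print(leaf_update([1.0, 2.0, 3.0]))              # unweighted mean
    print(leaf_update([1.0, 3.0], [3.0, 1.0]))       # weighted mean
    print(residual_after_rounds(6.0, [1.5, 0.8], 0.2))
```

`residual_after_rounds(6.0, [1.5, 0.8], 0.2)` evaluates to 5.54 up to floating-point rounding. The two rounds remove $0.2 \times 1.5 + 0.2 \times 0.8 = 0.46$ from the residual.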
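Row 2624 can be answered by unrolling the recursion:

$$v_t = \beta^t v_0 + \sum_{k=1}^{t} \beta^{t-k} g_k$$

Substituting the formula for $v_{t-1}$ into $\beta v_{t-1} + g_t$ reproduces it, which proves it by induction. The following sketch compares this closed form with the step-by-step recursion. It assumes the undamped form $v_t = \beta v_{t-1} + g_t$ used in rows 2623 and 2624; some formulations instead scale $g_t$ by $1 - \beta$. The helper names are hypothetical.

```python
def momentum_step(v_prev, grad, beta):
    """Undamped momentum velocity update v_t = beta * v_{t-1} + g_t."""
    return beta * v_prev + grad


def momentum_recursive(v0, grads, beta):
    """Apply momentum_step once per gradient, starting from v0."""
    v = v0
    for g in grads:
        v = momentum_step(v, g, beta)
    return v


def momentum_unrolled(v0, grads, beta):
    """Closed form v_t = beta^t * v_0 + sum_{k=1}^{t} beta^(t-k) * g_k."""
    t = len(grads)
    return beta ** t * v0 + sum(beta ** (t - k) * g for k, g in enumerate(grads, start=1))


if __name__ == "__main__":
    print(momentum_step(0.5, 2.0, 0.9))  # one update: 0.9 * 0.5 + 2
    grads = [1.0, -2.0, 0.5, 3.0]
    print(momentum_recursive(0.7, grads, 0.9), momentum_unrolled(0.7, grads, 0.9))
```

The single update in row 2623 gives $0.9 \times 0.5 + 2 = 2.45$. The two functions agree up to rounding for any gradient sequence.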
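For row 2633, let $\mu = \frac{1}{n}\sum_j x_j$. Adding $a$ to every coordinate shifts $\mu$ by the same $a$. Each centred value $x_j - \mu$ is therefore unchanged, and so is the variance computed from those values. The following is a minimal numerical check. It assumes layer normalization without the learned scale and shift, with a small $\epsilon$ inside the square root; the function name is hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Layer normalization of one vector, without learned scale/shift parameters."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    a = 3.5
    print(layer_norm(x))
    print(layer_norm([xi + a for xi in x]))  # same values up to rounding
```

Because $\epsilon$ is added to the variance, and the variance does not change under the shift, $\epsilon$ does not break the invariance.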
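Row 2634 follows from linearity of differentiation:

$$\frac{dL}{dw} = \frac{1}{B}\sum_{i=1}^{B} \frac{dL_i}{dw}$$

The following sketch checks this numerically with a central finite difference. It assumes a hypothetical per-example squared loss $L_i(w) = (w x_i - y_i)^2$ with a scalar $w$ and toy data.

```python
def per_example_loss(w, x, y):
    """Hypothetical per-example loss L_i(w) = (w * x_i - y_i)^2."""
    return (w * x - y) ** 2


def per_example_grad(w, x, y):
    """Analytic derivative dL_i/dw = 2 * (w * x_i - y_i) * x_i."""
    return 2.0 * (w * x - y) * x


def batch_loss(w, xs, ys):
    """Minibatch loss as the average of per-example losses."""
    return sum(per_example_loss(w, x, y) for x, y in zip(xs, ys)) / len(xs)


def batch_grad(w, xs, ys):
    """Average of the per-example gradients."""
    return sum(per_example_grad(w, x, y) for x, y in zip(xs, ys)) / len(xs)


def finite_difference(f, w, h=1e-6):
    """Central-difference estimate of df/dw at w."""
    return (f(w + h) - f(w - h)) / (2.0 * h)


if __name__ == "__main__":
    xs = [0.5, -1.0, 2.0, 3.0]  # illustrative toy batch
    ys = [1.0, 0.0, -1.0, 2.0]
    w = 0.3
    print(batch_grad(w, xs, ys), finite_difference(lambda v: batch_loss(v, xs, ys), w))
```

The $1/B$ factor keeps the gradient scale independent of batch size, unlike a summed loss.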
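Rows 2626 and 2636 are single arithmetic steps. Global-norm clipping rescales the whole gradient by $\min(1, c / \lVert g \rVert)$, so $(6, 8)$ with $c = 5$ becomes $(3, 4)$.

For decoupled weight decay, the listing does not say whether $\lambda$ is multiplied by $\eta$. PyTorch's `torch.optim.AdamW`, for example, multiplies the decay by the learning rate. The following sketch exposes both conventions through a flag. It uses hypothetical helper names and plain SGD rather than Adam.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradient components together so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradient and the norm before clipping.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


def decoupled_decay_step(w, grad, lr, decay, scale_decay_by_lr=True):
    """Plain-SGD step with weight decay applied to w directly, not added to the gradient.

    scale_decay_by_lr=True:  w - lr * grad - lr * decay * w  (decay scaled by lr)
    scale_decay_by_lr=False: w - lr * grad - decay * w       (decay applied as-is)
    """
    decay_term = lr * decay * w if scale_decay_by_lr else decay * w
    return w - lr * grad - decay_term


if __name__ == "__main__":
    print(clip_by_global_norm([6.0, 8.0], 5.0))
    print(decoupled_decay_step(2.0, 0.5, 0.1, 0.05))
    print(decoupled_decay_step(2.0, 0.5, 0.1, 0.05, scale_decay_by_lr=False))
```

With the learning-rate-scaled convention, $w_{t+1} = 2 - 0.1 \cdot 0.5 - 0.1 \cdot 0.05 \cdot 2 = 1.94$. Applying $\lambda$ directly gives $2 - 0.05 - 0.1 = 1.85$.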