Non-Coding Interview Questions
Showing 20 of 73 matching questions
| ID | Title | Question | Domain | Difficulty | Type | Progress | Access |
|---|---|---|---|---|---|---|---|
| 2641 | Why Clipping Helps Exploding but Not Vanishing Gradients 23 | Why is gradient clipping a natural remedy for exploding gradients but not for vanishing gradients? | Machine Learning | Easy | Essay | Not attempted | Free |
| 2642 | BatchNorm Running Mean Update 13 | A BatchNorm layer updates its running mean by mu_new = m * mu_old + (1 - m) * mu_batch. What does this formula mean operationally? | Machine Learning | Easy | Derivation | Not attempted | Free |
| 2643 | Clipping Plus Weight Decay on a Vector 25 | A parameter vector is w_t = (3, 4). Its gradient is g = (6, 8), whose norm is 10. Apply global-norm clipping with threshold 5, then a decoupled weight-decay step with learning rate eta = 0.1 and lambda = 0.1. What is the new parameter vector? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 2644 | Why LayerNorm Is Attractive in Sequence and Online Settings 24 | Why is LayerNorm often preferred over BatchNorm in sequence models or online inference settings? | Machine Learning | Medium | Essay | Not attempted | Interview subscription |
| 2645 | Why Global-Norm Clipping Preserves Direction 14 | Why does global-norm clipping change the magnitude of a gradient vector but not its direction whenever clipping is active? | Machine Learning | Hard | Derivation | Not attempted | Interview subscription |
| 4291 | Weight Decay Shrinkage 1 | A hidden unit has pre-dropout activation 3.2. You apply inverted dropout with keep probability 0.8. If the unit is kept on this training pass, what value is forwarded after dropout? | Machine Learning | Easy | Numerical | Not attempted | Interview subscription |
| 4292 | Weight Decay Shrinkage 2 | A 4-class classifier uses label smoothing with epsilon = 0.2, distributing epsilon uniformly across all 4 classes, including the true class. If class 3 is the correct label, what smoothed target vector do you train on? | Machine Learning | Easy | Numerical | Not attempted | Interview subscription |
| 4293 | Weight Decay Shrinkage 3 | A parameter has current value w = 2.0 and gradient g = 0.3. Using the decoupled weight-decay update w_new = (1 - eta*lambda) * w - eta*g with eta = 0.1 and lambda = 0.05, what is the updated weight after one step? | Machine Learning | Easy | Numerical | Not attempted | Interview subscription |
| 4294 | Weight Decay Shrinkage 4 | A layer weight vector is w = (3, 4), so its norm is 5. You enforce max-norm regularization with cap c = 4, rescaling only when the norm exceeds c. What vector is stored after clipping? | Machine Learning | Easy | Numerical | Not attempted | Interview subscription |
| 4295 | Weight Decay Shrinkage 5 | An optimizer uses the proximal L1 shrinkage step sign(w) * max(abs(w) - tau, 0). If the pre-step weight is w = 0.7 and tau = 0.2, what weight remains after shrinkage? | Machine Learning | Easy | Numerical | Not attempted | Interview subscription |
| 4296 | Dropout Noise Level 1 | Keep eta = 0.1, gradient g = 0.3, and current weight w = 2.0. In the decoupled update w_new = (1 - eta*lambda) * w - eta*g, lambda rises from 0.05 to 0.10. By how much does the updated weight decrease relative to the old-lambda case? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4297 | Dropout Noise Level 2 | A unit has activation 2.0 before standard dropout, meaning dropped units become 0 and kept units stay at 2.0. If the keep probability falls from 0.8 to 0.5, what happens to the expected post-dropout activation? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4298 | Dropout Noise Level 3 | A 5-class model uses label smoothing with epsilon distributed uniformly across all classes. If epsilon rises from 0.1 to 0.3, by how much does the true-class target change? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4300 | Dropout Noise Level 5 | A proximal L1 step uses sign(w) * max(abs(w) - tau, 0). If the pre-step weight is 0.6, what output do you get when tau rises from 0.2 to 0.5? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4301 | Label Smoothing Loss 1 | Mixup combines the one-hot labels for class 1 and class 4 in a 4-class problem, with lambda = 0.3 on class 1's example. What mixed target vector is produced? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4302 | Label Smoothing Loss 2 | A network uses stochastic depth on 12 residual blocks, each with survival probability 0.75 during training. How many blocks are active on average in a training pass? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4303 | Label Smoothing Loss 3 | A layer has 400 weights. DropConnect keeps each weight independently with probability 0.9 during training. How many weights are active on average in one forward pass? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4304 | Label Smoothing Loss 4 | A unit has activation a = 3 before inverted dropout with keep probability q = 0.75. During training the output is either 0 or a/q. What is the variance of that post-dropout output? | Machine Learning | Medium | Numerical | Not attempted | Interview subscription |
| 4306 | Sparse Weights Blow Up | A wide MLP trained on 8k tabular rows drives training AUC to 0.99 while validation AUC stalls at 0.76. The feature semantics do not support label-preserving augmentation, and the largest weights sit on sparse one-hot inputs. Which regularization control should you try first? | Machine Learning | Medium | Essay | Not attempted | Interview subscription |
| 4307 | Validation Peak Then Drift | Training loss keeps improving every epoch, but validation Sharpe peaks around epoch 11 and then gradually drifts lower. You are not changing the architecture or the dataset. What regularization move is most justified? | Machine Learning | Medium | Essay | Not attempted | Interview subscription |
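Questions 2641, 2643 and 2645 all turn on the same rescaling rule. The sketch below is a minimal plain-Python illustration, assuming clipping multiplies the whole gradient by min(1, threshold / ||g||) and that the decoupled step has the same form as in question 4293, w_new = (1 - eta*lambda) * w - eta*g (plain SGD, no momentum). The helper names are hypothetical.

```python
import math

def global_norm(vec):
    """Euclidean (L2) norm of a flat list of numbers."""
    return math.sqrt(sum(x * x for x in vec))

def clip_by_global_norm(grad, threshold):
    """Rescale grad so its norm is at most threshold.

    The factor threshold / norm is one positive scalar applied to every
    component, so the direction is unchanged, and because it is only used
    when norm > threshold, clipping can shrink a gradient but never enlarge it.
    """
    norm = global_norm(grad)
    if norm <= threshold:
        return list(grad)  # clipping inactive: gradient passes through unchanged
    scale = threshold / norm
    return [scale * x for x in grad]

def decoupled_decay_step(w, grad, eta, lam):
    """Elementwise w_new = (1 - eta*lam) * w - eta * grad."""
    return [(1 - eta * lam) * wi - eta * gi for wi, gi in zip(w, grad)]

w = [3.0, 4.0]
g = [6.0, 8.0]  # norm 10
g_clipped = clip_by_global_norm(g, threshold=5.0)
w_new = decoupled_decay_step(w, g_clipped, eta=0.1, lam=0.1)

# Direction check: cosine similarity between the raw and clipped gradients.
cos = sum(a * b for a, b in zip(g, g_clipped)) / (global_norm(g) * global_norm(g_clipped))
print(g_clipped)  # approximately [3.0, 4.0]
print(w_new)      # approximately [2.67, 3.56]
print(cos)        # approximately 1.0
```

Because the scale factor is never above 1, the same rule leaves a tiny (vanishing) gradient untouched rather than amplifying it.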
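The running-mean formula in question 2642 is an exponential moving average. A small sketch under the question's convention (m weights the old estimate); the function name is hypothetical.

```python
def update_running_mean(mu_old, mu_batch, m):
    """mu_new = m * mu_old + (1 - m) * mu_batch.

    Each new batch contributes a (1 - m) share; the contribution of a batch
    seen k steps ago decays geometrically as (1 - m) * m**k.
    """
    return m * mu_old + (1 - m) * mu_batch

mu = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    mu = update_running_mean(mu, batch_mean, m=0.9)
print(mu)  # approximately 0.271, i.e. 1 - 0.9**3
```

Framework conventions differ: PyTorch's BatchNorm `momentum` argument weights the new batch statistic, so it plays the role of 1 - m in this formula.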
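For question 2644, one concrete contrast is what each normalizer needs at hand. The toy sketch below assumes training-mode BatchNorm that normalizes with the current batch's own statistics and omits the learnable scale/shift parameters of both layers: with a batch of one example, every feature is normalized against itself and collapses to 0, whereas LayerNorm only uses the example's own features.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one example across its own features (no batch statistics)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def batch_norm_train(batch, eps=1e-5):
    """Training-mode BatchNorm: normalize each feature across the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out

x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(x))          # zero-mean, informative output for a single example
print(batch_norm_train([x]))  # batch of one: [[0.0, 0.0, 0.0, 0.0]]
```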
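Questions 4291, 4297 and 4304 contrast inverted and standard dropout. A minimal sketch of the relevant quantities for a single unit; the function names are hypothetical.

```python
def inverted_dropout_kept_value(a, keep_prob):
    """Inverted dropout scales a kept unit by 1/keep_prob during training,
    so the expected output equals a and no rescaling is needed at inference."""
    return a / keep_prob

def standard_dropout_mean(a, keep_prob):
    """Standard dropout: output is a with probability keep_prob, else 0."""
    return keep_prob * a

def inverted_dropout_variance(a, keep_prob):
    """Output is a/keep_prob with probability keep_prob, otherwise 0.
    Var = E[X^2] - E[X]^2 = a^2 / keep_prob - a^2 = a^2 * (1 - keep_prob) / keep_prob.
    """
    mean = keep_prob * (a / keep_prob)
    second_moment = keep_prob * (a / keep_prob) ** 2
    return second_moment - mean ** 2

print(inverted_dropout_kept_value(3.2, 0.8))                        # approximately 4.0
print(standard_dropout_mean(2.0, 0.8), standard_dropout_mean(2.0, 0.5))  # 1.6 then 1.0
print(inverted_dropout_variance(3.0, 0.75))                         # 3.0
```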
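Questions 4292 and 4298 use label smoothing with epsilon spread over all classes, so each target is (1 - eps) * onehot + eps / K. A short sketch, assuming classes are numbered from 1 as in the question wording; the helper name is hypothetical.

```python
def smoothed_target(num_classes, true_class, eps):
    """Label smoothing with eps spread uniformly over ALL classes (true class included).
    true_class is 1-indexed (an assumption matching the question wording)."""
    target = [eps / num_classes] * num_classes
    target[true_class - 1] += 1 - eps
    return target

print(smoothed_target(4, 3, 0.2))  # approximately [0.05, 0.05, 0.85, 0.05]

# The true-class entry is 1 - eps + eps/K = 1 - eps * (K - 1) / K,
# so for K = 5 raising eps by 0.2 lowers it by 0.2 * 4/5 = 0.16.
t_low = smoothed_target(5, 1, 0.1)[0]   # approximately 0.92
t_high = smoothed_target(5, 1, 0.3)[0]  # approximately 0.76
print(t_high - t_low)                   # approximately -0.16
```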
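Questions 4293 and 4296 use the scalar decoupled update w_new = (1 - eta*lambda) * w - eta*g. A minimal sketch; the function name is hypothetical.

```python
def decoupled_step(w, g, eta, lam):
    """One SGD step with decoupled weight decay: w_new = (1 - eta*lam) * w - eta * g."""
    return (1 - eta * lam) * w - eta * g

w, g, eta = 2.0, 0.3, 0.1
w_low = decoupled_step(w, g, eta, lam=0.05)   # approximately 1.96
w_high = decoupled_step(w, g, eta, lam=0.10)  # approximately 1.95

# The gradient term -eta*g is identical in both cases, so the gap comes only
# from the shrink factor: eta * (0.10 - 0.05) * w = 0.01.
extra_shrink = w_low - w_high
print(w_low, w_high, extra_shrink)
```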
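Question 4294's max-norm constraint uses the same rescaling idea as gradient clipping, but applies it to the stored weights. A minimal sketch; the function name is hypothetical.

```python
import math

def max_norm_project(w, cap):
    """If ||w||_2 > cap, rescale w to have norm exactly cap; otherwise leave it unchanged."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= cap:
        return list(w)
    return [x * cap / norm for x in w]

print(max_norm_project([3.0, 4.0], 4.0))  # approximately [2.4, 3.2]
```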
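Questions 4295 and 4300 use the proximal L1 step, also known as soft thresholding. A minimal sketch; the function name is hypothetical, and floating-point results are only approximately 0.5, 0.4 and 0.1.

```python
import math

def soft_threshold(w, tau):
    """Proximal operator of tau * abs(w): sign(w) * max(abs(w) - tau, 0)."""
    return math.copysign(max(abs(w) - tau, 0.0), w)

print(soft_threshold(0.7, 0.2))   # approximately 0.5
print(soft_threshold(0.6, 0.2))   # approximately 0.4
print(soft_threshold(0.6, 0.5))   # approximately 0.1
print(soft_threshold(-0.3, 0.5))  # zeroed out: abs(w) <= tau
```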
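Question 4301's mixup target is a convex combination of two one-hot labels. A short sketch, assuming classes are numbered from 1 as in the question wording; the helper name is hypothetical.

```python
def mixup_target(num_classes, class_a, class_b, lam):
    """Mixup label: lam * onehot(class_a) + (1 - lam) * onehot(class_b).
    Classes are 1-indexed (an assumption matching the question wording)."""
    target = [0.0] * num_classes
    target[class_a - 1] += lam
    target[class_b - 1] += 1 - lam
    return target

print(mixup_target(4, 1, 4, 0.3))  # approximately [0.3, 0.0, 0.0, 0.7]
```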
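Questions 4302 and 4303 both ask for the mean number of independently kept units, which by linearity of expectation is n * keep_prob. A sketch with a Monte Carlo sanity check; the function names and the fixed seed are illustrative assumptions.

```python
import random

def expected_active(n, keep_prob):
    """Mean number of survivors when each of n units is kept independently."""
    return n * keep_prob

def simulate_active(n, keep_prob, trials, seed=0):
    """Monte Carlo estimate of the same mean, for a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if rng.random() < keep_prob)
    return total / trials

print(expected_active(12, 0.75))                 # stochastic depth: 9.0 blocks
print(expected_active(400, 0.9))                 # DropConnect: approximately 360 weights
print(simulate_active(12, 0.75, trials=20000))   # close to 9
```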