Before Adding Augmentation
A teammate proposes aggressive data augmentation as a universal fix. What is the first check you should make before accepting that plan?
GLOBAL SEARCH
19 results found
A teammate proposes aggressive data augmentation as a universal fix. What is the first check you should make before accepting that plan?
You are tempted to raise dropout from 0.2 to 0.6 after one mediocre run. What is the first diagnostic question you should answer before doing that?
Your validation metric is noisy day to day. Before treating the first local peak as the stopping point, what should you calibrate?
Two hidden layers memorize pairs of co-occurring signals. In-sample metrics look great, but when one signal in the pair shifts slightly out of sample, performance collapses. Which control is most naturally aimed at reducing this co-adaptation?
Keep eta = 0.1, gradient g = 0.3, and current weight w = 2.0. In the decoupled update w_new = (1 - eta*lambda)w - eta*g, lambda rises from 0.05 to 0.10. By how much does the updated weight decrease relative to the old lambda case?
A unit has activation 2.0 before standard dropout, meaning dropped units become 0 and kept units stay at 2.0. If keep probability falls from 0.8 to 0.5, what happens to the expected post-dropout activation?
A 5-class model uses label smoothing with epsilon distributed uniformly across all classes. If epsilon rises from 0.1 to 0.3, by how much does the true-class target change?
A proximal L1 step uses sign(w)*max(|w| - tau, 0). If the pre-step weight is 0.6, what output do you get when tau rises from 0.2 to 0.5?
A classifier already has good accuracy, but on borderline names it assigns 99% probability too often and the labels are believed to contain small noise. Which regularization change best targets that failure mode?
In an overparameterized network, why is it a mistake to discuss regularization strength without also looking at optimizer and data pipeline choices?
You are training on a small image-like signal dataset where small translations and mirror flips preserve the label by construction. The network fits the training set too easily. What regularization lever should move to the front of the queue?
A wide MLP on 8k tabular rows drives training AUC to 0.99 while validation AUC stalls at 0.76. Feature semantics do not support label-preserving augmentation, and the largest weights sit on sparse one-hot inputs. Which regularization control should you try first?
Training loss keeps improving every epoch, but validation Sharpe peaks around epoch 11 and then gradually drifts lower. You are not changing architecture or dataset. What regularization move is most justified?
A hidden unit has pre-dropout activation 3.2. You apply inverted dropout with keep probability 0.8. If the unit is kept on this training pass, what value is forwarded after dropout?
A 4-class classifier uses label smoothing with epsilon = 0.2, distributing epsilon uniformly across all 4 classes including the true class. If class 3 is the correct label, what smoothed target vector do you train on?
A parameter has current value w = 2.0 and gradient g = 0.3. Using a decoupled weight-decay update w_new = (1 - eta*lambda) w - eta*g with eta = 0.1 and lambda = 0.05, what is the updated weight after one step?
A layer weight vector is w = (3, 4), so its norm is 5. You enforce max-norm regularization with cap c = 4 by rescaling only when the norm exceeds c. What vector is stored after clipping?
An optimizer uses the proximal L1 shrinkage step sign(w)*max(|w| - tau, 0). If the pre-step weight is w = 0.7 and tau = 0.2, what weight remains after shrinkage?
Performance falls as you increase weight decay. Before concluding that regularization is bad, what structural question should you ask about the signal?
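The update rules that the quantitative questions above refer to can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: all function names are hypothetical, plain floats and lists stand in for tensors, inverted dropout is assumed to scale kept units by 1/keep_prob, and "class 3" is assumed to be counted from 1 (index 2 in the code).

```python
import math


def decoupled_weight_decay_step(w, g, eta, lam):
    """One step of w_new = (1 - eta*lam) * w - eta * g."""
    return (1.0 - eta * lam) * w - eta * g


def expected_standard_dropout(activation, keep_prob):
    """Expected value when dropped units become 0 and kept units are left unscaled."""
    return keep_prob * activation


def inverted_dropout_kept(activation, keep_prob):
    """Value forwarded by a kept unit, assuming the usual 1/keep_prob rescaling."""
    return activation / keep_prob


def smoothed_targets(num_classes, true_index, eps):
    """Uniform label smoothing: eps is spread over all classes, true class included."""
    base = eps / num_classes
    targets = [base] * num_classes
    targets[true_index] += 1.0 - eps
    return targets


def max_norm_clip(w, cap):
    """Rescale w onto the norm ball of radius cap only when its norm exceeds cap."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= cap:
        return list(w)
    scale = cap / norm
    return [x * scale for x in w]


def soft_threshold(w, tau):
    """Proximal L1 shrinkage: sign(w) * max(|w| - tau, 0)."""
    return math.copysign(max(abs(w) - tau, 0.0), w)


# Example calls using the numbers from the questions above.
print(decoupled_weight_decay_step(w=2.0, g=0.3, eta=0.1, lam=0.05))
print(expected_standard_dropout(activation=2.0, keep_prob=0.5))
print(inverted_dropout_kept(activation=3.2, keep_prob=0.8))
print(smoothed_targets(num_classes=4, true_index=2, eps=0.2))
print(max_norm_clip([3.0, 4.0], cap=4.0))
print(soft_threshold(0.7, tau=0.2))
```

Each helper takes the hyperparameter as an explicit argument, so the "what changes when lambda, epsilon, tau, or the keep probability moves" questions reduce to calling the same function twice and comparing the results.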