第 2 / 2 页
非代码面试题
显示 5 / 25 道匹配题目
答题状态:未尝试未正确已正确
ID题目领域难度题型进度权限
2641Why Clipping Helps Exploding but Not Vanishing Gradients 23Why is gradient clipping a natural remedy for exploding gradients but not for vanishing gradients?机器学习简单essay未尝试免费2642BatchNorm Running Mean Update 13A BatchNorm layer updates its running mean by mu new = m mu old + (1-m) mu batch. What does this formula mean operationally?机器学习简单derivation未尝试免费2643Clipping Plus Weight Decay on a Vector 25A parameter vector is w t=(3,4). Its gradient is g=(6,8), whose norm is 10. Apply global-norm clipping with threshold 5, then a decoupled weight-decay step with learning rate eta=0.1 and lambda=0.1. What is the new parameter vector?机器学习中等数值题未尝试面试订阅2644Why LayerNorm Is Attractive in Sequence and Online Settings 24Why is LayerNorm often preferred over BatchNorm in sequence models or online inference settings?机器学习中等essay未尝试面试订阅2645Why Global-Norm Clipping Preserves Direction 14Why does global-norm clipping change the magnitude of a gradient vector but not its direction whenever clipping is active?机器学习困难derivation未尝试面试订阅