2633 | Machine Learning | Medium | derivation
Layer-Norm Shift Invariance 8
Problem
Ignoring learned affine parameters, why does adding the same constant a to every coordinate of a vector leave layer-normalized activations unchanged?
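One way to sanity-check the claim numerically is sketched below. The reasoning it illustrates: adding a to every coordinate raises the mean by exactly a, so the centered values x_i - mean(x) are unchanged, and therefore the variance and the normalized output are unchanged too. The function name `layer_norm`, the `eps` default, and the sample vector are assumptions chosen for illustration, not part of the original problem.

```python
import math


def layer_norm(x, eps=1e-5):
    """Layer-normalize a list of floats without learned gain or bias (illustrative helper)."""
    n = len(x)
    mean = sum(x) / n
    # Centering removes any constant offset shared by all coordinates.
    centered = [xi - mean for xi in x]
    # Biased variance over the feature dimension, computed from the centered values.
    var = sum(c * c for c in centered) / n
    return [c / math.sqrt(var + eps) for c in centered]


x = [1.0, 2.0, 4.0, 7.0]   # arbitrary example vector (assumption)
a = 3.5                    # constant added to every coordinate (assumption)
shifted = [xi + a for xi in x]

y = layer_norm(x)
y_shifted = layer_norm(shifted)

# Largest coordinate-wise difference between the two normalized vectors.
max_diff = max(abs(u - v) for u, v in zip(y, y_shifted))
print(max_diff)
```

The printed difference should be zero up to floating-point rounding, because the shift cancels in the centering step before the variance is ever computed.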