GLOBAL SEARCH

Search courses, modules, problems, and saved problem lists

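Search runs on the server; problem explanations and answers never appear in search results. After logging in, you can also search your own saved problem lists.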

25 results found

Chinese-language problems
Problem 2634 · Machine Learning

Batch-Average Gradient 9

If the minibatch loss is the average L = (1/B) sum_{i=1}^B L_i, derive dL/dw in terms of the per-example gradients.

Problem 2643 · Machine Learning

Clipping Plus Weight Decay on a Vector 25

A parameter vector is w_t=(3,4). Its gradient is g=(6,8), whose norm is 10. Apply global-norm clipping with threshold 5, then a decoupled weight-decay step with learning rate eta=0.1 and lambda=0.1. What is the new parameter vector?

Problem 2633 · Machine Learning

Layer-Norm Shift Invariance 8

Ignoring learned affine parameters, why does adding the same constant a to every coordinate of a vector leave layer-normalized activations unchanged?

Problem 2623 · Machine Learning

One Momentum Update 15

Suppose momentum uses v_t = beta v_{t-1} + g_t with beta=0.9, previous velocity v_{t-1}=0.5, and current gradient g_t=2. What is v_t?

Problem 2596 · Machine Learning

Optimal Leaf Update Under Squared Loss 1

In gradient boosting for squared error, a terminal region R is assigned one constant update gamma. Derive the gamma that minimizes sum_{i in R} (r_i-gamma)^2, where r_i are the current residuals.

Problem 2608 · Machine Learning

Residual After Two Shrunken Updates 24

A point currently has residual 6. Two boosting rounds hit its region with leaf updates 1.5 and 0.8, using learning rate eta=0.2 in both rounds. What residual remains after the two rounds?

Problem 2597 · Machine Learning

Weighted Region Update 2

If observations in a boosting region R carry positive weights w_i, derive the constant update gamma that minimizes sum_{i in R} w_i (r_i-gamma)^2.
