监督学习基础
machine-learning · statistical-learning · supervised-learning · erm · loss-functions · bayes-optimal · bias-variance · generalization
打开 →GLOBAL SEARCH
搜索在服务端完成,题目解析与答案不会进入搜索结果。登录后可搜索自己的收藏题单。
找到 30 个结果
中文题目machine-learning · statistical-learning · supervised-learning · erm · loss-functions · bayes-optimal · bias-variance · generalization
打开 →Let $X\sim \mathrm{Binomial}(10,p)$ and consider the estimator $$\delta = \frac{X+1}{12}$$ for $p$. At the parameter value $p=0.2$, compute the bias, variance, and MSE of $\delta$, and compare its MSE with the usual sample proportion $\hat p = X/10$.
打开 →Suppose Xbar ~ N(theta, 0.16). A desk uses delta = 0.6Xbar + 0.8. For what values of theta does delta have lower MSE than Xbar?
打开 →You observe the diagnostic statement: (1-0.5L) X_t = (1-0.5L) e_t. What is the correct modeling conclusion?
打开 →Two unbiased estimators of the same parameter have variances 9 and 4, and their correlation is 0.5. For T(a) = aT1 + (1-a)T2, what value of a minimizes variance, and what is the resulting minimum variance?
打开 →An unbiased estimator U has variance 0.06. A regularized estimator R has variance 0.03 and constant bias b. What is the largest absolute bias |b| for which R still has smaller MSE than U?
打开 →A slow benchmark estimator U is unbiased with variance 0.64. A faster proxy P has variance 0.25 but constant bias b. What is the largest absolute bias |b| for which P still has smaller MSE than U?
打开 →A regularization change reduces a model's variance term from 0.30 to 0.11 while leaving irreducible noise unchanged. How much extra bias squared could you add before the total MSE stops improving?
打开 →At sample size n=60, compare model A with excess error 0.04 + 12/n to model B with excess error 0.16 + 2/n. Which one has smaller excess test error?
打开 →A model's variance term is currently 0.30, and irreducible noise is 0.05. If variance scales exactly like 1/n, by what factor must the dataset grow so the variance term falls to 0.05?
打开 →A false negative costs 5 and a false positive costs 1. If p is the predicted probability of the positive class, above what threshold should you classify as positive?
打开 →If two predictors are exactly identical and the model uses pure Lasso, what modeling pathology should you expect?
打开 →Define B_eff by matching the correlated-forest variance sigma^2 [rho + (1-rho)/B] to the variance sigma^2 / B_eff of averaging independent trees. Derive B_eff.
打开 →Let m_t = beta m_{t-1} + (1-beta) x_t with m_0=0. Derive m_t as an explicit weighted sum of x_1,...,x_t.
打开 →A surrogate split agrees with the primary split on 34 of 40 training cases where both features are present. If 12 production cases are missing the primary split feature and are routed by the surrogate, what is the expected number of misroutes?
打开 →A gradient vector g has norm ||g|| greater than clip threshold c. Derive the clipped gradient under standard global-norm clipping.
打开 →Each independently trained model has variance 2.4 and negligible bias. How many equally weighted independent fits must you average to bring the variance term below 0.3?
打开 →A regularization change raises bias^2 from 0.03 to 0.07 but cuts variance from 0.22 to 0.08. By how much does excess test error improve?
打开 →A single tree has variance 6, while an extremely large forest appears to level off at variance 1.8. What pairwise tree correlation rho is implied?
打开 →Using the equicorrelated-tree variance formula, derive the prediction variance as the number of trees B tends to infinity.
打开 →A noisy unbiased signal X satisfies X ~ N(theta, 0.25). A fallback benchmark always reports 1.2. For what values of theta does the fixed benchmark have lower MSE than X?
打开 →A standardized lasso fit has absolute score magnitudes (3.8, 2.5, 0.9). What is the smallest lambda that zeroes the weakest feature while leaving the other two still active?
打开 →Ignoring learned affine parameters, why does adding the same constant a to every coordinate of a vector leave layer-normalized activations unchanged?
打开 →An event occurs (y=1). Forecast A assigns probability 0.9 and forecast B assigns probability 0.7. By how much is B's log loss larger than A's?
打开 →Under the equicorrelated-tree model, derive how much the ensemble variance falls when you move from B trees to B+1 trees.
打开 →In dimension p = 4 with unit noise variance, the positive-part James-Stein shrinkage factor is 0.75 for an observed vector x. What value of ||x||^2 produced that factor?
打开 →An estimator A is unbiased for theta and satisfies Var(A) = 0.3 theta^2. A risk team reports delta_c = cA instead. Find the value of c that minimizes MSE, and give the minimum MSE as a multiple of theta^2.
打开 →A desk observes X ~ N(theta, 9) and reports delta_c = cX + (1-c)4. At the specific parameter value theta = 5, what choice of c minimizes MSE, and what is the minimum MSE?
打开 →Model A is unbiased with variance 9. Model B has variance 1.44 and fixed bias 0.6. If you blend them as P_w = wA + (1-w)B and treat their errors as independent, what weight w minimizes MSE?
打开 →Suppose two features x1 and x2 are centered and orthogonal. Derive the OLS coefficients in terms of x1^T y, x2^T y, ||x1||^2, and ||x2||^2.
打开 →