Why Hierarchical Testing Can Help
A researcher first tests whether a sector shows any effect, and only if that passes does she test stocks inside that sector. Why can this hierarchical design reduce the multiplicity burden?
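The question rests on a simple count. A flat design spreads the error budget over every stock. A hierarchical design spends its first-stage budget on only one test per sector, and it can pool weak evidence across a sector's stocks.

Below is a minimal Monte Carlo sketch of that contrast. It relies on assumptions the question does not state: independent one-sided z-scores per stock, Stouffer pooling (sum of z-scores divided by the square root of the count) as the sector-level test, and a Bonferroni split at each level. The names `S`, `M`, `MU`, `N_SIMS` and `simulate` are hypothetical and exist only for illustration. The flat design tests every stock at α/(S·M). The hierarchical gate tests S pooled sector statistics at α/S.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical setup for illustration only (not from the original question):
# S sectors with M stocks each, independent one-sided z-scores per stock,
# and a small common effect MU added to every stock in sector 0.
S, M, ALPHA, MU = 10, 50, 0.05, 0.5
N_SIMS = 5_000

# Bonferroni cutoffs: the flat design splits ALPHA over S*M stock tests,
# while the hierarchical gate splits it over only S sector tests.
Z_FLAT = NormalDist().inv_cdf(1 - ALPHA / (S * M))
Z_GATE = NormalDist().inv_cdf(1 - ALPHA / S)


def simulate(mu: float, seed: int = 0) -> dict[str, float]:
    """Flag rates for both designs.

    '*_any'     = at least one flag anywhere (a family-wise false positive when mu == 0)
    '*_sector0' = sector 0 flagged (detection rate when mu > 0)
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((N_SIMS, S, M))
    z[:, 0, :] += mu  # only sector 0 carries a (diffuse) effect

    # Flat design: a sector counts as flagged if any of its stocks clears Z_FLAT.
    flat = (z > Z_FLAT).any(axis=2)  # shape (N_SIMS, S)
    # Hierarchical stage 1: Stouffer-pool each sector. The pooled z is N(0, 1)
    # when every stock in the sector is null (given independence).
    hier = z.sum(axis=2) / np.sqrt(M) > Z_GATE  # shape (N_SIMS, S)

    return {
        "flat_any": flat.any(axis=1).mean(),
        "hier_any": hier.any(axis=1).mean(),
        "flat_sector0": flat[:, 0].mean(),
        "hier_sector0": hier[:, 0].mean(),
    }


null = simulate(0.0)
alt = simulate(MU, seed=1)
print(f"FWER under global null: flat={null['flat_any']:.3f}, hierarchical={null['hier_any']:.3f}")
print(f"Sector-0 detection at MU={MU}: flat={alt['flat_sector0']:.3f}, hierarchical={alt['hier_sector0']:.3f}")
```

Under these assumptions, both designs keep the global-null false-positive rate near or below ALPHA. The sector gate, however, detects a weak effect shared by all stocks in a sector far more often than per-stock Bonferroni does. The sketch stops at the sector gate. How the stock-level follow-up tests inside a passing sector are corrected is a separate choice, and it must also be fixed before looking at the data.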
Related questions:
Why does picking a stop-loss threshold after looking at the full historical equity curve count as backtest search rather than risk management hygiene?
A PM is backtesting a signal-driven strategy and also wants to value a stop-loss overlay that behaves like an option. Which measure should dominate each part of the workflow?
Why is ES harder to backtest directly than VaR in day-to-day risk control?
Risk asks for a stress distribution of tomorrow's hedge-book PnL, while front office asks for today's mark of the same book. Should both use the same measure?
Why can an adaptive questioning strategy distinguish more states than a non-adaptive test battery with the same primitive tests?
An equal-weight portfolio holds two assets with volatilities 0.04 and 0.09. The current correlation estimate is -0.2, but stress testing revises it to +0.3. With weights unchanged, what is the new portfolio volatility, and by how many volatility points does it rise versus the original …
Suppose only 1% of tested trading ideas are genuinely predictive. A testing pipeline has 80% power and a 5% false-positive rate. Conditional on obtaining a positive result, what fraction of positives are truly real?
Consider testing $H_0:\mu=0$ versus $H_1:\mu>0$ with known $\sigma=10$ and sample size $n=25$ at significance level $\alpha=0.05$. What is the power when the true mean is $\mu=4$?
Two backtests differ only slightly: one reports p = 0.049 and the other p = 0.051. Why is it bad practice to call one ‘real’ and the other ‘not real’ purely because one is below 0.05?
You run an A/B test with $n$ Bernoulli observations in treatment and $n$ in control, all independent. Let $\bar X$ and $\bar Y$ be the sample means. Use Hoeffding's inequality to bound \[ P\bigl((\bar X-\bar Y)-E[\bar X-\bar Y]\ge \varepsilon\bigr). \]
A desk has 12 sectors, each containing 5 genuinely null variants. In each sector it keeps only the smallest p-value, and it flags the sector if that winning p-value is below 1%. Assuming independence, what is the probability at least one sector is falsely flagged?
A coin is flipped 20 times and lands heads 14 times. Use the normal approximation to test fairness at the 5% two-sided level.
A live manager panel shows 27 low-leverage funds and 18 high-leverage funds. Survival rates for those groups were 90% and 60%, respectively. Suppose low-leverage funds average 1.2x gross leverage and high-leverage funds average 2.4x gross leverage. What was the average gross leverage …
A PM deck says: ‘The event-study p-value is 0.03, so there is a 97% probability the signal is real.’ What is the statistical mistake?
A researcher tests 50 candidate features and only reports the one with the smallest p-value, which happens to be 0.01. Why is it misleading to present 0.01 as if it came from a single pre-specified test?
A vendor offers two hedge-fund datasets. Dataset A contains only funds that are currently reporting, but it includes long backfilled histories for those funds. Dataset B stores monthly reporting snapshots and preserves closed funds in the historical archive. Which dataset is …
Five family-level winners have ordered p-values 0.004, 0.011, 0.018, 0.031, and 0.070. For what range of BH target levels q would Benjamini-Hochberg keep exactly the first three discoveries?
A research grid contains 60 model variants, but the desk argues they amount to only 15 effectively distinct families. If it wants family-wise error at most 10% using a Bonferroni family-level rule, what p-value cutoff should it apply to each effective family?
Suppose 50 genuinely null standardized t-statistics are approximately independent N(0,1). What is the probability the largest of them exceeds 2.4?
A note tests 12 strategy diagnostics but highlights only the one with p = 0.02. What trap should the reviewer flag?
A six-sided die is rolled 60 times, and the counts are $(14,8,11,9,10,8)$. Test fairness using a chi-square goodness-of-fit test.
Assume normal data with $n=20$ and sample variance $s^2=9$. Test $H_0:\sigma^2=4$ using the classical chi-square variance test.
A 2x2 contingency table of outcomes is $$\begin{pmatrix}30 & 20\\ 10 & 40\end{pmatrix}.$$ Test independence using the classical chi-square test.
A researcher wants the 2016-2025 average return of all funds launched in 2016. Which data pull is least exposed to survivorship bias? A. Keep only funds that are still alive in 2025, then use their full back history. B. Keep funds alive on each evaluation date, but retroactively …
A reviewer writes: ‘p = 0.07 means the null hypothesis is true with probability 7%.’ What is wrong with the conditioning direction?
A two-sided test for a coefficient gives z = 1.8. Without doing a separate interval calculation, what can you say about whether the 95% confidence interval contains zero?
A portfolio holds weights 0.4 and 0.6 in two assets with volatilities 0.06 and 0.12 and current correlation -0.4. If correlation normalizes to 0 while weights and volatilities stay fixed, by what percentage does portfolio variance increase?
Suppose 100 candidate factors collapse into roughly 20 tight clusters of nearly identical signals. What qualitative effect does this have on multiplicity relative to 100 truly independent tests?
An experiment randomizes by store rather than by customer. If the average cluster size is $m$ and the intra-cluster correlation is $\rho$, what is the standard design-effect multiplier on variance?