Non-coding interview questions
| ID | Title | Question | Domain | Difficulty | Type | Progress | Access |
|---|---|---|---|---|---|---|---|
| 1714 | Optional Stopping as a p-Hacking Mechanism | A PM checks the p-value every hour and stops the experiment as soon as p < 0.05. Why does this inflate false positives? | Statistics | Medium | essay | Not attempted | Interview subscription |
| 1715 | The Prosecutor's Fallacy in a Trading Context | A rare anomaly occurs in only 1 out of 10,000 normal days. A model flags today's pattern as one that would happen with probability 1/10,000 under the null, and someone concludes the null must almost certainly be false. What key base-rate issue are they missing? | Statistics | Medium | essay | Not attempted | Interview subscription |
| 1716 | Replication Probability Is Not 1 - p | A researcher says: ‘This alpha had p = 0.04, so replication next quarter is 96% likely.’ Why is that interpretation invalid? | Statistics | Easy | essay | Not attempted | Free |
| 1717 | Significant but Commercially Uncertain | A venue-routing tweak delivers p = 0.01, but the 95% confidence interval for annual savings is [10k, 1.2m]. Why should the team still be cautious? | Statistics | Easy | essay | Not attempted | Free |
| 1718 | Best-of-50 Reporting Without Adjustment | A researcher tests 50 candidate features and only reports the one with the smallest p-value, which happens to be 0.01. Why is it misleading to present 0.01 as if it came from a single pre-specified test? | Statistics | Medium | essay | Not attempted | Interview subscription |
| 1719 | 0.049 vs 0.051 Decision Cliff | Two backtests differ only slightly: one reports p = 0.049 and the other p = 0.051. Why is it bad practice to call one ‘real’ and the other ‘not real’ purely because one is below 0.05? | Statistics | Easy | essay | Not attempted | Free |
| 1720 | Low Prior Probability and Positive Predictive Value | Suppose only 1% of tested trading ideas are genuinely predictive. A testing pipeline has 80% power and a 5% false-positive rate. Conditional on obtaining a positive result, what fraction of positives are truly real? | Statistics | Medium | derivation | Not attempted | Interview subscription |
| 1721 | Switching to One-Sided After Seeing the Sign | A two-sided test is not significant, but the estimated coefficient has the expected sign. The analyst then reports the one-sided p-value instead. Why is that invalid if the direction was chosen after looking at the data? | Statistics | Medium | essay | Not attempted | Interview subscription |
| 1722 | Cherry-Picked Endpoint | A note tests 12 strategy diagnostics but highlights only the one with p = 0.02. What trap should the reviewer flag? | Statistics | Easy | essay | Not attempted | Free |
| 1723 | Underpowered Winner’s Curse | A sparse signal library was screened on short samples; the surviving signal is significant but came from a very low-power environment. Why should its in-sample effect size be viewed skeptically? | Statistics | Medium | essay | Not attempted | Interview subscription |
| 1724 | Conditioning Direction Error | A reviewer writes: ‘p = 0.07 means the null hypothesis is true with probability 7%.’ What is wrong with the conditioning direction? | Statistics | Easy | essay | Not attempted | Free |
| 1725 | Fail to Reject vs Accept | An experiment does not reject the null at 5%. The team writes ‘the null is accepted.’ How should that statement be corrected? | Statistics | Easy | essay | Not attempted | Free |
| 1726 | Design Effect Under Clustered Assignment | An experiment randomizes by store rather than by customer. If the average cluster size is m and the intra-cluster correlation is ρ, what is the standard design-effect multiplier on variance? | Statistics | Medium | derivation | Not attempted | Interview subscription |
| 1727 | Effective Sample Size After a CUPED-Style Variance Reduction | A variance-reduction method shrinks the variance of the treatment-effect estimator by a factor of c, where 0 < c < 1. By what factor does the effective sample size increase? | Statistics | Medium | derivation | Not attempted | Interview subscription |
| 1728 | Triggered Analysis Exposure Fraction | Only a fraction q of randomized users ever encounter the feature being tested. If the treatment effect exists only on those triggered users, how does the intent-to-treat effect compare with the triggered-user effect? | Statistics | Medium | derivation | Not attempted | Interview subscription |
| 1729 | Why Session-Level Randomization Can Leak Across Users | Why can session-level randomization be misleading when the same user returns many times and behavior carries over across sessions? | Statistics | Easy | essay | Not attempted | Free |
| 1730 | Why Daily Peeking Breaks a Fixed-Horizon Threshold | Why does checking significance every day against an unadjusted fixed-horizon p-value threshold inflate false positives? | Statistics | Medium | essay | Not attempted | Interview subscription |
| 1731 | Why Triggered Analysis Can Beat a Naive Population Average | Why can triggered analysis produce a cleaner estimate than averaging over all randomized users? | Statistics | Easy | essay | Not attempted | Free |
| 1732 | Why Interference Can Ruin User-Level Randomization | Why can an experiment on social or marketplace features violate the usual randomized-test logic even if assignment itself was truly random? | Statistics | Easy | essay | Not attempted | Free |
| 1733 | Why Guardrail Metrics Matter Even When the Primary Metric Wins | Why is a primary-metric win not enough to ship an experiment if latency, complaints, or cancellation rates deteriorate? | Statistics | Medium | essay | Not attempted | Interview subscription |
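The derivation-type questions (1720, 1726, 1727, 1728) each reduce to a short closed-form expression. The sketch below is one possible way to work them, not an official answer key. The function names are hypothetical, and the inputs used for 1726 to 1728 are illustrative assumptions; only the 1%, 80%, and 5% figures come from question 1720.

```python
def positive_predictive_value(prior, power, alpha):
    """Bayes' rule: share of positive results that are true effects."""
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


def design_effect(m, rho):
    """Standard variance multiplier for clusters of average size m
    with intra-cluster correlation rho: 1 + (m - 1) * rho."""
    return 1.0 + (m - 1.0) * rho


def effective_sample_size_multiplier(c):
    """If estimator variance shrinks to c times its old value (0 < c < 1),
    the same precision would otherwise need 1 / c times as many samples."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    return 1.0 / c


def intent_to_treat_effect(q, triggered_effect):
    """With no effect on untriggered users, the ITT effect is the
    triggered-user effect diluted by the exposure fraction q."""
    return q * triggered_effect


if __name__ == "__main__":
    # Question 1720: values taken from the question text.
    print(round(positive_predictive_value(0.01, 0.80, 0.05), 3))  # 0.139
    # Illustrative inputs only (assumed, not from the source).
    print(design_effect(m=50, rho=0.02))
    print(effective_sample_size_multiplier(c=0.5))
    print(intent_to_treat_effect(q=0.1, triggered_effect=0.2))
```

Under the numbers in question 1720, only about 14% of positive results are real, because false positives from the 99% of ideas that do not work outnumber true positives from the 1% that do.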
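Questions 1714 and 1730 both concern repeated looks at an unadjusted threshold. A small simulation can make the inflation concrete. The sketch below assumes a simplified setting that the source does not specify: a z-test on standard-normal data with known variance, a true null, and 20 equally spaced looks. The function name and all default parameters are hypothetical.

```python
import math
import random


def simulate_false_positive_rates(n_sims=1000, n_looks=20, obs_per_look=50,
                                  z_crit=1.96, seed=0):
    """Simulate experiments where the null is true and compare two rules:
    'stop at the first look with |z| > z_crit' versus 'test once at the end'.
    Returns (peeking_rate, fixed_horizon_rate)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        total = 0.0
        n = 0
        crossed = False
        z = 0.0
        for _ in range(n_looks):
            for _ in range(obs_per_look):
                total += rng.gauss(0.0, 1.0)  # data generated under the null
            n += obs_per_look
            z = total / math.sqrt(n)  # z-statistic with known unit variance
            if abs(z) > z_crit:
                # A peeker would stop here; we keep sampling only so the
                # fixed-horizon rule can be scored on the same data.
                crossed = True
        peeking_hits += crossed
        fixed_hits += abs(z) > z_crit
    return peeking_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    peeking_rate, fixed_rate = simulate_false_positive_rates()
    print(f"peeking: {peeking_rate:.3f}, fixed horizon: {fixed_rate:.3f}")
```

Because the final look is also one of the interim looks, the peeking rule rejects whenever the fixed-horizon rule does, plus every time the statistic crossed the threshold earlier and drifted back. Its false-positive rate can therefore never be lower, and in this setup it comes out several times higher than the nominal 5%.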