Site Search · 锐望实验室

Found 30 results

Problem 2457 · Machine Learning

PCA Fit Once Before Cross-Validation

A notebook computes PCA on the full feature matrix and then feeds the resulting components into every cross-validation fold. Why is that not a harmless speed optimization?

Problem 2653 · Machine Learning

Training Observations Per Fold in Grouped Cross-Validation

There are 12 issuers and each issuer contributes 5 observations. In 3-fold grouped cross-validation, one fold holds out 4 issuers at a time. How many observations are used for training in each fold?
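
The count can be checked with a short sketch. The helper name `grouped_cv_train_size` is hypothetical, and it assumes every issuer contributes the same number of observations, as the problem states.

```python
def grouped_cv_train_size(n_groups: int, obs_per_group: int, held_out_groups: int) -> int:
    """Rows left for training when whole groups are held out together."""
    return (n_groups - held_out_groups) * obs_per_group


# 12 issuers, 5 observations each, 4 issuers held out per fold.
train_rows = grouped_cv_train_size(n_groups=12, obs_per_group=5, held_out_groups=4)
print(train_rows)  # 40
```

Because grouped splitting removes issuers rather than rows, the training size depends only on how many issuers remain.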

Problem 5041 · Machine Learning

Recover the Missing Fold Gap 1

A 5-fold cross-validation comparison records four paired score differences (model A minus model B): [0.02, 0.01, -0.01, 0.03]. The desk report says the overall mean fold difference across all 5 folds was 0.01. What was the missing fifth-fold difference?

Problem 5042 · Machine Learning

Recover the Missing Fold Gap 2

A 5-fold cross-validation comparison records four paired score differences (model A minus model B): [0.05, 0.02, 0.04, -0.01]. The desk report says the overall mean fold difference across all 5 folds was 0.026. What was the missing fifth-fold difference?

Problem 5043 · Machine Learning

Recover the Missing Fold Gap 3

A 5-fold cross-validation comparison records four paired score differences (model A minus model B): [-0.02, 0.01, 0.0, -0.01]. The desk report says the overall mean fold difference across all 5 folds was 0.002. What was the missing fifth-fold difference?

Problem 5044 · Machine Learning

Recover the Missing Fold Gap 4

A 5-fold cross-validation comparison records four paired score differences (model A minus model B): [0.01, 0.01, 0.02, 0.0]. The desk report says the overall mean fold difference across all 5 folds was 0.014. What was the missing fifth-fold difference?

Problem 5045 · Machine Learning

Recover the Missing Fold Gap 5

A 5-fold cross-validation comparison records four paired score differences (model A minus model B): [0.04, -0.02, 0.01, 0.02]. The desk report says the overall mean fold difference across all 5 folds was 0.01. What was the missing fifth-fold difference?
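
All five "Recover the Missing Fold Gap" variants use the same identity: the mean times the number of folds is the total, so the missing value is that total minus the four known differences. A minimal sketch follows; the function name and the `cases` dictionary are illustrative, and the rounding to six decimals is an assumption made only to suppress floating-point noise.

```python
def missing_fold_difference(known, overall_mean, n_folds=5):
    """Solve n_folds * overall_mean = sum(known) + missing for the missing value."""
    missing = n_folds * overall_mean - sum(known)
    # Round away float noise; adding 0.0 turns a possible -0.0 into 0.0.
    return round(missing, 6) + 0.0


# Inputs copied from the five problem statements.
cases = {
    "Gap 1": ([0.02, 0.01, -0.01, 0.03], 0.01),
    "Gap 2": ([0.05, 0.02, 0.04, -0.01], 0.026),
    "Gap 3": ([-0.02, 0.01, 0.0, -0.01], 0.002),
    "Gap 4": ([0.01, 0.01, 0.02, 0.0], 0.014),
    "Gap 5": ([0.04, -0.02, 0.01, 0.02], 0.01),
}

results = {name: missing_fold_difference(known, mean) for name, (known, mean) in cases.items()}
for name, value in results.items():
    print(name, value)
```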

Problem 2667 · Machine Learning

Why Class Stratification Is Not Enough for Repeated Entities

Why can class-stratified cross-validation still fail badly when the same issuer appears many times and issuer identity carries predictive information?

Problem 2647 · Machine Learning

Why Grouped CV Beats Row-Wise CV for Repeated Entities

Why is row-wise cross-validation inappropriate when each entity appears many times and the model can recognize entity-specific signatures?

Problem 2648 · Machine Learning

Why Random k-Fold Is Invalid for Overlapping Rolling Features

Why can random k-fold cross-validation be invalid when each feature vector uses a rolling 20-day history from a time series?

Problem 2656 · Machine Learning

Why Random Row CV Breaks With Overlapping Label Horizons

Why can ordinary random row cross-validation severely overstate performance when each label depends on the next 5 trading days and adjacent rows overlap in those horizons?

Problem 4419 · Machine Learning

Average Training Length Under Expansion 4

An expanding walk-forward starts with 12 months of training and then advances by 6 months for each of 5 complete test folds. What is the average training-window length used across the 5 folds?
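
One way to read this setup is that the first fold trains on the initial 12 months and each later fold adds 6 more. That reading is an assumption; if the first fold is taken to start after one advance, every length shifts up by 6. Under the stated reading, a sketch with a hypothetical helper looks like this:

```python
def expanding_train_lengths(initial: int, step: int, n_folds: int) -> list[int]:
    """Training-window length per fold, assuming fold 0 uses the initial window."""
    return [initial + step * k for k in range(n_folds)]


lengths = expanding_train_lengths(initial=12, step=6, n_folds=5)
average_length = sum(lengths) / len(lengths)
print(lengths)         # [12, 18, 24, 30, 36]
print(average_length)  # 24.0
```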

Problem 2702 · Machine Learning

Chance the Best Null t-Statistic Exceeds 2.4 Across 50 Variants

Suppose 50 genuinely null standardized t-statistics are approximately independent N(0,1). What is the probability the largest of them exceeds 2.4?
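
Under independence, the largest statistic stays below the threshold only if all 50 do, so the probability is 1 - Φ(2.4)^50. The sketch below assumes a one-sided exceedance (the maximum itself, not the maximum absolute value) and builds the normal CDF from the standard library's `math.erf`; the function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_max_exceeds(threshold: float, n: int) -> float:
    """P(max of n independent N(0,1) draws > threshold)."""
    return 1.0 - normal_cdf(threshold) ** n


p_max = prob_max_exceeds(2.4, 50)
print(f"{p_max:.3f}")
```

The result is roughly one in three, even though any single null statistic exceeds 2.4 less than 1% of the time.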

Problem 2458 · Machine Learning

Choosing Early Stopping by the Test Curve

A team trains one model, plots test loss by boosting round, and reports the round with the best test value. Why is the final test score no longer a valid final check?

Problem 4418 · Machine Learning

Embargo Budget Across Folds 3

A walk-forward backtest produces 7 complete folds, and the research protocol inserts a 3-day embargo between each training block and its following test block. How many calendar days are lost to embargo across the whole run?

Problem 4422 · Machine Learning

Execution-Lagged Label Capacity 7

A test block has 25 trading days. A signal generated on day t is executed on day t+1 and evaluated on the open-to-close return from day t+1 through day t+4. How many signals inside the block can be scored without the label running past the block end?
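
A sketch of the counting argument, assuming the block's days are numbered 1 to 25, the signal day t must itself fall inside the block, and the label is complete once day t+4 has closed. The helper name is hypothetical.

```python
def scorable_signals(block_days: int, label_end_offset: int) -> int:
    """Signals on day t can be scored only if t + label_end_offset <= block_days."""
    return max(0, block_days - label_end_offset)


# The label runs through day t+4, so the last usable signal day is 25 - 4 = 21.
n_signals = scorable_signals(block_days=25, label_end_offset=4)
print(n_signals)  # 21
```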

Problem 4417 · Machine Learning

Expanding Window Final Training Length 2

An expanding-window walk-forward starts with 18 months of training, then uses a 1-month embargo and a 4-month test block, advancing by 4 months each round across 59 months of history. What is the training-window length in the last complete fold?
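
The fold layout can be walked explicitly. This sketch assumes each fold occupies training, then embargo, then test months back to back from the start of the history, and that a fold counts as complete only if its test block ends within the 59 months. The function name is illustrative.

```python
def last_complete_fold_train_length(history, initial_train, embargo, test, step):
    """Training length of the last fold whose test block fits inside the history."""
    train = initial_train
    last = None
    while train + embargo + test <= history:
        last = train   # this fold fits completely
        train += step  # expanding window grows by one step per round
    return last


last_train = last_complete_fold_train_length(
    history=59, initial_train=18, embargo=1, test=4, step=4
)
print(last_train)  # 54
```

With these numbers the final fold uses 54 + 1 + 4 = 59 months, so it ends exactly at the end of the history.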

Problem 2704 · Machine Learning

Expected Null Strategies Surviving a Screening Funnel

A research platform runs 200 null strategies. Only strategies with in-sample p-value below 15% are promoted, and each promoted strategy must then pass a fresh 5% confirmation test. Assuming independence under the null, what is the expected number of false strategies that survive
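
Under the null, a p-value is uniform, so a strategy passes the 15% screen with probability 0.15 and the fresh 5% test with probability 0.05. Assuming the two stages are independent, as stated, the expected count is a product. The variable names below are illustrative.

```python
n_strategies = 200
p_screen = 0.15   # in-sample promotion threshold
p_confirm = 0.05  # fresh confirmation threshold

# Linearity of expectation: each null strategy survives with p_screen * p_confirm.
expected_survivors = n_strategies * p_screen * p_confirm
print(round(expected_survivors, 6))  # 1.5
```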

Problem 2654 · Machine Learning

Expected Number of Times a Point Is Validated in Repeated k-Fold CV

In R repeats of ordinary k-fold CV, each point appears in exactly one validation fold per repeat. Derive the number of validation appearances of one point across all repeats.

Problem 2696 · Machine Learning

False Strategy Surviving Two Independent Research Gates

A desk tries 80 genuinely null strategy ideas. A strategy is kept only if it passes an in-sample screen at 10% and then a fresh out-of-sample confirmation at 5%, with the two tests treated as independent under the null. What is the probability at least one null idea survives both
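
A quick numeric check, assuming the 80 ideas are independent of one another as well as the two gates being independent for each idea. The helper name is hypothetical.

```python
def prob_any_survivor(n_ideas: int, p_gate_1: float, p_gate_2: float) -> float:
    """P(at least one null idea passes both independent gates)."""
    p_both = p_gate_1 * p_gate_2
    return 1.0 - (1.0 - p_both) ** n_ideas


p_any = prob_any_survivor(n_ideas=80, p_gate_1=0.10, p_gate_2=0.05)
print(f"{p_any:.3f}")
```

Each idea survives with probability 0.005, yet across 80 ideas the chance that something survives is about one in three.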

Problem 2697 · Machine Learning

False Winner Probability After Clustering 240 Variants into 24 Families

A researcher generates 240 heavily correlated strategy variants but argues they amount to only 24 effectively independent families. If the desk still flags any family with p-value below 8%, what is the approximate probability of at least one false family-level winner under the nu
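
Treating the 24 families as independent tests, which is the researcher's own approximation rather than an established fact, the family-wise false-positive probability can be sketched as follows. The variable names are illustrative.

```python
n_families = 24
alpha = 0.08  # per-family flagging threshold

# P(no family is flagged) = (1 - alpha) ** n_families under independence.
p_false_winner = 1.0 - (1.0 - alpha) ** n_families
print(f"{p_false_winner:.3f}")
```

Even after collapsing 240 variants into 24 families, a false winner remains far more likely than not at this threshold.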

Problem 2454 · Machine Learning

Feature Screening Before the Split

A team ranks 5,000 candidate features by correlation with the target on the full dataset, keeps the top 30, and only then creates train and test. Why is the later split not enough to rescue the experiment?

Problem 2452 · Machine Learning

Future Restatements Merged Into Historical Features

A researcher joins fundamentals after they were restated months later, then backtests on the original trade dates. Why is this a split-discipline failure even if no test labels were touched?

Problem 2468 · Machine Learning

Group Leakage Inflates Confidence Too

Why does entity overlap across train and test typically make confidence intervals and model-stability assessments look better than they really are?

Problem 2448 · Machine Learning

Held-Out Base Rate Implied by a Full-Sample Class Weight

A training set has 100 labels with 30 positives. A class-weighting routine is mistakenly fit on all 125 labels and reports an overall positive rate of 0.36. What is the positive rate in the 25 held-out labels?
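
The leaked rate pins down the held-out positives by subtraction. A minimal sketch, with illustrative variable names:

```python
n_total, overall_rate = 125, 0.36
n_train, train_positives = 100, 30

total_positives = round(n_total * overall_rate)         # 45
held_out_positives = total_positives - train_positives  # 15
held_out_rate = held_out_positives / (n_total - n_train)
print(held_out_rate)  # 0.6
```

The held-out base rate is twice the training rate, which is exactly the information a full-sample class weight would carry into training.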

Problem 2446 · Machine Learning

Hidden Validation Positives Implied by a Leaky Target Encoder

A category appears 40 times in train with 18 positives and 10 times in validation. A target encoder is incorrectly fit on train plus validation and outputs 0.56 for that category. How many validation positives did the encoder implicitly use?
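
Assuming the encoder is a plain category mean with no smoothing or prior (the problem does not say otherwise), the validation positives follow from the encoded value. The variable names are illustrative.

```python
train_rows, train_positives = 40, 18
validation_rows = 10
encoded_value = 0.56  # mean target over train + validation rows

# Unsmoothed mean encoding: encoded_value = all positives / all rows.
all_positives = round(encoded_value * (train_rows + validation_rows))  # 28
validation_positives = all_positives - train_positives
print(validation_positives)  # 10
```

An encoder fit on train alone would have output 18 / 40 = 0.45, so the gap to 0.56 comes entirely from validation labels.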

Problem 2664 · Machine Learning

How Many Distinct Hyperparameter Winners Can Outer CV Produce

A nested CV uses 7 outer folds and selects exactly one hyperparameter setting inside each outer fold. What is the maximum possible number of distinct winning hyperparameter settings across outer folds?

Problem 2649 · Machine Learning

How Many Expanding-Window Folds Fit in a Monthly Panel?

You have 60 months of data. Each expanding-window fold uses 24 months for training, the next 6 months for validation, and then advances by 6 months. How many validation folds fit?
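
The fold count can be found by stepping through the panel. This sketch assumes the first fold trains on months 1 to 24, the training window then grows by 6 months per fold, and a fold counts only if its full 6-month validation block fits. The function name is hypothetical.

```python
def count_expanding_folds(total: int, initial_train: int, validation: int, step: int) -> int:
    """Number of folds whose validation block ends within the available history."""
    folds = 0
    train_end = initial_train
    while train_end + validation <= total:
        folds += 1
        train_end += step  # training window expands to absorb the last validation block
    return folds


n_folds = count_expanding_folds(total=60, initial_train=24, validation=6, step=6)
print(n_folds)  # 6
```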

Problem 2579 · Machine Learning

Infer Tree Correlation From the Variance Floor 23

A single tree has variance 6, while an extremely large forest appears to level off at variance 1.8. What pairwise tree correlation rho is implied?
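
This relies on the usual variance formula for an average of B identically distributed trees with per-tree variance σ² and pairwise correlation ρ, namely ρσ² + (1 - ρ)σ²/B. Assuming that model applies here, the second term vanishes as B grows and the floor is ρσ². The function name below is illustrative.

```python
def forest_variance(tree_variance: float, rho: float, n_trees: int) -> float:
    """Variance of an average of equally correlated, identically distributed trees."""
    return rho * tree_variance + (1.0 - rho) * tree_variance / n_trees


tree_variance = 6.0
variance_floor = 1.8

# As n_trees grows the variance tends to rho * tree_variance, so solve for rho.
rho = variance_floor / tree_variance
print(round(rho, 6))  # 0.3

# Sanity check: a very large forest sits just above the floor.
print(forest_variance(tree_variance, rho, n_trees=1_000_000))
```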

Problem 2449 · Machine Learning

Issuer Demeaning That Quietly Uses Held-Out Rows

For one issuer, the three training rows sum to 12. A pipeline mistakenly demeans by the full-sample issuer mean 3.6 computed from five rows total. What is the sum of the two held-out rows for that issuer?
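
The full-sample mean fixes the five-row total, and subtracting the training sum leaves the held-out sum. A minimal sketch, with illustrative variable names:

```python
full_sample_mean = 3.6
total_rows = 5
training_sum = 12.0

# mean * count gives the sum over all five rows for this issuer.
held_out_sum = full_sample_mean * total_rows - training_sum
print(round(held_out_sum, 6))  # 6.0
```

The training-only mean would be 12 / 3 = 4.0, so demeaning by 3.6 shifts every training row using information from the two held-out rows.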