INTERVIEW PREP

Math and Non-Coding Interview Questions

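Covers math, probability, statistics, brain teasers, machine learning, and finance. This page is for filtering questions and opening individual ones; programming questions use a separate LeetCode-style coding lab.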

Questions: 4169
Domains: 8
Matching current filter: 1721

Page 58 of 87

Non-Coding Interview Questions

Showing 20 of 1721 matching questions

ID | Question | Domain | Difficulty | Type | Progress | Access

4302 | Label Smoothing Loss 2: A network uses stochastic depth on 12 residual blocks, each with survival probability 0.75 during training. How many blocks are active on average in a training pass? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4303 | Label Smoothing Loss 3: A layer has 400 weights. DropConnect keeps each weight independently with probability 0.9 during training. How many weights are active on average in one forward pass? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4304 | Label Smoothing Loss 4: A unit has activation a = 3 before inverted dropout with keep probability q = 0.75. During training the output is either 0 or a/q. What is the variance of that post-dropout output? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4316 | Attention Score Count: A Transformer layer processes L=256 tokens with H=8 heads. Ignoring the value dimension, how many raw attention score entries are formed across all heads? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4317 | Stacked CNN Receptive Field: A 1D CNN stacks 6 causal layers with kernel size 3, stride 1, and no dilation. What is the receptive field in tokens? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4318 | Dilated CNN Horizon: A causal CNN uses 4 layers with kernel size 3 and dilations 1, 2, 4, and 8. What dependency horizon can one output token directly aggregate? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4319 | Sequential Depth Comparison: For a length-512 sequence, how many sequential processing steps must a vanilla RNN execute, and how many sequential token-wise steps does a standard full-sequence Transformer need at inference once the whole block is available? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4320 | Attention Memory Footprint: A full-attention model uses L=1024 tokens and stores one attention score matrix per head in float16. Roughly how much memory does one head's score matrix use? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4341 | Confusion-Matrix Metrics 1: At a fixed threshold, prevalence is 20%, TPR is 80%, and FPR is 10%. What precision does that imply? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4342 | Confusion-Matrix Metrics 2: A fraud model keeps TPR = 0.90 and FPR = 0.03 when deployed into a market where prevalence falls from 10% to 2%. What precision should you now expect at the same threshold? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4343 | Confusion-Matrix Metrics 3: Predicted probabilities are [0.8, 0.6, 0.3, 0.1] and labels are [1, 0, 1, 0]. What is the Brier score? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4344 | Confusion-Matrix Metrics 4: For expected calibration error with equal sample weighting, you have two nonempty bins. Bin A has probabilities [0.2, 0.3] and labels [0, 1]. Bin B has probabilities [0.8, 0.9] and labels [1, 1]. Using ECE = sum over bins of (bin fraction)*|avg confidence - accuracy|, what ECE do you get? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4345 | Confusion-Matrix Metrics 5: A model assigns an average predicted probability of 0.18 to a bucket containing 200 names. If the model is calibrated, how many positives should you expect in that bucket on average? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription

4346 | Brier Score Snapshot 1: At one threshold, prevalence is 5%, TPR is 80%, and FPR is 10%. What PR-space point (recall, precision) corresponds to that ROC-space operating point? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4347 | Brier Score Snapshot 2: Across 500 cases, a calibrated model has mean predicted probability 0.12. How many positives should you expect in total? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4348 | Brier Score Snapshot 3: On a validation set, the model's mean predicted probability is 9% but the observed positive rate is 6%. What calibration-in-the-large error does that imply? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4349 | Brier Score Snapshot 4: A thresholding rule is used on a universe where prevalence is 10%, TPR is 70%, and FPR is 5%. A false negative costs 4 units and a false positive costs 1 unit. What is the expected misclassification cost per case? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4350 | Brier Score Snapshot 5: A calibrated bucket contains 80 names with average predicted probability 0.35. If you actually observe 20 positives, what empirical positive rate does that bucket realize, and by how many percentage points is it under the average prediction? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4351 | Asymmetric Threshold Choice 1: Three candidate thresholds on the same classifier yield t=0.3 -> FP=18, FN=4; t=0.5 -> FP=9, FN=7; t=0.7 -> FP=4, FN=14. If one false negative costs 5 units and one false positive costs 1 unit, which threshold minimizes expected classification cost over this sample? | Machine Learning | Medium | Numeric | Not attempted | Interview subscription

4366 | Nested CV Fit Count 1: Three model sizes have mean CV AUCs 0.790, 0.802, and 0.808. The standard error of the best score is 0.010. Under the one-standard-error rule, which is the simplest model you would keep? | Machine Learning | Easy | Numeric | Not attempted | Interview subscription
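
Several of the numeric questions above (4341 to 4344, 4346, and 4351) reduce to short formulas, so a small script can be handy for checking hand calculations. The following is only a sketch: the function names are hypothetical and not part of the question bank, and the precision formula assumes the standard Bayes-rule relationship between prevalence, TPR, and FPR. The ECE function follows the definition quoted in question 4344.

```python
def precision_from_rates(prevalence, tpr, fpr):
    """Precision implied by prevalence, TPR, and FPR (Bayes' rule)."""
    true_pos = prevalence * tpr          # share of all cases that are true positives
    false_pos = (1 - prevalence) * fpr   # share of all cases that are false positives
    return true_pos / (true_pos + false_pos)


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(bins):
    """ECE = sum over bins of (bin fraction) * |avg confidence - accuracy|.

    `bins` is a list of (probs, labels) pairs, one pair per nonempty bin.
    """
    total = sum(len(probs) for probs, _ in bins)
    ece = 0.0
    for probs, labels in bins:
        confidence = sum(probs) / len(probs)
        accuracy = sum(labels) / len(labels)
        ece += (len(probs) / total) * abs(confidence - accuracy)
    return ece


def cheapest_threshold(candidates, fn_cost, fp_cost):
    """Return the threshold whose (FP, FN) counts give the lowest total cost.

    `candidates` maps threshold -> (false_positives, false_negatives).
    """
    def total_cost(threshold):
        fp, fn = candidates[threshold]
        return fp_cost * fp + fn_cost * fn

    return min(candidates, key=total_cost)


if __name__ == "__main__":
    # Inputs copied from questions 4341, 4343, 4344, and 4351.
    print(precision_from_rates(0.20, 0.80, 0.10))
    print(brier_score([0.8, 0.6, 0.3, 0.1], [1, 0, 1, 0]))
    print(expected_calibration_error([([0.2, 0.3], [0, 1]), ([0.8, 0.9], [1, 1])]))
    print(cheapest_threshold({0.3: (18, 4), 0.5: (9, 7), 0.7: (4, 14)}, fn_cost=5, fp_cost=1))
```

Calling `precision_from_rates` again with a different prevalence (for example the 2% in question 4342 or the 5% in question 4346) shows how precision moves when TPR and FPR stay fixed but the base rate changes.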