INTERVIEW PREP

Math and Non-Coding Interview Questions

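Covers math, probability, statistics, brainteasers, machine learning, and finance. Use this page to filter questions and open individual ones; coding questions live in a separate LeetCode-style coding lab.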

Questions: 4169
Domains: 8
Current filter: 73

Page 3 of 4

Non-coding interview questions

Showing 20 of 73 matching questions

Answer status: Not attempted / Incorrect / Correct

ID | Title | Domain | Difficulty | Type | Progress | Access

4308 | Noisy Labels And Overconfidence | Machine learning | Medium | Essay | Not attempted | Interview subscription
A classifier already has good accuracy, but on borderline names it assigns 99% probability too often, and the labels are believed to contain a small amount of noise. Which regularization change best targets that failure mode?

4309 | Co-Adapted Hidden Units | Machine learning | Medium | Essay | Not attempted | Interview subscription
Two hidden layers memorize pairs of co-occurring signals. In-sample metrics look great, but when one signal in the pair shifts slightly out of sample, performance collapses. Which control is most naturally aimed at reducing this co-adaptation?

4310 | Safe Invariance Available | Machine learning | Medium | Essay | Not attempted | Interview subscription
You are training on a small image-like signal dataset where small translations and mirror flips preserve the label by construction. The network fits the training set too easily. Which regularization lever should move to the front of the queue?

4311 | Before Raising Dropout | Machine learning | Medium | Essay | Not attempted | Interview subscription
You are tempted to raise dropout from 0.2 to 0.6 after one mediocre run. What is the first diagnostic question you should answer before doing that?

4312 | Before Adding Augmentation | Machine learning | Medium | Essay | Not attempted | Interview subscription
A teammate proposes aggressive data augmentation as a universal fix. What is the first check you should make before accepting that plan?

4313 | When Weight Decay Starts Hurting | Machine learning | Medium | Essay | Not attempted | Interview subscription
Performance falls as you increase weight decay. Before concluding that regularization is bad, what structural question should you ask about the signal?

4314 | Before Trusting Early Stopping | Machine learning | Medium | Essay | Not attempted | Interview subscription
Your validation metric is noisy from day to day. Before treating the first local peak as the stopping point, what should you calibrate?

4315 | Regularization Is Not Isolated | Machine learning | Medium | Essay | Not attempted | Interview subscription
In an overparameterized network, why is it a mistake to discuss regularization strength without also looking at optimizer and data pipeline choices?

4316 | Attention Score Count | Machine learning | Easy | Numeric | Not attempted | Interview subscription
A Transformer layer processes L = 256 tokens with H = 8 heads. Ignoring the value dimension, how many raw attention score entries are formed across all heads?

4317 | Stacked CNN Receptive Field | Machine learning | Easy | Numeric | Not attempted | Interview subscription
A 1D CNN stacks 6 causal layers with kernel size 3, stride 1, and no dilation. What is the receptive field in tokens?

4318 | Dilated CNN Horizon | Machine learning | Easy | Numeric | Not attempted | Interview subscription
A causal CNN uses 4 layers with kernel size 3 and dilations 1, 2, 4, and 8. What dependency horizon can one output token directly aggregate?

4319 | Sequential Depth Comparison | Machine learning | Easy | Numeric | Not attempted | Interview subscription
For a length-512 sequence, how many sequential processing steps must a vanilla RNN execute? How many sequential token-wise steps does a standard full-sequence Transformer need at inference once the whole block is available?

4320 | Attention Memory Footprint | Machine learning | Easy | Numeric | Not attempted | Interview subscription
A full-attention model uses L = 1024 tokens and stores one attention score matrix per head in float16. Roughly how much memory does one head's score matrix use?

4321 | Streaming Order-Flow Motifs | Machine learning | Medium | Essay | Not attempted | Interview subscription
You need millisecond-latency prediction from a live order-flow stream. Most of the useful structure comes from local motifs over the most recent 20-40 events, and the model must update online without waiting for a block. Which architecture family should be your first baseline?

4322 | Online Stateful Sequence | Machine learning | Medium | Essay | Not attempted | Interview subscription
A model must process an indefinite event stream one tick at a time and maintain a compact, evolving hidden state that can be updated without revisiting past inputs. Which architecture family is most naturally aligned with that requirement?

4323 | Long Offline Cross-Reference | Machine learning | Medium | Essay | Not attempted | Interview subscription
You are building an offline model over 4000-token documents where answers often depend on matching phrases across distant sections. Latency is less important than capturing those long-range interactions. Which architecture should dominate the shortlist?

4324 | Small Data With Local Stationarity | Machine learning | Medium | Essay | Not attempted | Interview subscription
You have limited labeled data, and the target depends on local, translation-equivariant patterns in a 2D signal map. Which architecture family usually brings the strongest built-in inductive bias?

4325 | Rare But Crucial Global Links | Machine learning | Medium | Essay | Not attempted | Interview subscription
A sequence problem has mostly local structure, but a small fraction of labels flips because of interactions between positions hundreds of steps apart. Missing those interactions is very costly. Which architecture family should you favor?

4326 | Length-Doubling Cost Shock | Machine learning | Medium | Essay | Not attempted | Interview subscription
A local CNN with window size 7 scales like 7L interactions, while a Transformer attention block scales like L^2 score pairs. If L doubles from 256 to 512, by what factor does each interaction count grow? Which architecture hits the sharper scaling wall?

4327 | CNN Depth For Longer Horizon | Machine learning | Medium | Essay | Not attempted | Interview subscription
A stride-1 CNN uses kernel size 3 and no dilation. To cover a dependency horizon of 9 steps, you need 4 layers. If the required horizon rises to 41 steps, how many layers are needed? What does that imply about the architecture pressure?
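The arithmetic behind 4316 (attention score count) and 4320 (attention memory footprint) can be checked with a few lines of Python. This is a minimal sketch with several assumptions. Each head materializes one full L x L score matrix. There is no batch dimension and no memory-saving attention kernel. The function names are hypothetical and do not come from any library.

```python
def attention_score_entries(seq_len: int, num_heads: int) -> int:
    """Raw attention scores across all heads: each head forms one seq_len x seq_len matrix."""
    return num_heads * seq_len * seq_len


def score_matrix_bytes(seq_len: int, bytes_per_entry: int = 2) -> int:
    """Bytes for one head's seq_len x seq_len score matrix (float16 uses 2 bytes per entry)."""
    return seq_len * seq_len * bytes_per_entry


print(attention_score_entries(256, 8))  # 524288
mem = score_matrix_bytes(1024)
print(mem, mem / 2**20)                 # 2097152 2.0  (about 2 MiB)
```

Under these assumptions both quantities grow with L^2, so doubling L quadruples each of them.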
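For 4317, 4318, and 4327, the receptive field of stacked stride-1 causal convolutions follows a simple rule. Start from the current token; each layer with kernel size k and dilation d then extends the window by (k - 1) * d steps. The sketch below encodes that rule. It assumes "dependency horizon" means the receptive field counted in tokens, including the current one. That reading matches the 9-step / 4-layer figure quoted in 4327. The helper names are hypothetical.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in tokens) of stacked stride-1 causal conv layers.

    Each layer with dilation d adds (kernel_size - 1) * d past steps to the window.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def layers_needed(kernel_size: int, horizon: int) -> int:
    """Smallest number of undilated stride-1 layers whose receptive field covers `horizon`."""
    growth = kernel_size - 1                       # tokens added per undilated layer
    return max(0, -(-(horizon - 1) // growth))     # ceiling division


print(receptive_field(3, [1] * 6))                # 13
print(receptive_field(3, [1, 2, 4, 8]))           # 31
print(layers_needed(3, 9), layers_needed(3, 41))  # 4 20
```

Without dilation, the horizon grows only linearly with depth. Doubling dilations instead give 1 + (k - 1)(2^n - 1) after n layers. That contrast is one way to read the architecture pressure raised in 4327.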
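The scaling comparison in 4326 and the sequential-step comparison in 4319 reduce to simple counts. This sketch uses the cost models stated in 4326: 7L interactions for the local CNN and L^2 score pairs for attention. For 4319 it assumes a single layer. A vanilla RNN must step through the tokens one at a time, while one Transformer layer processes the whole available block in parallel. The function names are hypothetical.

```python
from typing import Tuple


def interaction_counts(seq_len: int, window: int = 7) -> Tuple[int, int]:
    """(local CNN interactions ~ window * L, full-attention score pairs ~ L^2)."""
    return window * seq_len, seq_len * seq_len


def sequential_steps(seq_len: int) -> Tuple[int, int]:
    """(vanilla RNN steps, Transformer token-wise steps) for one layer over a full block."""
    rnn_steps = seq_len    # hidden state at step t depends on step t-1, so steps cannot overlap
    transformer_steps = 1  # all positions are computed in parallel within the layer
    return rnn_steps, transformer_steps


cnn_256, attn_256 = interaction_counts(256)
cnn_512, attn_512 = interaction_counts(512)
print(cnn_512 / cnn_256, attn_512 / attn_256)  # 2.0 4.0
print(sequential_steps(512))                   # (512, 1)
```

The CNN cost doubles while the attention cost quadruples. Attention therefore hits the sharper wall as L grows. In exchange, it avoids the RNN's length-proportional sequential chain.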