INTERVIEW PREP

Math and Non-Coding Interview Questions

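Covers math, probability, statistics, brain teasers, machine learning, and finance. This page is for filtering questions and opening individual ones; coding problems use a separate LeetCode-style coding lab.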

Questions: 4169
Domains: 8
Current filter: 811

Page 31 of 41

Non-Coding Interview Questions

Showing 20 of 811 matching questions

Answer status: Not attempted | Incorrect | Correct
4983. Infer Return Rate From Expected Hitting Time 17
A CTMC has states 0, 1, 2 with state 2 absorbing. From 0 it jumps to 1 at rate a = 0.5. From 1 it jumps to 2 at rate b = 1 and back to 0 at rate c. If the expected time to hit 2 from 0 is 4, what is c?
Stochastic Processes | Hard | Numeric | Not attempted | Interview subscription

4984. Short-Horizon Probability of Landing in a Subset
A CTMC starts in state i. The jump rates from i to j, k, l are 0.7, 0.2, and 0.1. Using a first-order approximation over Delta t = 0.2, what is the probability that X(Delta t) lies in the subset {j, k}?
Stochastic Processes | Hard | Numeric | Not attempted | Interview subscription

4985. Infer Return Rate From Expected Hitting Time 19
A CTMC has states 0, 1, 2 with state 2 absorbing. From 0 to 1 the rate is a = 1.5, and from 1 to 2 the rate is b = 0.5. If the expected hitting time of state 2 from 0 is 3, what is the return rate c from 1 back to 0?
Stochastic Processes | Hard | Numeric | Not attempted | Interview subscription

4986. Why Fixed Waiting Times Break the CTMC Property
A simulator gets the jump-chain routing probabilities right but replaces exponential waits by deterministic one-minute waits in every state. Why is the resulting calendar-time process generally not a CTMC anymore?
Stochastic Processes | Hard | Essay | Not attempted | Interview subscription

4987. Why Uniformization Can Use One Poisson Clock
Why can a CTMC with different exit rates across states still be simulated using one common Poisson clock plus occasional virtual self-jumps?
Stochastic Processes | Hard | Essay | Not attempted | Interview subscription

4988. Stationary Does Not Mean Slow
Why can a state have a small stationary probability even if it has a large exit rate?
Stochastic Processes | Hard | Essay | Not attempted | Interview subscription

4989. Same Jump Chain, Different Calendar-Time Behavior
Why can two jump processes share exactly the same jump chain but still look very different when observed in real time?
Stochastic Processes | Hard | Essay | Not attempted | Interview subscription

4990. Why First-Step Works
Why is first-step analysis so effective for expected hitting times in jump processes?
Stochastic Processes | Hard | Essay | Not attempted | Interview subscription

5066. Infer Self-Transition Probability From a Bellman Value 1
Under a fixed policy, state s yields immediate reward 1 each step. With probability p it returns to s next step; otherwise the episode ends. If the discount factor is 0.9 and the state value is reported as V(s) = 2.5, what p is implied?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5067. Infer Self-Transition Probability From a Bellman Value 2
Under a fixed policy, state s yields immediate reward 0.5 each step. With probability p it returns to s next step; otherwise the episode ends. If the discount factor is 0.95 and the state value is reported as V(s) = 2, what p is implied?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5068. Infer Self-Transition Probability From a Bellman Value 3
Under a fixed policy, state s yields immediate reward 2 each step. With probability p it returns to s next step; otherwise the episode ends. If the discount factor is 0.8 and the state value is reported as V(s) = 4, what p is implied?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5069. Infer Self-Transition Probability From a Bellman Value 4
Under a fixed policy, state s yields immediate reward 1.2 each step. With probability p it returns to s next step; otherwise the episode ends. If the discount factor is 0.85 and the state value is reported as V(s) = 2.4, what p is implied?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5071. Recover Bootstrapped Target From a Q-Learning Update 6
A tabular Q-learning step starts from old Q = 0.2, uses learning rate alpha = 1, reward 0.5, and discount gamma = 0.9. After the update the Q-value becomes 2.9. What max_{a'} Q(s', a') must the learner have used?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5072. Recover Bootstrapped Target From a Q-Learning Update 7
A tabular Q-learning step starts from old Q = 1.1, uses learning rate alpha = 0.5, reward 0.2, and discount gamma = 0.8. After the update the Q-value becomes 1.6. What max_{a'} Q(s', a') must the learner have used?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5073. Recover Bootstrapped Target From a Q-Learning Update 8
A tabular Q-learning step starts from old Q = -0.4, uses learning rate alpha = 0.25, reward 1, and discount gamma = 0.95. After the update the Q-value becomes 1.2. What max_{a'} Q(s', a') must the learner have used?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5074. Recover Bootstrapped Target From a Q-Learning Update 9
A tabular Q-learning step starts from old Q = 0.7, uses learning rate alpha = 0.4, reward 0.3, and discount gamma = 0.9. After the update the Q-value becomes 2. What max_{a'} Q(s', a') must the learner have used?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5075. Recover Bootstrapped Target From a Q-Learning Update 10
A tabular Q-learning step starts from old Q = 0, uses learning rate alpha = 0.5, reward 0.1, and discount gamma = 0.99. After the update the Q-value becomes 3. What max_{a'} Q(s', a') must the learner have used?
Machine Learning | Easy | Numeric | Not attempted | Interview subscription

5086. RL Training Diagnostic 21
Why can bootstrapping help value estimates even before an episode terminates?
Machine Learning | Hard | Essay | Not attempted | Interview subscription

5087. RL Training Diagnostic 22
Why does an RL agent usually need explicit exploration even if its current greedy action already looks good?
Machine Learning | Hard | Essay | Not attempted | Interview subscription

5088. Discount Factor Intuition
Why does increasing the discount factor often make value estimates more sensitive to long-run model misspecification?
Machine Learning | Hard | Essay | Not attempted | Interview subscription
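
The three numeric question families in this list each come down to inverting one short equation. The sketch below is a minimal illustration and not the site's reference solution. The function names are hypothetical, and each relies on a stated assumption: the Bellman questions assume the episode's terminal value is 0, so V = r + gamma * p * V; the Q-learning questions assume the standard tabular update Q_new = Q_old + alpha * (r + gamma * M - Q_old), with M standing for max_{a'} Q(s', a'); the CTMC questions assume first-step analysis, which gives E0 = (a + b + c) / (a * b) for the expected hitting time from state 0.

```python
def implied_self_transition_prob(reward, gamma, value):
    """Solve V = r + gamma * p * V for p (assumes terminal value 0)."""
    return (value - reward) / (gamma * value)


def implied_bootstrap_max(q_old, q_new, alpha, reward, gamma):
    """Solve Q_new = Q_old + alpha * (r + gamma * M - Q_old) for M."""
    # Recover the full TD target r + gamma * M first, then strip r and gamma.
    td_target = q_old + (q_new - q_old) / alpha
    return (td_target - reward) / gamma


def implied_return_rate(a, b, expected_hit_time):
    """Solve E0 = (a + b + c) / (a * b) for the return rate c.

    First-step equations (assumed setup: 0 -> 1 at rate a, 1 -> 2 at rate b,
    1 -> 0 at rate c, state 2 absorbing):
        E0 = 1/a + E1
        E1 = 1/(b + c) + (c / (b + c)) * E0
    """
    return expected_hit_time * a * b - a - b


if __name__ == "__main__":
    # Parameters taken from questions 5066, 5071, and 4983 in the list.
    print(implied_self_transition_prob(reward=1.0, gamma=0.9, value=2.5))
    print(implied_bootstrap_max(q_old=0.2, q_new=2.9, alpha=1.0, reward=0.5, gamma=0.9))
    print(implied_return_rate(a=0.5, b=1.0, expected_hit_time=4.0))
```

A result outside the valid range (p outside [0, 1], or a negative rate c) would mean the inputs do not fit the assumed model.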