5069 · Machine Learning · Easy · Numerical · Short
Infer Self-Transition Probability From a Bellman Value 4
Question
Under a fixed policy, state s yields immediate reward 1.2 each step. With probability p it returns to s next step; otherwise the episode ends. If the discount factor is 0.85 and the state value is reported as V(s)=2.4, what p is implied?
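One way to check the arithmetic is to write the Bellman equation for this single state and solve it for p. The sketch below assumes that the terminal outcome has value 0 and that the reward of 1.2 is collected on every step spent in s, so V(s) = r + γ·p·V(s), which rearranges to p = (V(s) − r) / (γ·V(s)). The function name `implied_self_transition` and its parameter names are illustrative, not part of the original problem.

```python
def implied_self_transition(value, reward, gamma):
    """Solve V = reward + gamma * p * V for p.

    Assumes the episode-ending branch contributes zero future value.
    """
    if value == 0 or gamma == 0:
        raise ValueError("value and gamma must be non-zero to solve for p")
    return (value - reward) / (gamma * value)


# Numbers taken from the question: r = 1.2, gamma = 0.85, V(s) = 2.4
p = implied_self_transition(value=2.4, reward=1.2, gamma=0.85)
print(round(p, 4))  # 0.5882

# Sanity check: plugging p back in should reproduce V(s) = 2.4
reconstructed = 1.2 + 0.85 * p * 2.4
```

Under these assumptions the implied probability is 1.2 / 2.04 = 10/17 ≈ 0.588.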
Your answer