5066 · Machine learning, simple numerical problem (short)

Infer Self-Transition Probability From a Bellman Value 1

Problem

Under a fixed policy, state s yields an immediate reward of 1 on each step. With probability p it returns to s at the next step; otherwise the episode ends. If the discount factor is γ = 0.9 and the state value is reported as V(s) = 2.5, what value of p is implied?
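One way to work this is through the Bellman equation for V(s). This sketch assumes the reward of 1 is collected in s on every step, including the last one before termination, and that the terminal state has value 0. Under those assumptions V(s) = 1 + γ·p·V(s) + γ·(1 − p)·0, so V(s) = 1 / (1 − γp) and p = (V(s) − 1) / (γ·V(s)) = 1.5 / 2.25 = 2/3 ≈ 0.667. The helper names below are hypothetical, and exact fractions are used only so the arithmetic is easy to check.

```python
from fractions import Fraction


def implied_self_transition_prob(value, gamma, reward=1):
    """Solve V = r + gamma * p * V for p.

    Assumes the terminal state has value 0 and the reward r is received
    in s on every step. Pass decimals as strings (e.g. "0.9") so Fraction
    stores them exactly instead of as binary floats.
    """
    value, gamma, reward = Fraction(value), Fraction(gamma), Fraction(reward)
    p = (value - reward) / (gamma * value)
    if not 0 <= p <= 1:
        raise ValueError(f"implied p = {p} is not a valid probability")
    return p


def value_from_p(p, gamma, reward=1):
    """Closed form of the same equation: V = r / (1 - gamma * p)."""
    return Fraction(reward) / (1 - Fraction(gamma) * Fraction(p))


p = implied_self_transition_prob("2.5", "0.9")
print(p, float(p))  # 2/3 0.6666666666666666

# Round trip: plugging p back in should reproduce V(s) = 2.5 exactly.
assert value_from_p(p, "0.9") == Fraction(5, 2)
```

The round-trip check confirms that p = 2/3 reproduces the reported V(s) = 2.5 under the stated assumptions. If the reward were instead counted only on surviving transitions, the equation and the answer would change.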
