Problem 5076 · Machine Learning · Medium · Numerical · short

Choose the Greedy Backup Action 11

Problem

In one state, action 1 gives an immediate reward of 0.6 and then moves to a state of value 3 with probability 0.4, or to a state of value 1 otherwise. Action 2 gives an immediate reward of 0.9 and then moves to a state of value 0.2 with probability 0.1, or to a state of value 2 otherwise. If γ = 0.9, which action is greedy, and what backup value does it produce?
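As a minimal sketch, the one-step Bellman backup for each action is Q(a) = r(a) + γ · Σ P(s'|a) · V(s'), and the greedy action is the one with the largest Q. The code below assumes "otherwise" means the complementary probability (0.6 for action 1, 0.9 for action 2) and treats the next-state values as given. The names `ACTIONS`, `backup`, and `GAMMA` are hypothetical, introduced only for illustration.

```python
# One-step Bellman backup: Q(a) = r(a) + gamma * sum over s' of P(s'|a) * V(s')
GAMMA = 0.9

# Hypothetical encoding of the problem:
# action -> (immediate reward, [(probability, next-state value), ...])
# "otherwise" is read as the complementary probability.
ACTIONS = {
    1: (0.6, [(0.4, 3.0), (0.6, 1.0)]),
    2: (0.9, [(0.1, 0.2), (0.9, 2.0)]),
}


def backup(reward, transitions, gamma=GAMMA):
    """Return r + gamma * E[V(s')] for a single action."""
    expected_next = sum(p * v for p, v in transitions)
    return reward + gamma * expected_next


# Back up every action, then pick the one with the largest value (greedy choice).
q_values = {a: backup(r, t) for a, (r, t) in ACTIONS.items()}
greedy_action = max(q_values, key=q_values.get)

for a, q in q_values.items():
    print(f"action {a}: Q = {q:.3f}")  # action 1: Q = 2.220, action 2: Q = 2.538
print(f"greedy action: {greedy_action}, backup value: {q_values[greedy_action]:.3f}")
# prints "greedy action: 2, backup value: 2.538"
```

Worked by hand, action 1 gives 0.6 + 0.9 × (0.4 × 3 + 0.6 × 1) = 0.6 + 0.9 × 1.8 = 2.22. Action 2 gives 0.9 + 0.9 × (0.1 × 0.2 + 0.9 × 2) = 0.9 + 0.9 × 1.82 = 2.538. Action 2 therefore has the larger backup.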


Your Answer

Greedy Action

Backup Value