5079 | Machine Learning | Medium | Numerical problem | short
Choose the Greedy Backup Action 14
Problem
In a given state, action 1 yields an immediate reward of 0.8 and then transitions to a state of value 6 with probability 0.2 or to a state of value 1 with probability 0.8. Action 2 yields an immediate reward of 0.5 and then transitions to a state of value 2 with probability 0.5 or to a state of value 3 with probability 0.5. With discount factor gamma = 0.75, which action is greedy, and what backup value does it produce?
Your answer
greedy_action
backup_value
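
One way to check an answer is to compute the one-step backup for each action, Q(a) = r(a) + gamma * sum over successors of P(s') * V(s'), and take the larger value. The sketch below is illustrative only: the names `ACTIONS` and `backup` are made up for this example, and it reads "otherwise" in the problem as the complementary probability (0.8 for action 1, 0.5 for action 2).

```python
# One-step backup: Q(a) = r(a) + gamma * sum_s' P(s' | a) * V(s')
GAMMA = 0.75

# Hypothetical encoding of the problem:
# action -> (immediate reward, [(probability, successor state value), ...])
ACTIONS = {
    1: (0.8, [(0.2, 6.0), (0.8, 1.0)]),
    2: (0.5, [(0.5, 2.0), (0.5, 3.0)]),
}


def backup(reward, transitions, gamma=GAMMA):
    """Return the immediate reward plus the discounted expected successor value."""
    expected_next = sum(p * v for p, v in transitions)
    return reward + gamma * expected_next


# Backup value of every action, then the action with the largest one.
q_values = {a: backup(r, t) for a, (r, t) in ACTIONS.items()}
greedy_action = max(q_values, key=q_values.get)
backup_value = q_values[greedy_action]

print(greedy_action, round(backup_value, 3))
```

Under these assumptions, action 1 backs up to 0.8 + 0.75 * 2.0 = 2.3 and action 2 to 0.5 + 0.75 * 2.5 = 2.375, so action 2 is greedy even though its immediate reward is lower.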