5087 · Machine Learning · Hard · essay · medium

RL Training Diagnostic 22

Question

Why does an RL agent usually need explicit exploration even if its current greedy action already looks good?
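
As a concrete illustration of the situation the question describes, here is a minimal sketch, assuming a hypothetical two-armed bandit with fixed payoffs (0.5 and 1.0) and value estimates initialized to zero. The function `run_bandit` and all of its parameters are made up for this example. With `epsilon=0.0` the agent is purely greedy: its first pull makes arm 0 look good, so it never tries arm 1 and never learns that arm 1 pays more.

```python
import random

def run_bandit(epsilon, steps=1000, seed=0):
    """Epsilon-greedy on a hypothetical two-armed bandit with fixed rewards."""
    rewards = [0.5, 1.0]    # assumed deterministic payoffs; arm 1 is truly better
    estimates = [0.0, 0.0]  # value estimates start at zero
    counts = [0, 0]         # number of pulls per arm
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                  # explore: uniform random arm
        else:
            arm = estimates.index(max(estimates))   # exploit: greedy, ties go to arm 0
        counts[arm] += 1
        # Incremental sample-mean update of the chosen arm's estimate.
        estimates[arm] += (rewards[arm] - estimates[arm]) / counts[arm]
    return estimates, counts

greedy_estimates, greedy_counts = run_bandit(epsilon=0.0)
explore_estimates, explore_counts = run_bandit(epsilon=0.1)

print("greedy:  ", greedy_estimates, greedy_counts)    # arm 1 is never pulled
print("eps=0.1: ", explore_estimates, explore_counts)
```

In the greedy run the estimate for arm 1 stays at its initial value because no data about it is ever collected. The greedy action "looks good" only relative to estimates that were never tested. A small exploration rate is one simple way (among others, such as optimistic initialization or UCB-style bonuses) to gather that missing evidence.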
