The Optimal State-Value Function tells us how much reward the best policy can collect starting from a state s; in other words, it is the best-case scenario given state s. It is defined as:

$$v_*(s) = \max_\pi v_\pi(s)$$
For example, in the student study case, the value of the blue-circle state under the 50:50 random policy is 7.4.
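As a check on that number, here is a minimal policy-evaluation sketch. The MDP encoding below is an assumption: it follows the Student MDP from David Silver's lectures, which this post's student example appears to be based on, with the blue circle corresponding to the state named Class3 here.

```python
# Hypothetical encoding of the Student MDP (assumed, not from the post):
# mdp[state][action] = (immediate reward, {successor state: probability})
GAMMA = 1.0  # the example is an undiscounted episodic MDP

mdp = {
    "Facebook": {"facebook": (-1, {"Facebook": 1.0}),
                 "quit":     ( 0, {"Class1": 1.0})},
    "Class1":   {"facebook": (-1, {"Facebook": 1.0}),
                 "study":    (-2, {"Class2": 1.0})},
    "Class2":   {"sleep":    ( 0, {"Sleep": 1.0}),
                 "study":    (-2, {"Class3": 1.0})},
    "Class3":   {"study":    (10, {"Sleep": 1.0}),
                 "pub":      ( 1, {"Class1": 0.2, "Class2": 0.4, "Class3": 0.4})},
    "Sleep":    {},  # terminal state, no actions
}

# Iterative policy evaluation for the 50:50 (uniform random) policy
v = {s: 0.0 for s in mdp}
for _ in range(200):
    v = {s: sum(r + GAMMA * sum(p * v[s2] for s2, p in trans.items())
                for r, trans in acts.values()) / max(len(acts), 1)
         for s, acts in mdp.items()}

print(round(v["Class3"], 1))  # 7.4, the blue-circle value quoted above
```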
However, when we consider the Optimal State-Value Function, the 'branches' that would keep us from the best score are ignored. For instance, the optimal scenario for the blue-circle state is continuing to study with 100% probability rather than going to the pub.
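Under the same assumed encoding, replacing the averaging step with a max over actions turns policy evaluation into value iteration, which ignores the bad branches exactly as described:

```python
# Value iteration: repeatedly apply the Bellman optimality backup,
# reusing the hypothetical mdp and GAMMA defined above
v_star = {s: 0.0 for s in mdp}
for _ in range(100):
    v_star = {s: max((r + GAMMA * sum(p * v_star[s2] for s2, p in trans.items())
                      for r, trans in acts.values()), default=0.0)
              for s, acts in mdp.items()}

print(v_star)  # Class1: 6.0, Class2: 8.0, Class3: 10.0, Facebook: 6.0, Sleep: 0.0
```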
Next we move to the Action-Value Function. The following equation shows that the Optimal Action-Value Function likewise comes from the policy that yields the best action return:

$$q_*(s,a) = \max_\pi q_\pi(s,a)$$
The Optimal Action-Value Function is tightly related to the Optimal State-Value Function by:

$$q_*(s,a) = \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_*(s')$$
The equation asks: when action a is taken at state s, what is the best return we can obtain? Under this condition, the probability of reaching each successor state and the immediate reward are fixed, so the only variable left is the State-Value Function. It follows that obtaining the Optimal State-Value Function is equivalent to holding the Optimal Action-Value Function, since in the other direction $v_*(s) = \max_a q_*(s,a)$. Conversely, the best State-Value Function corresponds to a policy whose best actions lead the agent to collect the most reward from each state.
Returning to the student example: once we know the Optimal State-Value Function, the Optimal Action-Value Function can be calculated with the one-step look-ahead above.
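Continuing the hypothetical sketch (same assumed mdp, GAMMA, and the v_star computed above), the look-ahead gives q_* for every state-action pair:

```python
# One-step look-ahead: q_*(s,a) = R_s^a + gamma * sum_s' P^a_ss' * v_*(s')
q_star = {(s, a): r + GAMMA * sum(p * v_star[s2] for s2, p in trans.items())
          for s, acts in mdp.items()
          for a, (r, trans) in acts.items()}

for (s, a), val in sorted(q_star.items()):
    print(f"q*({s}, {a}) = {val:.1f}")
# q*(Class3, study) = 10.0 versus q*(Class3, pub) = 9.4:
# studying strictly beats the pub at the blue-circle state
```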
Finally, we can derive the best policy from the Optimal Action-Value Function:

$$\pi_*(a \mid s) = \begin{cases} 1 & \text{if } a = \arg\max_{a \in \mathcal{A}} q_*(s,a) \\ 0 & \text{otherwise} \end{cases}$$
This means the policy picks only the best action at every state rather than keeping a probability distribution over actions. Such a deterministic policy is the goal of Reinforcement Learning, as it guides the agent's actions to complete the task.
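Extracting that deterministic policy from the hypothetical sketch above is a single argmax per state:

```python
# Greedy policy extraction: pick argmax_a q_*(s, a) at every non-terminal
# state, continuing the assumed Student-MDP sketch above
policy = {s: max(acts, key=lambda a: q_star[(s, a)])
          for s, acts in mdp.items() if acts}

print(policy)
# {'Facebook': 'quit', 'Class1': 'study', 'Class2': 'study', 'Class3': 'study'}
```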
Optimal Value Functions and Optimal Policy
Original source: https://www.cnblogs.com/rhyswang/p/11155907.html