This gives us the final deep Q-learning algorithm with experience replay:
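As a rough illustration (not DeepMind's actual implementation), the loop below sketches Q-learning with an experience replay memory on a hypothetical toy chain environment. The environment, the linear Q-function, and all hyperparameters here are invented for the example; a real DQN would use a neural network and a much larger replay memory.

```python
import random
import numpy as np

# Hypothetical toy environment: a 1-D chain where the agent moves
# left/right and receives reward +1 on reaching the right end.
class ChainEnv:
    def __init__(self, n=5):
        self.n, self.pos = n, 0

    def reset(self):
        self.pos = 0
        return self._obs()

    def _obs(self):
        s = np.zeros(self.n)
        s[self.pos] = 1.0  # one-hot encoding of the position
        return s

    def step(self, action):  # 0 = left, 1 = right
        self.pos = max(0, self.pos - 1) if action == 0 else min(self.n - 1, self.pos + 1)
        done = self.pos == self.n - 1
        return self._obs(), (1.0 if done else 0.0), done

# Linear Q-function: Q(s, a) = (W @ s)[a], initialized randomly --
# so early targets really are "garbage", as the text says.
rng = np.random.default_rng(0)
random.seed(0)
n_states, n_actions = 5, 2
W = rng.normal(0.0, 0.1, (n_actions, n_states))

replay = []                         # experience replay memory
gamma, alpha, eps = 0.9, 0.1, 0.2   # discount, step size, exploration
env = ChainEnv(n_states)

for episode in range(300):
    s = env.reset()
    for t in range(20):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(W @ s))
        s2, r, done = env.step(a)
        replay.append((s, a, r, s2, done))
        if len(replay) > 1000:
            replay.pop(0)
        # sample a random minibatch from the replay memory and
        # update toward the bootstrapped target r + gamma * max_a' Q(s', a')
        for bs, ba, br, bs2, bdone in random.sample(replay, min(8, len(replay))):
            target = br if bdone else br + gamma * np.max(W @ bs2)
            td_error = target - (W @ bs)[ba]
            W[ba] += alpha * td_error * bs
        s = s2
        if done:
            break

# After training, the greedy policy at the start state should be "right" (1).
start = np.zeros(n_states)
start[0] = 1.0
greedy_action = int(np.argmax(W @ start))
print(greedy_action)
```

Because the states are one-hot, the linear Q-function is effectively a lookup table, but the structure of the loop (replay memory, random minibatches, bootstrapped targets) is the same as in the full algorithm.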
There are many more tricks that DeepMind used to actually make it work – such as target networks, error clipping, and reward clipping – but these are out of scope for this introduction.
The most amazing part of this algorithm is that it learns anything at all. Just think about it: because our Q-function is initialized randomly, it initially outputs complete garbage, and we use this garbage (the maximum Q-value of the next state) as the target for the network, only occasionally folding in a tiny reward. It sounds insane that this could ever learn anything meaningful – and yet it does.
Learning Notes: Morvan - Reinforcement Learning, Part 4: Deep Q Network
Original article: http://www.cnblogs.com/casperwin/p/6295404.html