CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley): No. 6 Value Functions Introduction, No. 7 Advanced Q-Learning

Date: 2018-05-26 21:26:50

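Key point: consecutive samples from a single trajectory are strongly correlated, and that hurts training. Running several workers in parallel is one fix, because each batch then mixes samples from independent trajectories.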
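
The effect can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "environment" is a 1-D Gaussian random walk, and the names `random_walk` and `pearson` are hypothetical helpers, not part of the course material. It compares the correlation between neighbouring samples taken from one trajectory with the correlation between neighbouring samples taken from many independent workers at the same time step.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def random_walk(rng, steps):
    """Toy 1-D 'environment': the state drifts by Gaussian noise each step."""
    state, states = 0.0, []
    for _ in range(steps):
        state += rng.gauss(0.0, 1.0)
        states.append(state)
    return states


rng = random.Random(0)

# One worker: neighbouring samples are consecutive states of one trajectory.
single = random_walk(rng, 2000)
sequential_corr = pearson(single[:-1], single[1:])

# Many workers: one state per worker, all taken at the same time step,
# so neighbouring samples come from independent trajectories.
workers = [random_walk(rng, 50) for _ in range(2000)]
batch = [w[-1] for w in workers]
parallel_corr = pearson(batch[:-1], batch[1:])

print("sequential:", sequential_corr)
print("parallel:  ", parallel_corr)
```

In this toy setting the sequential correlation is close to 1, while the parallel one is close to 0.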

Another solution is a replay buffer, which takes full advantage of the fact that Q-learning is off-policy: transitions collected under older policies can still be reused for updates.
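
A minimal sketch of a replay buffer, assuming a fixed capacity and uniform random sampling. The class name `ReplayBuffer`, its method names, and the transition layout `(state, action, reward, next_state, done)` are illustrative choices, not taken from the lecture.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement breaks the temporal ordering
        # of the transitions, which decorrelates the batch.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


# Usage: store 10 transitions in a buffer that only keeps the latest 5.
buffer = ReplayBuffer(capacity=5, seed=0)
for t in range(10):
    buffer.add(t, 0, 1.0, t + 1, False)
batch = buffer.sample(3)
```

Because Q-learning does not require the data to come from the current policy, a batch drawn this way can be used for an update no matter how old its transitions are.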

There is still a problem: Q-learning is not gradient descent, because no gradient flows through the target value.
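
Written out, the point is as follows. The notation is an assumption here, since the slides are not reproduced in these notes: $\phi$ denotes the Q-function parameters, $\alpha$ the step size, and $\gamma$ the discount factor.

```latex
% Target and the Q-learning update actually used:
y_i = r_i + \gamma \max_{a'} Q_\phi(s'_i, a')
\qquad
\phi \leftarrow \phi - \alpha \sum_i \nabla_\phi Q_\phi(s_i, a_i)\,\bigl(Q_\phi(s_i, a_i) - y_i\bigr)

% The true gradient of the squared error has an extra term, because y_i depends on \phi:
\nabla_\phi \tfrac{1}{2}\bigl(Q_\phi(s_i, a_i) - y_i\bigr)^2
= \bigl(Q_\phi(s_i, a_i) - y_i\bigr)
  \bigl(\nabla_\phi Q_\phi(s_i, a_i) - \gamma\, \nabla_\phi \max_{a'} Q_\phi(s'_i, a')\bigr)
```

The update drops the second term, so it is not the gradient of any fixed objective and the usual convergence guarantees of gradient descent do not apply.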

Split the Q-function into two parts: the target network and the evolving network.

This sacrifices learning speed in exchange for more stable convergence.
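
A minimal sketch of the idea, using a tabular Q-function in place of a neural network so that it stays self-contained. The constants `GAMMA`, `ALPHA`, and `SYNC_EVERY`, the helper names, and the two-state chain are all hypothetical choices for illustration.

```python
import copy

GAMMA = 0.9      # discount factor (assumed value)
ALPHA = 0.5      # learning rate (assumed value)
SYNC_EVERY = 4   # how often the target copy is refreshed (assumed value)


def td_target(q_target, reward, next_state, done):
    """Compute the target from the frozen copy, not the evolving one."""
    if done:
        return reward
    return reward + GAMMA * max(q_target[next_state])


def train(transitions, n_states, n_actions, steps):
    q = [[0.0] * n_actions for _ in range(n_states)]  # evolving estimate
    q_target = copy.deepcopy(q)                       # frozen target copy
    for step in range(steps):
        s, a, r, s2, done = transitions[step % len(transitions)]
        y = td_target(q_target, r, s2, done)
        q[s][a] += ALPHA * (y - q[s][a])
        # Only every SYNC_EVERY steps does the target catch up.
        if (step + 1) % SYNC_EVERY == 0:
            q_target = copy.deepcopy(q)
    return q, q_target


# Two-state chain: state 0 -> state 1 (reward 0), state 1 -> terminal (reward 1).
chain = [(0, 0, 0.0, 1, False), (1, 0, 1.0, 1, True)]
q, q_target = train(chain, n_states=2, n_actions=1, steps=200)
```

Between synchronizations the targets stay fixed, so each stretch of updates is an ordinary regression toward stationary values. The cost is that new information reaches the targets only after the next sync, which is the speed being given up.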

Overestimation of Q-values in Natural DQN (the standard DQN): taking the max over noisy Q estimates biases the target upward.
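
A small numerical sketch of where the bias comes from, under assumed conditions: every action has a true value of 0 and the estimates are pure Gaussian noise. The second estimator, which selects the action with one set of estimates and evaluates it with an independent set, follows the Double Q-learning idea. These notes do not name that idea, so it appears here only as an illustrative comparison.

```python
import random

rng = random.Random(0)
N_ACTIONS = 10
N_TRIALS = 5000

single_sum = 0.0
double_sum = 0.0
for _ in range(N_TRIALS):
    # True Q-value of every action is 0; both estimates are pure noise.
    est_a = [rng.gauss(0.0, 1.0) for _ in range(N_ACTIONS)]
    est_b = [rng.gauss(0.0, 1.0) for _ in range(N_ACTIONS)]

    # Standard target: select and evaluate with the same noisy estimate.
    single_sum += max(est_a)

    # Decoupled: select with est_a, evaluate with the independent est_b.
    best_action = est_a.index(max(est_a))
    double_sum += est_b[best_action]

single_bias = single_sum / N_TRIALS  # average error of the max-based target
double_bias = double_sum / N_TRIALS  # average error of the decoupled target

print("max over one estimate:", single_bias)
print("decoupled estimate:   ", double_bias)
```

The max-based average is clearly positive even though every true value is 0, while the decoupled average stays near 0.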

This runs into trouble with the left-or-right dilemma of avoiding a collision with a tree.

Original article: https://www.cnblogs.com/ecoflex/p/9094123.html
