首页 > 其他好文 > 详细

CS294-112深度增强学习课程（加州大学伯克利分校 2017）NO.4 Learning policies by imitating optimal controllers

时间：2018-05-23 20:26:52 阅读：163 评论：0 收藏：0 [点我收藏+]

标签：ems optimize ast sch gradient city learning cto some

技术分享图片

技术分享图片

There are some problems: mismatch of model and reality; gradient explosion

技术分享图片

技术分享图片

技术分享图片

so, the dynamics can be quite messy, and backpropogating can be quite problematic.

sudden change in velocity and so on. schochastic system. gradient descent can be tough.

技术分享图片

can we apply this trajectory optimization method to optimize policy?

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

GPS: guided policy search

技术分享图片

in this case, o_t is from the camera and the joint velocity

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

CS294-112深度增强学习课程（加州大学伯克利分校 2017）NO.4 Learning policies by imitating optimal controllers

标签：ems optimize ast sch gradient city learning cto some

原文地址：https://www.cnblogs.com/ecoflex/p/9078801.html

踩

(0)

赞

(0)

举报

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行

更多

友情链接

兰亭集智国之画百度统计站长统计阿里云 chrome插件新版天听网

关于我们 - 联系我们 - 留言反馈

© 2014 mamicode.com 版权所有联系我们:gaon5@hotmail.com

迷上了代码！