Learning from delayed reward (the origin of Q-Learning) (Watkins's PhD thesis) (established the modern reinforcement learning framework)

Posted: 2019-01-11 21:17:20


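Recently I have been studying reinforcement learning, and I found much of the existing material hard to follow: many articles simply stack one formula on top of another with little explanation, and reading them felt like reading gibberish. After struggling for quite a while, I decided to start from the fundamentals, which is how I came across this thesis.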

Learning from Delayed Reward

Thesis page: http://www.cs.rhul.ac.uk/~chrisw/thesis.html

Download link: http://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf

Description of the thesis from that page:

The thesis introduces the notion of reinforcement learning as learning to control a Markov Decision Process by incremental dynamic programming, and describes a range of algorithms for doing this, including Q-learning, for which a sketch of a proof of convergence is given.
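As a point of reference, here is the one-step Q-learning update written in the notation used by most later textbooks (the thesis's own symbols may differ):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

Here α is a step size, γ is a discount factor, and (s_t, a_t, r_{t+1}, s_{t+1}) is one observed transition. The bracketed term is the temporal-difference error, and the max over next actions is the Bellman optimality backup from dynamic programming. Applying that backup to a single sampled transition, instead of sweeping over a known model as classical dynamic programming does, is one way to read "incremental dynamic programming".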

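Although this thesis is rarely mentioned in the current literature, it is highly significant. It (more precisely, a conference paper the author published in 1987, which was incorporated into the thesis) established reinforcement learning in its modern sense: it was the first work to combine trial-and-error learning, dynamic programming, and temporal-difference learning, and it introduced the Q-Learning algorithm. In a sense, it is the "root of all evil", the place where everything started.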
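To make the three ingredients above concrete, here is a minimal tabular Q-learning sketch. The toy "chain" environment, the function names `q_learning_chain` and `greedy_policy`, and every parameter value are hypothetical choices for illustration, not taken from the thesis. Actions are chosen by trial and error (ε-greedy), the max over next-state values is the dynamic-programming backup, and each estimate moves by a fraction of the temporal-difference error. The only reward arrives when the agent reaches the far end of the chain, so it is delayed.

```python
import random


def q_learning_chain(n_states=6, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular one-step Q-learning on a hypothetical toy chain MDP.

    States are 0..n_states-1; action 0 moves left (state 0 stays put),
    action 1 moves right. The only reward (1.0) is received on entering
    the last state, which ends the episode, so the reward is delayed.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    terminal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Trial and error: explore with probability epsilon,
            # otherwise act greedily (ties broken at random).
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # Dynamic-programming backup: bootstrap from the best next action
            # (the terminal state has value 0, so only r counts there).
            target = r if s_next == terminal else r + gamma * max(q[s_next])
            # Temporal-difference update toward the target.
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Greedy action (0 = left, 1 = right) for every non-terminal state."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(len(q) - 1)]


if __name__ == "__main__":
    q = q_learning_chain()
    print(greedy_policy(q))  # 1 means "move right" in that state
```

Because the reward is only given at the end, value first appears in the state just before the goal and then propagates backwards one state per step of bootstrapping over successive episodes; with enough episodes the greedy policy read off the table moves right in every non-terminal state.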

[Figure: the thesis's table of contents]

Original post: https://www.cnblogs.com/devilmaycry812839668/p/10257303.html