Disclaimer: the original paper is referenced in the title; in case of infringement, please contact the author and this post will be taken down.
ICML 2020, pp. 507-517
Abstract
Atari games have been a long-standing benchmark in the reinforcement learning (RL) community for the past decade. This benchmark was proposed to test the general competency of RL algorithms. Previous work has achieved good average performance by doing well on many games of the set, but very poorly on several of the most challenging ones. We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative. We propose an adaptive mechanism to choose which policy to prioritize throughout the training process. In addition, we utilize a novel parameterization of the architecture that allows for more consistent and stable learning.
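To make the adaptive-mechanism idea in the abstract concrete, here is a minimal sketch of selecting among a family of policies with a sliding-window UCB bandit, which is the general style of mechanism the paper describes. The class name, window size, bonus coefficient, and reward model below are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import math

class SlidingWindowUCB:
    """Pick which policy (from exploratory to exploitative) to run next,
    based on the episode returns observed in a recent window."""

    def __init__(self, num_policies, window=90, beta=1.0):
        self.num_policies = num_policies
        self.window = window    # only the most recent episodes count
        self.beta = beta        # exploration-bonus coefficient
        self.history = []       # list of (policy_index, episode_return)

    def select(self):
        recent = self.history[-self.window:]
        # Play every arm at least once within the window before using UCB.
        for j in range(self.num_policies):
            if not any(idx == j for idx, _ in recent):
                return j
        total = len(recent)
        best_j, best_score = 0, float("-inf")
        for j in range(self.num_policies):
            returns = [r for idx, r in recent if idx == j]
            mean = sum(returns) / len(returns)
            bonus = self.beta * math.sqrt(math.log(total) / len(returns))
            if mean + bonus > best_score:
                best_j, best_score = j, mean + bonus
        return best_j

    def update(self, policy_index, episode_return):
        self.history.append((policy_index, episode_return))
```

With this, the training loop would call `select()` before each episode, run the chosen policy, and feed the episode return back via `update()`; over time the bandit concentrates on the policy that currently yields the highest return, while the sliding window lets the preference shift as training progresses.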
1. Introduction
2. Background: Never Give Up (NGU)
3. Improvements to NGU
3.1. State-Action Value Function Parameterization
3.2. Adaptive Exploration over a Family of Policies
4. Experiments
4.1. Summary of the Results
4.2. State-Action Value Function Parameterization
4.3. Backprop Through Time Window Size
4.4. Adaptive Exploration
5. Conclusions
A. Background on MDP
B. Extrinsic-Intrinsic Decomposition
C. Retrace and Transformed Retrace
C.1. Extrinsic-Intrinsic Decomposition for Retrace and Transformed Retrace
C.2. Retrace and Transformed Retrace Losses for Neural Nets
D. Multi-arm Bandit Formalism
E. Implementation details of the distributed setting
Replay buffer:
Actors:
Evaluator:
Learner:
Computation used:
F. Network Architectures
G. Hyperparameters
G.1. Values of β and γ
G.2. Atari pre-processing hyperparameters
G.3. Hyperparameters Used
G.4. Hyperparameters Search Range
H. Experimental Results
H.1. Atari 10: Table of Scores for the Ablations
H.2. Backprop window length comparison
H.3. Identity versus h-transform mixes comparison
H.4. Atari 57 Table of Scores
H.5. Atari 57 Learning Curves
H.6. Videos
Agent57: Outperforming the Atari Human Benchmark
Source: https://www.cnblogs.com/lucifer1997/p/14352637.html