Tags: timed, double, bsp, info, image, force, learn, replace, swa
Seminal work: "Playing Atari with Deep Reinforcement Learning" (NIPS)
http://export.arxiv.org/pdf/1312.5602
"Human-level control through deep reinforcement learning" https://www.cs.swarthmore.edu/~meeden/cs63/s15/nature15b.pdf
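Uses two networks, an online network and a target network, which reduces the correlation between the predicted Q-values and their training targets. At fixed intervals, the target network's parameters are replaced with a copy of the online network's.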
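The two-network scheme above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: a tiny linear Q-function stands in for the deep network, and every name here (`LinearQ`, `td_target`, `train_step`, `sync`, `SYNC_EVERY`) and every hyperparameter value is an assumption made for the example.

```python
import copy
import random


class LinearQ:
    """Tiny linear Q-function, Q(s, a) = w[a] . s (stand-in for a deep network)."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def q_values(self, state):
        # One Q-value per action.
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]


def td_target(target_net, reward, next_state, done, gamma=0.99):
    """Target y = r + gamma * max_a' Q_target(s', a'); just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(target_net.q_values(next_state))


def train_step(online, target_net, transition, lr=0.1, gamma=0.99):
    """One semi-gradient update; only the online network's weights change."""
    state, action, reward, next_state, done = transition
    y = td_target(target_net, reward, next_state, done, gamma)
    td_error = y - online.q_values(state)[action]
    online.w[action] = [w + lr * td_error * s
                        for w, s in zip(online.w[action], state)]
    return td_error


def sync(online, target_net):
    """Replace the target network's parameters with a copy of the online ones."""
    target_net.w = copy.deepcopy(online.w)


SYNC_EVERY = 4  # hypothetical replacement interval, chosen for illustration
online = LinearQ(n_features=2, n_actions=2, seed=0)
target_net = LinearQ(n_features=2, n_actions=2, seed=1)
sync(online, target_net)  # start both networks from the same parameters

# A single made-up transition (state, action, reward, next_state, done).
transition = ([1.0, 0.0], 0, 1.0, [0.0, 1.0], False)
for step in range(1, 9):
    train_step(online, target_net, transition)
    if step % SYNC_EVERY == 0:
        sync(online, target_net)
```

Between two calls to `sync`, the targets are computed from frozen parameters, so an update to the online network does not immediately shift the target it is being trained toward.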
"Deep Reinforcement Learning with Double Q-learning" https://arxiv.org/pdf/1509.06461.pdf
Original post: https://www.cnblogs.com/zle1992/p/10287200.html