This post covers sample.py:
```python
# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


# Author's note: the sample is stored in a class rather than a bare tuple
# so that it is convenient to use in different ways.
class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters
    ----------

    state : numpy.array
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).

    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        """Create string representation of tuple."""
        # Used when the sample is printed: print() falls back to __repr__
        # because the class defines no __str__.
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
```
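A minimal usage sketch, assuming NumPy is installed. The chain environment, `collect_chain_samples`, and `one_step_target` are hypothetical illustrations, not part of sample.py or the LSPI code, and the class is repeated in condensed form so the snippet runs on its own. It shows that ordinary transitions keep the default `absorb=False`, while the episode-ending transition is flagged `absorb=True`, which a learner can use to drop the bootstrap term for that sample.

```python
import numpy as np


class Sample(object):
    """Condensed copy of the Sample class above so this snippet runs alone."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state, self.action,
                                               self.reward, self.next_state,
                                               self.absorb)


def collect_chain_samples(num_states=4):
    """Walk a hypothetical 1-D chain from state 0 to the last state.

    Action 1 ("move right") is always taken. Reaching the last state gives
    reward 1.0 and ends the episode, so that sample gets absorb=True.
    """
    samples = []
    for s in range(num_states - 1):
        state = np.array([s], dtype=float)
        next_state = np.array([s + 1], dtype=float)
        is_last = (s + 1 == num_states - 1)
        reward = 1.0 if is_last else 0.0
        # Ordinary transitions end up with absorb=False, matching the default.
        samples.append(Sample(state, 1, reward, next_state, absorb=is_last))
    return samples


def one_step_target(sample, next_value, discount=0.9):
    """Hypothetical helper: one-step bootstrap target r + discount * V(s').

    An absorbing (episode-ending) sample has no successor to bootstrap
    from, so its target is just the reward.
    """
    if sample.absorb:
        return sample.reward
    return sample.reward + discount * next_value


samples = collect_chain_samples()
for sample in samples:
    print(sample)
```

Because `Sample` is only a data holder, the state could be any type; NumPy arrays are used here because the docstring recommends them for vector-valued states.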
(6) Value Function Approximation - LSPI code (5)
Original post: http://www.cnblogs.com/lijiajun/p/5490109.html