(6) Value Function Approximation - LSPI code (5)

Date: 2016-05-13 19:06:57


This post covers sample.py.

# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


# Note: the sample is wrapped in a class (rather than passed around as a
# bare tuple) so that it is convenient to use from different call sites.
class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters
    ----------
    state : numpy.array
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).

    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    # Initialization: store the five fields of the sample tuple.
    def __init__(self, state, action, reward, next_state, absorb=False):
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    # Called when the sample is printed.
    def __repr__(self):
        """Create string representation of tuple."""
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
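
To make the role of the class concrete, here is a minimal usage sketch. It repeats a compact copy of Sample so that it runs on its own. The three-step episode, the `transitions` list, and the use of plain tuples for states are assumptions made purely for illustration; they are not part of the original sample.py. Only the last transition is marked as absorbing, and every other sample relies on the default `absorb=False`.

```python
class Sample(object):
    """Compact copy of the Sample class from the listing above."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state, self.action,
                                               self.reward, self.next_state,
                                               self.absorb)


# Hypothetical three-step episode: (state, action, reward, next_state).
# States are plain tuples here; a numpy array would work the same way,
# since Sample is only a data holder.
transitions = [((0,), 1, 0.0, (1,)),
               ((1,), 1, 0.0, (2,)),
               ((2,), 1, 1.0, (3,))]

samples = []
for i, (s, a, r, s_next) in enumerate(transitions):
    if i == len(transitions) - 1:
        # The final transition ends the episode, so flag it as absorbing.
        samples.append(Sample(s, a, r, s_next, absorb=True))
    else:
        # Non-terminal transitions use the default absorb=False.
        samples.append(Sample(s, a, r, s_next))

for sample in samples:
    print(sample)  # print() goes through __repr__
```

Because `__repr__` is defined, printing a sample shows all five fields in tuple order, which is handy when inspecting collected data.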


Original article: http://www.cnblogs.com/lijiajun/p/5490109.html
