Tags: max time set min crete def inf close mode
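Calls to gym follow the sequence shown in the example below.
The example uses a simple policy: if the pole tilts left, the cart moves left; if it tilts right, the cart moves right.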
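import gym
import numpy as np

# Uses the classic gym API (before 0.26): reset() returns the observation
# and step() returns (observation, reward, done, info).
env = gym.make('CartPole-v0')
t_all = []        # step index at which each episode ended
action_bef = 0    # previous action, used to alternate near vertical
for i_episode in range(5):
    observation = env.reset()
    for t in range(100):
        env.render()
        # cart position, cart velocity, pole angle, pole angular velocity
        cp, cv, pa, pv = observation
        if abs(pa) <= 0.1:
            # Pole is nearly upright: alternate between left and right
            action = 1 - action_bef
        elif pa >= 0:
            action = 1   # pole tilts right: push the cart right
        else:
            action = 0   # pole tilts left: push the cart left
        observation, reward, done, info = env.step(action)
        action_bef = action
        if done:
            # print("Episode finished after {} timesteps".format(t+1))
            t_all.append(t)
            break
        if t == 99:
            # No failure within the 100-step limit: this script logs 0
            t_all.append(0)
env.close()
print(t_all)
print(np.mean(t_all))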
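The decision rule inside the loop can also be pulled out into a plain function, which makes it easy to check without opening a render window. This is a minimal sketch; the names `choose_action` and `deadband` are hypothetical and do not appear in the script above.

```python
def choose_action(pole_angle, previous_action, deadband=0.1):
    """Return 0 (push left) or 1 (push right) for CartPole.

    Near vertical (|angle| <= deadband) the action alternates;
    otherwise the cart is pushed toward the side the pole leans to.
    """
    if abs(pole_angle) <= deadband:
        return 1 - previous_action
    return 1 if pole_angle > 0 else 0


print(choose_action(0.05, 0))   # nearly upright: flips 0 to 1
print(choose_action(0.3, 0))    # leaning right: 1
print(choose_action(-0.3, 1))   # leaning left: 0
```

Inside the episode loop this would replace the if/elif chain with `action = choose_action(pa, action_bef)`.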
A complete gym environment includes the following functions: class construction, initialization, ...
vel = np.clip(vel, vel_min, vel_max)
Validating the action input
self.action_space.contains(action)
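This check usually sits at the top of `step`, so that an invalid action fails immediately instead of corrupting the state. The sketch below is illustrative only: `ToyDiscrete` is a hypothetical stand-in for `spaces.Discrete` so that the example runs without gym installed, and `ToyEnv`, its dynamics, and its velocity limits are invented for the example.

```python
class ToyDiscrete:
    """Hypothetical stand-in for spaces.Discrete(n): valid actions are 0..n-1."""

    def __init__(self, n):
        self.n = n

    def contains(self, action):
        return isinstance(action, int) and 0 <= action < self.n


class ToyEnv:
    """Illustrative environment; a real one would subclass gym.Env."""

    VEL_MIN, VEL_MAX = -1.0, 1.0

    def __init__(self):
        self.action_space = ToyDiscrete(3)  # actions 0, 1, 2
        self.vel = 0.0

    def step(self, action):
        # Validate first: reject anything outside the declared action space.
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action: {action!r}")
        # Map actions 0/1/2 to a push of -0.6/0/+0.6, then clamp the
        # velocity (scalar equivalent of np.clip(vel, vel_min, vel_max)).
        self.vel += (action - 1) * 0.6
        self.vel = max(self.VEL_MIN, min(self.VEL_MAX, self.vel))
        return self.vel


env = ToyEnv()
env.step(2)
env.step(2)  # the second push would reach 1.2, so it is clamped
print(env.vel)
```

Raising an exception is one choice; an `assert` on `contains(action)` is a common alternative when the check is only meant as a development aid.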
Defining the action and observation spaces
Discrete(3): the valid actions are 0, 1, 2
self.low = np.array([min_0, min_1], dtype=np.float32)
self.high = np.array([max_0, max_1], dtype=np.float32)
self.action_space = spaces.Discrete(3)
self.observation_space = spaces.Box(
    self.low, self.high, dtype=np.float32)
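The `low` and `high` arrays give, per dimension, the interval an observation is expected to lie in. The sketch below spells out the elementwise check those bounds imply, using plain lists. `within_bounds` is a hypothetical helper written for illustration, not part of gym, and the numeric limits are made up.

```python
def within_bounds(obs, low, high):
    """Return True if obs has the same length as the bounds and every
    component lies in the closed interval [low[i], high[i]]."""
    if not (len(obs) == len(low) == len(high)):
        return False
    return all(lo <= x <= hi for x, lo, hi in zip(obs, low, high))


low = [-1.2, -0.07]
high = [0.6, 0.07]
print(within_bounds([0.0, 0.01], low, high))  # inside both intervals
print(within_bounds([0.7, 0.0], low, high))   # first component too large
```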
Original article: https://www.cnblogs.com/tolshao/p/gym-da-jian-rl-huan-jing.html