Make a compromise between agreeing with the learned policy and minimizing the cost!
π̂ (pi hat) uses the full state.
π_θ (pi theta) uses only observations.
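The trade-off above can be illustrated with a deliberately tiny toy. This is a loose one-step analogue written under stated assumptions, not the lecture's algorithm. It assumes a scalar state x, a one-step cost c(x, u) = (x + u)² + r·u², a hypothetical observation model o = 2x, and a linear learned policy π_θ(o) = θ·o. The constants `R` and `LAM` and all function names are illustrative. The state-based controller π̂ minimizes the cost plus a penalty λ·(u − π_θ(o))² for disagreeing with the policy. The policy is then fit by least squares to the controller's actions, using observations only. The two steps alternate.

```python
import numpy as np

# Hypothetical toy setup (assumed, not from the lecture): scalar state x,
# control u, one-step cost c(x, u) = (x + u)**2 + R * u**2, and a linear
# policy pi_theta(o) = theta * o that only sees an observation o.
R = 0.1      # control-effort weight r (assumed value)
LAM = 1.0    # weight lambda on agreeing with the learned policy (assumed value)


def observe(x):
    """Hypothetical observation model: pi_theta sees o = 2x, not x itself."""
    return 2.0 * x


def controller_action(x, theta):
    """pi_hat: uses the state x and minimizes, in closed form,
    (x + u)^2 + R*u^2 + LAM*(u - theta*o)^2 over u."""
    o = observe(x)
    return (LAM * theta * o - x) / (1.0 + R + LAM)


def fit_policy(xs, us):
    """Supervised step: least-squares fit of u ~ theta * o on observations."""
    obs = observe(xs)
    return float(np.dot(obs, us) / np.dot(obs, obs))


def imitate_optimal_controller(n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.normal(size=200)   # sampled states
    theta = 0.0                 # initial policy parameter
    for _ in range(n_iters):
        us = controller_action(xs, theta)  # controller acts on states
        theta = fit_policy(xs, us)         # policy imitates from observations
    return theta


theta = imitate_optimal_controller()
print(theta)   # approaches -1 / (2 * (1 + R)), about -0.4545
```

In this toy, the fixed point gives π_θ(o) = −x/(1 + r), which is exactly the unconstrained minimizer of (x + u)² + r·u². At convergence the penalty term is zero, so the compromise resolves into agreement. Each iteration shrinks the error by the factor λ/(1 + r + λ), so a larger λ makes the controller follow the policy more closely but slows convergence.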
CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley), No. 9: Learning policies by imitating optimal controllers
Original post: https://www.cnblogs.com/ecoflex/p/9097988.html