^ is the square root of epsilon
a simplified version of the hard version
a smoother way to find the correct solution
the first term is the REINFORCE term, and the second term is the grad log probability of our loss
b is a stochastic node
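To make the REINFORCE term at a stochastic node such as b more concrete, here is a minimal sketch. It assumes b is drawn from a unit-variance Gaussian centred on a parameter theta, and it uses a hypothetical loss (b - 3)^2; neither choice comes from the lecture, they are picked only because the exact gradient is then known in closed form and can be compared with the estimate.

```python
import random


def loss(b):
    # Hypothetical loss, chosen only for illustration (not from the lecture).
    return (b - 3.0) ** 2


def reinforce_gradient(theta, num_samples=200_000, seed=0):
    """Score-function (REINFORCE) estimate of d/dtheta E[loss(b)], with b ~ N(theta, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        b = rng.gauss(theta, 1.0)         # sample the stochastic node b
        grad_log_prob = b - theta         # d/dtheta log N(b; theta, 1)
        total += loss(b) * grad_log_prob  # loss times grad log probability
    return total / num_samples


theta = 1.0
estimate = reinforce_gradient(theta)
# Under the assumptions above, E[loss(b)] = (theta - 3)^2 + 1, so the exact gradient is:
exact = 2.0 * (theta - 3.0)
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

The estimator never differentiates through the sampling step itself; it only needs the loss value and the gradient of the log probability, which is why it still applies when b is a stochastic node.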
further formula derivations are omitted.
Deep RL Bootcamp Lecture 7: SVG, DDPG, and Stochastic Computation Graphs