Tags: buying map slides dac trade rod pos iat mbr
The RL problem
Trading as an RL problem
Mapping trading to RL
Markov decision problems
Unknown transitions and rewards
What to optimize?
Learning Procedure
Update Rule
The formula for computing Q for any state-action pair <s, a>, given an experience tuple <s, a, s', r>, is:
Q'[s, a] = (1 - α) · Q[s, a] + α · (r + γ · Q[s', argmax_a'(Q[s', a'])])
Here:
r = R[s, a] is the immediate reward for taking action a in state s,
γ ∈ [0, 1] (gamma) is the discount factor used to progressively reduce the value of future rewards,
s' is the resulting next state,
argmax_a'(Q[s', a']) is the action that maximizes the Q-value among all possible actions a' from s', and
α ∈ [0, 1] (alpha) is the learning rate used to vary the weight given to new experiences compared with past Q-values.
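As a minimal sketch of the update rule, the snippet below applies one update to a Q-table stored as a nested list. The function name `q_update`, the default values of alpha and gamma, and the toy table are assumptions made for illustration; they are not taken from the course material.

```python
def q_update(Q, s, a, s_prime, r, alpha=0.2, gamma=0.9):
    """Apply one Q-learning update in place for the experience tuple <s, a, s', r>."""
    # Greedy action from the next state: argmax over a' of Q[s'][a']
    best_next_action = max(range(len(Q[s_prime])), key=lambda a2: Q[s_prime][a2])
    # Blend the old estimate with the new one:
    # immediate reward plus the discounted value of the best next action
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * Q[s_prime][best_next_action])
    return Q[s][a]


# Hypothetical toy table: 3 states x 2 actions
Q = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
new_value = q_update(Q, s=0, a=1, s_prime=1, r=0.5)
```

With alpha = 0.2 and gamma = 0.9, the new entry is 0.8 · 0 + 0.2 · (0.5 + 0.9 · 1.0), roughly 0.28. A larger alpha would move the entry further toward the new experience.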
Two Finer Points
The Trading Problem: Actions
A reward at each step allows the learning agent to get feedback on each individual action it takes (including doing nothing).
SMA: simple moving average => different stocks trade at different price levels, so raw SMA values are not comparable across stocks
=> adj close / SMA is a good normalized factor
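To illustrate why the ratio is comparable across stocks, here is a small sketch that computes adjusted close divided by its simple moving average. The function name `price_to_sma_ratio`, the window length and the price series are hypothetical; a real pipeline would more likely use a dataframe library.

```python
def price_to_sma_ratio(prices, window):
    """Return price / SMA for each day; None until a full window is available."""
    ratios = []
    for i in range(len(prices)):
        if i + 1 < window:
            # Not enough history yet to form a full window
            ratios.append(None)
        else:
            sma = sum(prices[i + 1 - window : i + 1]) / window
            ratios.append(prices[i] / sma)
    return ratios


# Two made-up stocks with the same shape but a 10x difference in price level
cheap = [10.0, 11.0, 12.0]
pricey = [100.0, 110.0, 120.0]
cheap_ratio = price_to_sma_ratio(cheap, window=3)
pricey_ratio = price_to_sma_ratio(pricey, window=3)
```

Both series end with a ratio of about 1.09 despite the tenfold difference in price, which is the property that makes the factor usable as a state feature.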
Creating the State
Discretizing
Q-Learning Recap
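Q-learning is model-free, so it does not need to store transitions T(s, a, s') or rewards R(s, a).
The Q-table encodes both the best possible value of a state (max_a Q(s, a)) as well as the best policy in terms of the action that should be taken (argmax_a Q(s, a)).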
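A short sketch of reading both quantities out of a Q-table: the maximum over actions gives the value of a state, and the argmax gives the greedy action. The helper name `best_value_and_action` and the table contents are assumptions for illustration only.

```python
def best_value_and_action(Q, s):
    """Return (max_a Q[s][a], argmax_a Q[s][a]) for state s."""
    row = Q[s]
    # Index of the largest Q-value in this state's row
    best_action = max(range(len(row)), key=lambda a: row[a])
    return row[best_action], best_action


# Hypothetical table: 2 states x 3 actions
Q = [[0.1, 0.7, 0.3], [0.5, 0.2, 0.4]]
value, action = best_value_and_action(Q, 0)
```

Here state 0 has value 0.7 and greedy action 1. Ties are broken in favor of the lowest action index, which is a choice of this sketch rather than part of Q-learning.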
Dyna-Q Big Picture <= invented by Richard Sutton
Learning T
How to Evaluate T?
Correction: The expression should be:
T[s, a, s'] = T_c[s, a, s'] / Σ_i T_c[s, a, i]
In the denominator shown in the video, T is missing the subscript c.
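Estimating T from the observed transition counts T_c amounts to normalizing each count by the total for that state-action pair. The sketch below assumes the counts are kept in a nested list indexed as `T_c[s][a][s_next]` and that every state-action pair has a nonzero total (for instance because counts start at a small positive number); the function name and the counts themselves are hypothetical.

```python
def transition_probs(T_c, s, a):
    """Estimate T[s, a, s'] for every s' as T_c[s][a][s'] / sum_i T_c[s][a][i]."""
    counts = T_c[s][a]
    # Assumes total > 0, e.g. counts initialized to a small positive value
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical counts: 1 state, 2 actions, 3 possible next states
T_c = [[[1, 1, 2], [3, 1, 0]]]
probs = transition_probs(T_c, s=0, a=1)
```

For action 1 the counts 3, 1, 0 become probabilities 0.75, 0.25 and 0.0, which sum to 1 as a transition distribution should.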
Learning R
Dyna Q Recap
The Dyna architecture consists of a combination of:
Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
Tammer Kamel is the founder and CEO of Quandl - a data platform that makes financial and economic data available through easy-to-use APIs.
Listen to this two-part interview with him.
Note: The interview is audio-only; closed captioning is available (CC button in the player).
Original source: https://www.cnblogs.com/ecoflex/p/10977470.html