The dataset used for the cart-pole problem consists of a finite set of states S, a finite set of actions A, a state transition probability matrix P, a reward function R, and a discount factor γ, which together define a Markov decision process (MDP).
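As a rough illustration of how these five components fit together, the sketch below stores them in a small Python container and checks that each row of P is a valid probability distribution. The class name `FiniteMDP`, the helper `make_toy_mdp`, the state and action labels, and all numeric values are hypothetical and chosen only for illustration; they are not the actual cart-pole dynamics or the contents of the dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FiniteMDP:
    """Hypothetical container for the tuple (S, A, P, R, gamma)."""
    states: List[str]                            # S: finite set of states
    actions: List[str]                           # A: finite set of actions
    P: Dict[Tuple[str, str], Dict[str, float]]   # P[(s, a)][s'] = transition probability
    R: Dict[Tuple[str, str], float]              # R[(s, a)] = expected reward
    gamma: float                                 # discount factor

    def validate(self) -> None:
        """Raise ValueError if the components are inconsistent."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        for s in self.states:
            for a in self.actions:
                if (s, a) not in self.P or (s, a) not in self.R:
                    raise ValueError(f"missing entry for {(s, a)}")
                dist = self.P[(s, a)]
                if any(nxt not in self.states for nxt in dist):
                    raise ValueError(f"unknown next state in P[{(s, a)}]")
                if abs(sum(dist.values()) - 1.0) > 1e-9:
                    raise ValueError(f"P[{(s, a)}] does not sum to 1")


def make_toy_mdp() -> FiniteMDP:
    """Build a two-state toy MDP with made-up numbers (not real cart-pole physics)."""
    states = ["balanced", "fallen"]
    actions = ["left", "right"]
    P = {
        ("balanced", "left"): {"balanced": 0.9, "fallen": 0.1},
        ("balanced", "right"): {"balanced": 0.8, "fallen": 0.2},
        # "fallen" is treated as absorbing in this toy example.
        ("fallen", "left"): {"fallen": 1.0},
        ("fallen", "right"): {"fallen": 1.0},
    }
    R = {
        ("balanced", "left"): 1.0,
        ("balanced", "right"): 1.0,
        ("fallen", "left"): 0.0,
        ("fallen", "right"): 0.0,
    }
    return FiniteMDP(states, actions, P, R, gamma=0.99)


if __name__ == "__main__":
    mdp = make_toy_mdp()
    mdp.validate()
    print(len(mdp.states), len(mdp.actions), mdp.gamma)
```

Storing P as a dictionary keyed by (state, action) keeps the example dependency-free; an array of shape |A| × |S| × |S| would be an equally reasonable choice when the state set is large.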