**Example text**

Other transition probabilities play no role. 125 (see Fig. 6). 125 and simultaneously to the minimal loss ${}^{1}C(X_1) = {}^{1}C(4) = 10$, when compared with ${}^{1}C(3) = 20$. 125 $= d$, meaning that the control strategy mentioned is not admissible. 125. Therefore, in state 2 the decision maker should take into account not only the future dynamics, but also other trajectories ($X_0 = X_1 = 1$) that no longer have any chance of being realized; this means that the Bellman principle does not hold. August 15, 2012 9:16 16 P809: Examples in Markov Decision Process Examples in Markov Decision Processes Fig.

9 shows that the latter statement can be false. 21). Since $v_0(x) = v_1(x) = v_2(x) = -\infty$, we have $Y_2^{\varphi} = X_1 + v_2(X_2) = -\infty$. At the same time, $v_3(x) \equiv 0$ and $Y_3^{\varphi} = X_1 + A_3 = X_1 -$ , so that $E[Y_3^{\varphi} \mid \mathcal{F}_2] = X_1 -$ $\ne Y_2^{\varphi}$.

Fig. 11: the estimating process is not a martingale.

9 presented in Fig. 13 with $A = \{-1, -2\}$, $p_1(y \mid x, a) = \frac{6}{|y|^2 \pi^2}$, we still see that the optimal selector $\varphi_3(x_1) \equiv -1$ providing $v^{\varphi} = -\infty$ leads to a process $Y_t^{\varphi}$ which is not a martingale: $v_3(x) = 0$, $v_2(x) = -2$, $v_1(x) = x - 2$, $v_0(x) = -\infty$; $E[Y_3^{\varphi} \mid \mathcal{F}_2] = X_1 - 1 \ne Y_2^{\varphi} = X_1 - 2$.
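The one-step gap in the modified example with $A = \{-1, -2\}$ can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the loss structure (loss $X_1$ followed by loss $A_3$, with $v_3 \equiv 0$) is inferred from the formulas $Y_3^{\varphi} = X_1 + A_3$ and $v_1(x) = x - 2$ above, and all function names are hypothetical.

```python
# Hypothetical names; loss structure inferred from Y_3 = X_1 + A_3 and v_3 = 0.
ACTIONS = (-1, -2)


def v3(x):
    """Terminal Bellman function: identically zero."""
    return 0


def v2(x):
    """Bellman function before the last action: best final loss a + v3."""
    return min(a + v3(x) for a in ACTIONS)


def v1(x):
    """Bellman function in state x at time 1: loss x plus v2."""
    return x + v2(x)


def phi3(x):
    """The selector considered in the text: always choose action -1."""
    return -1


def y2(x1, x2):
    """Estimating process at time 2: accumulated loss X_1 plus v2(X_2)."""
    return x1 + v2(x2)


def cond_exp_y3(x1, x2):
    """E[Y_3 | F_2]: deterministic under the selector, equals X_1 + phi3(X_2)."""
    return x1 + phi3(x2) + v3(x2)


def martingale_gap(x1, x2=0):
    """E[Y_3 | F_2] - Y_2; a martingale would require this to be zero."""
    return cond_exp_y3(x1, x2) - y2(x1, x2)


if __name__ == "__main__":
    for x1 in (-3, 0, 5):
        print(x1, cond_exp_y3(x1, 0), y2(x1, 0), martingale_gap(x1))
```

In this sketch the gap is the same for every value of $X_1$, which mirrors the comparison of $X_1 - 1$ with $X_1 - 2$ in the text: the selector is not conserving at the last step, even though its total expected loss is still $-\infty$.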

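7) and be optimal and uniformly optimal. Consider the Markov control strategy $\pi^*$ with $\pi_3^*(0 \mid x_2) = 0$, $\pi_3^*(a \mid x_2) = \frac{6}{|a|^2 \pi^2}$ for $a < 0$. 7) hold because

$$\sum_{i=1}^{\infty} (-i) \times \frac{6}{i^2 \pi^2} = -\infty = v_2(x),$$

$$x + v_2(0) = -\infty = v_1(x),$$

$$0 + \sum_{|y|=1}^{\infty} \frac{3}{|y|^2 \pi^2} \cdot \text{“}-\infty\text{”} = -\infty = v_0(x).$$

On the other hand, for any Markov strategy $\pi^m$, $v^{\pi^m} = +\infty$. Indeed, let $\hat{a} = \max\{j : \pi_3^m(j \mid 0) > 0\}$; $0 \ge \hat{a} > -\infty$, and consider the random variable $W^+ = (X_1 + A_3)^+$. It takes values $1, 2, 3, \ldots$ with probabilities not smaller than

$$p_1(-\hat{a} + 1 \mid 0, a)\,\pi_3^m(\hat{a} \mid 0) = \frac{3\pi_3^m(\hat{a} \mid 0)}{|-\hat{a} + 1|^2 \pi^2},$$

$$p_1(-\hat{a} + 2 \mid 0, a)\,\pi_3^m(\hat{a} \mid 0) = \frac{3\pi_3^m(\hat{a} \mid 0)}{|-\hat{a} + 2|^2 \pi^2},$$

$$p_1(-\hat{a} + 3 \mid 0, a)\,\pi_3^m(\hat{a} \mid 0) = \frac{3\pi_3^m(\hat{a} \mid 0)}{|-\hat{a} + 3|^2 \pi^2},$$

...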
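The divergence claims above can be illustrated numerically. The Python sketch below is a hedged illustration, not part of the original argument: it assumes the weights $\frac{6}{|a|^2\pi^2}$ for $\pi_3^*$ and $\frac{3}{|y|^2\pi^2}$ for $p_1$ as read from the formulas above, and the function names are hypothetical. It shows that the weights of $\pi_3^*$ sum to one while the partial sums of the expected loss decrease without bound, and that the lower bound on the expectation of $W^+$ (per unit of $\pi_3^m(\hat{a} \mid 0)$) keeps growing.

```python
import math


def pi_star(a):
    """Assumed weight of action a under pi*_3: 6/(a^2 pi^2) for a < 0, else 0."""
    if a >= 0:
        return 0.0
    return 6.0 / (a * a * math.pi ** 2)


def partial_mass(n):
    """Total probability of the actions -1, ..., -n."""
    return sum(pi_star(-i) for i in range(1, n + 1))


def partial_mean(n):
    """Partial sum of sum_i (-i) * 6/(i^2 pi^2); behaves like a harmonic sum."""
    return sum(-i * pi_star(-i) for i in range(1, n + 1))


def w_plus_lower_bound(a_hat, n):
    """Partial sum of k * 3/((k - a_hat)^2 pi^2), k = 1..n, with a_hat <= 0.

    This is the lower bound on E[W^+] divided by pi^m_3(a_hat | 0).
    """
    return sum(
        k * 3.0 / ((k - a_hat) ** 2 * math.pi ** 2) for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, partial_mass(n), partial_mean(n), w_plus_lower_bound(-2, n))
```

Both partial sums grow in absolute value roughly like a constant times $\log n$, since their terms decay like $1/k$; this is the same harmonic-series behaviour that gives $-\infty$ for $v_2$ and an infinite expectation for $W^+$.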