Basic rules
Example 1
First at the goal state $g$ we have:
$Q(g,g)=1+\frac{1}{2}Q(g,g)$
Subtracting $\frac{1}{2}Q(g,g)$ from both sides gives $\frac{1}{2}Q(g,g)=1$, so $Q(g,g)=2$.
$Q(s,g)=-4+\frac{1}{2}Q(g,g)=-4+1=-3$
$Q(s,a)=-1+\frac{1}{2}Q(a,b)$
$Q(a,b)=-1+\frac{1}{2}Q(b,s)$
$Q(b,s)=-1+\frac{1}{2}\max\{Q(s,a),Q(s,g)\}=-1+\frac{1}{2}\max\{Q(s,a),-3\}$
Case 1: Assume $Q(s,a)\geq -3$.
- $Q(b,s)=-1+\frac{1}{2}\max\{Q(s,a),-3\}=-1+\frac{1}{2}Q(s,a)$, since $Q(s,a)\geq -3$
- $Q(s,a)=-1+\frac{1}{2}Q(a,b)$
- $Q(a,b)=-1+\frac{1}{2}Q(b,s)$
Solving these three equations gives $Q(s,a)=Q(a,b)=Q(b,s)=-2$, which is consistent with the assumption $Q(s,a)\geq -3$. The substitution steps are:
$Q(a,b) = -1 + 0.5(Q(b,s)) = -1 + 0.5(-1+0.5Q(s,a))$
$\rightarrow Q(a,b)=-1-0.5+0.25Q(s,a)$
$\rightarrow Q(a,b)=-1.5 + 0.25Q(s,a)$
$Q(s,a)=-1+\frac{1}{2}Q(a,b)=-1+0.5(-1.5+0.25Q(s,a))$
$\rightarrow Q(s,a)=-1-0.75+0.125Q(s,a)$
$\rightarrow 0.875Q(s,a) = -1.75$
$\rightarrow Q(s,a)=-2$
Case 2: Assume $Q(s,a)<-3$.
- $Q(b,s)=-1+\frac{1}{2}(-3)=-2.5$
- $Q(a,b)=-1+\frac{1}{2}(-2.5)=-2.25$
- $Q(s,a)=-1+\frac{1}{2}(-2.25)=-2.125>-3$
This contradicts the assumption $Q(s,a)<-3$, so Case 2 is impossible and the Case 1 values hold.
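The case analysis above can be cross-checked numerically. The following is a minimal sketch, not part of the original notes: it assumes deterministic transitions, that each action is named by the state it leads to, that the only actions are the five appearing in the equations, and a discount factor of $\frac{1}{2}$. The names `REWARDS`, `GAMMA`, and `solve_q` are hypothetical. It repeatedly applies $Q(x,y)=r(x,y)+\frac{1}{2}\max_z Q(y,z)$ until the values stop changing.

```python
GAMMA = 0.5  # discount factor read off the 1/2 in the equations

# Assumed reward for each (state, next_state) edge in Example 1.
REWARDS = {
    ("g", "g"): 1.0,
    ("s", "g"): -4.0,
    ("s", "a"): -1.0,
    ("a", "b"): -1.0,
    ("b", "s"): -1.0,
}


def solve_q(rewards, gamma, sweeps=200):
    """Iterate the Bellman optimality update on a deterministic graph."""
    q = {edge: 0.0 for edge in rewards}
    for _ in range(sweeps):
        new_q = {}
        for (state, nxt), reward in rewards.items():
            # Best value among the actions available in the next state.
            best_next = max(v for (s, _), v in q.items() if s == nxt)
            new_q[(state, nxt)] = reward + gamma * best_next
        q = new_q
    return q


q = solve_q(REWARDS, GAMMA)
for edge, value in sorted(q.items()):
    print(edge, round(value, 6))
```

Because each sweep halves the error, 200 sweeps are far more than needed; the printed values should agree with $Q(g,g)=2$, $Q(s,g)=-3$, and $Q(s,a)=Q(a,b)=Q(b,s)=-2$ up to floating-point tolerance.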
Example 2
First at the goal state $g$ we have:
$Q(g,g)=1+\frac{1}{2}Q(g,g)$
Solving the equation gives $Q(g,g)=2$.
$Q(s,g)=-6+\frac{1}{2}(2)=-5$
$Q(a,g)=-5+\frac{1}{2}(2)=-4$
$Q(s,a)=-2+\frac{1}{2}(-4)=-4$
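One way to double-check the Example 2 values is to plug them back into the Bellman equations and confirm that every residual is zero. The sketch below is illustrative only: it assumes deterministic transitions, a discount factor of $\frac{1}{2}$, and that the four edges appearing in the equations are the only actions (in particular, that $g$ is the only successor of $a$). The names `REWARDS`, `Q`, and `bellman_residuals` are hypothetical.

```python
GAMMA = 0.5  # discount factor read off the 1/2 in the equations

# Assumed reward for each (state, next_state) edge in Example 2.
REWARDS = {
    ("g", "g"): 1.0,
    ("s", "g"): -6.0,
    ("a", "g"): -5.0,
    ("s", "a"): -2.0,
}

# Candidate Q-values taken from the derivation above.
Q = {
    ("g", "g"): 2.0,
    ("s", "g"): -5.0,
    ("a", "g"): -4.0,
    ("s", "a"): -4.0,
}


def bellman_residuals(q, rewards, gamma):
    """Return Q(x, y) - (r(x, y) + gamma * max_z Q(y, z)) for every edge."""
    residuals = {}
    for (state, nxt), reward in rewards.items():
        # Best value among the actions available in the next state.
        best_next = max(v for (s, _), v in q.items() if s == nxt)
        residuals[(state, nxt)] = q[(state, nxt)] - (reward + gamma * best_next)
    return residuals


residuals = bellman_residuals(Q, REWARDS, GAMMA)
for edge, r in sorted(residuals.items()):
    print(edge, r)

# Greedy action at s: the successor with the larger Q-value.
best_from_s = max(("a", "g"), key=lambda nxt: Q[("s", nxt)])
print("greedy successor of s:", best_from_s)
```

Under these assumptions every residual is zero, and since $Q(s,a)=-4>Q(s,g)=-5$, the greedy choice at $s$ is to move to $a$ first.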