Q-Learning Example
This post was published 358 days ago.

Basic rules

image-20240227144939325

Example 1

image-20240219185743823

First at the goal state $g$ we have:

$Q(g,g)=1+\frac{1}{2}Q(g,g)$

Solving this equation gives $Q(g,g)=2$.

$Q(s,g)=-4+\frac{1}{2}Q(g,g)=-4+1=-3$
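The self-referential equation for the goal state can also be checked numerically. The following is a minimal sketch, assuming the reward $1$ and discount factor $\frac{1}{2}$ read off the equations above; the function name `solve_goal_value` and the tolerance are illustrative choices, not part of the original example.

```python
def solve_goal_value(reward=1.0, gamma=0.5, tol=1e-12):
    """Iterate Q <- reward + gamma * Q until the update is smaller than tol."""
    q = 0.0
    while True:
        new_q = reward + gamma * q
        if abs(new_q - q) < tol:
            return new_q
        q = new_q


# Fixed point of Q(g,g) = 1 + 0.5 * Q(g,g)
q_gg = solve_goal_value()

# One backup from s into the goal: Q(s,g) = -4 + 0.5 * Q(g,g)
q_sg = -4 + 0.5 * q_gg

print(round(q_gg, 6), round(q_sg, 6))  # prints: 2.0 -3.0
```

The same fixed point follows from the closed form $\frac{r}{1-\gamma}=\frac{1}{1-0.5}=2$.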

$Q(s,a)=-1+\frac{1}{2}Q(a,b)$

$Q(a,b)=-1+\frac{1}{2}Q(b,s)$

$Q(b,s)=-1+\frac{1}{2}\max\{Q(s,a),Q(s,g)\}=-1+\frac{1}{2}\max\{Q(s,a),-3\}$

Case 1: Assume $Q(s,a)\geq -3$.

  • $Q(b,s)=-1+\frac{1}{2}\max\{Q(s,a),-3\}=-1+\frac{1}{2}Q(s,a)$
  • $Q(s,a)=-1+\frac{1}{2}Q(a,b)$
  • $Q(a,b)=-1+\frac{1}{2}Q(b,s)$

Solving these three equations gives $Q(s,a)=Q(a,b)=Q(b,s)=-2$, which is consistent with the assumption $Q(s,a)\geq -3$. The substitution steps are:

$Q(a,b) = -1 + 0.5(Q(b,s)) = -1 + 0.5(-1+0.5Q(s,a))$

$\rightarrow Q(a,b)=-1-0.5+0.25Q(s,a)$

$\rightarrow Q(a,b)=-1.5 + 0.25Q(s,a)$

$Q(s,a)=-1+\frac{1}{2}Q(a,b)=-1+0.5(-1.5+0.25Q(s,a))$

$\rightarrow Q(s,a)=-1-0.75+0.125Q(s,a)$

$\rightarrow 0.875Q(s,a) = -1.75$

$\rightarrow Q(s,a)=-2$
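The substitution above can be reproduced exactly with rational arithmetic. This is a small sketch, assuming the Case 1 cycle $s \to a \to b \to s$ with reward $-1$ per step and discount $\frac{1}{2}$; the helper name `step` is hypothetical. Composing three backups yields an affine map of $Q(s,a)$ onto itself, which matches the line $0.875\,Q(s,a)=-1.75$.

```python
from fractions import Fraction


def step(q_next, reward=Fraction(-1), gamma=Fraction(1, 2)):
    """One backup along the cycle: Q = reward + gamma * Q_next."""
    return reward + gamma * q_next


# Three backups in a row give Q(s,a) = intercept + slope * Q(s,a).
intercept = step(step(step(Fraction(0))))             # value when Q(s,a) = 0
slope = step(step(step(Fraction(1)))) - intercept     # coefficient of Q(s,a)

# Solve the affine fixed point, then back-substitute around the cycle.
q_sa = intercept / (1 - slope)
q_bs = step(q_sa)   # Q(b,s) = -1 + 0.5 * Q(s,a)
q_ab = step(q_bs)   # Q(a,b) = -1 + 0.5 * Q(b,s)

print(intercept, slope, q_sa, q_ab, q_bs)  # prints: -7/4 1/8 -2 -2 -2
```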

Case 2: Assume $Q(s,a)<-3$

  • $Q(b,s)=-1+\frac{1}{2}(-3)=-2.5$
  • $Q(a,b)=-1+\frac{1}{2}(-2.5)=-2.25$
  • $Q(s,a)=-1+\frac{1}{2}(-2.25)=-2.125>-3$

This contradicts the assumption $Q(s,a)<-3$, so Case 2 is impossible and the values from Case 1 hold.
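The case analysis can be cross-checked by running Q-value iteration on the whole example, which resolves the $\max$ automatically. The sketch below assumes the transitions and rewards implied by the equations in this example (the figure itself is not reproduced here), treats every move as deterministic, and names each action after the state it leads to; `REWARDS` and `q_value_iteration` are illustrative names.

```python
GAMMA = 0.5

# (state, action) -> reward; the action name is also the next state.
# These entries are inferred from the equations, not read from the figure.
REWARDS = {
    ("s", "a"): -1.0,
    ("s", "g"): -4.0,
    ("a", "b"): -1.0,
    ("b", "s"): -1.0,
    ("g", "g"): 1.0,
}


def q_value_iteration(rewards, gamma, sweeps=200):
    """Synchronous sweeps of Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')."""
    q = {pair: 0.0 for pair in rewards}
    for _ in range(sweeps):
        new_q = {}
        for (state, nxt), reward in rewards.items():
            # Best value among the actions available in the next state.
            best_next = max(v for (s2, _), v in q.items() if s2 == nxt)
            new_q[(state, nxt)] = reward + gamma * best_next
        q = new_q
    return q


q = q_value_iteration(REWARDS, GAMMA)
for pair in sorted(q):
    print(pair, round(q[pair], 4))
```

Under these assumptions the iteration should settle on $Q(s,a)=Q(a,b)=Q(b,s)=-2$, $Q(s,g)=-3$ and $Q(g,g)=2$, in agreement with Case 1.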

Example 2

image-20240227131042467

First, at the goal state $g$ we have:

$Q(g,g)=1+\frac{1}{2}Q(g,g)$

Solving this equation gives $Q(g,g)=2$.

$Q(s,g)=-6+\frac{1}{2}(2)=-5$

$Q(a,g)=-5+\frac{1}{2}(2)=-4$

$Q(s,a)=-2+\frac{1}{2}(-4)=-4$
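Because this example has no cycle other than the self-loop at $g$, the values can be computed in one backward pass and then compared to pick the greedy action at $s$. The following sketch assumes the rewards implied by the equations above and assumes that $g$ is the only action available in $a$ (so the backup for $Q(s,a)$ uses $Q(a,g)$); `greedy_action` is a hypothetical helper.

```python
GAMMA = 0.5

# Closed form of Q(g,g) = 1 + GAMMA * Q(g,g)
q_gg = 1 / (1 - GAMMA)

q = {
    ("g", "g"): q_gg,
    ("s", "g"): -6 + GAMMA * q_gg,
    ("a", "g"): -5 + GAMMA * q_gg,
}
# Assumption: g is the only action in state a, so max over a's actions is Q(a,g).
q[("s", "a")] = -2 + GAMMA * q[("a", "g")]


def greedy_action(q_values, state):
    """Return the action with the highest Q-value in the given state."""
    options = {action: value for (s, action), value in q_values.items() if s == state}
    return max(options, key=options.get)


print(q[("s", "g")], q[("a", "g")], q[("s", "a")])  # prints: -5.0 -4.0 -4.0
print(greedy_action(q, "s"))                        # prints: a
```

Since $Q(s,a)=-4 > Q(s,g)=-5$, a greedy agent at $s$ would go through $a$ rather than directly to $g$ under these assumptions.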

Title: Q-Learning Example
Author: IKK
Unless reposted or otherwise stated, all articles are licensed under CC BY-NC-SA 4.0.