[MPC] 4. Optimal Control (2) - Applying the Taylor Series, Deriving the Algebraic Riccati Equation (ARE)


LQR์— ์ ์šฉ

$$V^{*}(x(t), t) = \underset{u[t, t+\Delta t]}{min} \{
\Delta t \cdot l[x(t + \alpha \Delta t), u(t + \alpha \Delta t), t + \alpha \Delta t] + V^{*}(x(t + \Delta t), t+\Delta t)
\}$$

  • ์ด ์‹์—์„œ $V^{*}(x(t + \Delta t), t+\Delta t)$ ๋ถ€๋ถ„์„ ์œ„ Taylor Series๋กœ x์™€ t์— ๋Œ€ํ•ด์„œ ์ •๋ฆฌํ•ด๋ณด์ž.
  • $x = (x(t), t), v = \Delta t$ ๋ผ๊ณ  ์ƒ๊ฐํ•˜์ž.
  • ์ •๋ฆฌํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.
    $$f(x + v) = f(x) + f'(x)v + \frac{1}{2}f''(x)v^{2} + \cdots$$
  • Substituting $x = (x(t), t)$ and $v = \Delta t$ into this gives
    $$V^{*}(x(t + \Delta t), t+\Delta t)
    = V^{*}(x(t), t) + \left[\frac{\partial V^{*}}{\partial x}(x(t), t)\right]^{T} \frac{dx(t)}{dt} \cdot \Delta t + \frac{\partial V^{*}}{\partial t}(x(t), t) \cdot \Delta t \cdot 1 + H.O.T.$$
    • Differentiating with respect to $x(t)$: $[\frac{\partial V^{*}}{\partial x}]^{T}$ is a gradient with respect to the state ($x$), so it is transposed to form the inner product with the state's derivative.
    • By the chain rule, this gradient is multiplied by the derivative of $x(t)$, i.e. $\frac{dx(t)}{dt}$.
    • Differentiating with respect to $t$: $\frac{\partial V^{*}}{\partial t}$ is already taken with respect to time ($t$), so it is used as-is.
    • The time argument itself differentiates to $\frac{dt}{dt} = 1$, hence the factor of $1$.
  • $H.O.T.$ denotes the higher-order terms; they are small and will be dropped shortly.
  • ์ด ์‹์€ ์•„๋ž˜์™€ ๊ฐ™๊ฒŒ ๋œ๋‹ค.
    $$\begin{matrix}V^{*}(x(t), t) &=& \underset{u[t, t+\Delta t]}{min} \{
    \Delta t \cdot l[x(t + \alpha \Delta t), u(t + \alpha \Delta t), t + \alpha \Delta t] \\
    && +\ V^{*}(x(t), t) \\
    && +\ [\frac{\partial V^{*}}{\partial x}(x(t), t)]^{T} \frac{dx(t)}{dt} \cdot \Delta t \\
    && +\ \frac{\partial V^{*}}{\partial t}(x(t), t) \cdot \Delta t + H.O.T.
    \} \end{matrix}$$
  • $H.O.T.$ is negligibly small, so drop it; subtract $V^{*}(x(t), t)$ from both sides, divide by $\Delta t$, and let $\Delta t \to 0$. The result is:

$$\frac{\partial V^{*}}{\partial t}(x(t),t) = -\underset{u(t)}{min} \left\{ l[x(t),u(t),t]+\left[\frac{\partial V^{*}}{\partial x}\right]^{T}f(x,u,t) \right\}$$
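As a quick sanity check on the first-order expansion used above, here is a minimal numerical sketch. It assumes a hypothetical time-invariant quadratic value function $V(x) = x^{T}Px$ (so $\frac{\partial V}{\partial t} = 0$ and $\frac{\partial V}{\partial x} = 2Px$) and arbitrary dynamics $\dot x = Ax$; the matrices are purely illustrative, not from the derivation. The gap between $V(x(t+\Delta t))$ and the first-order approximation is exactly the $H.O.T.$, and it should shrink like $\Delta t^{2}$.

```python
import numpy as np

# Hypothetical quadratic value function V(x) = x^T P x with constant P,
# so dV/dt = 0 and the gradient dV/dx = 2 P x.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])  # arbitrary dynamics for illustration, u omitted
x = np.array([1.0, -2.0])
xdot = A @ x  # dx/dt

def V(z):
    return z @ P @ z

for dt in [1e-1, 1e-2, 1e-3]:
    exact = V(x + xdot * dt)                      # V(x(t + dt))
    first_order = V(x) + (2 * P @ x) @ xdot * dt  # V(x) + [dV/dx]^T (dx/dt) dt
    # The gap is the H.O.T.; shrinking dt by 10x should cut it by ~100x (O(dt^2)).
    print(f"dt={dt:.0e}  |H.O.T.| = {abs(exact - first_order):.3e}")
```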

์„ ํ˜• ์‹œ์Šคํ…œ ์ ์šฉ

  • system : $\dot x = Ax + Bu, x(t_{0}) = x_{0}$
  • cost function : $$V(x(t_{0}), u, t_{0}) = \int_{t_{0}}^{T_{f}}(u^{T}Ru + x^{T}Qx)dt + x^{T}(T_{f})Qx(T_{f})$$
  • cost function์„ ์ตœ์ ํ™”ํ•˜๋Š” $u^{*}(t), t_{0} \le t \le T_{f}$ ๋ฅผ ์ฐพ์•„๋ณด์ž. $t_{0}$์—์„œ cost function์˜ ์ตœ์  ํ•จ์ˆ˜๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.
    $$\begin{matrix} V^{\*}(x(t_{0}), y, t_{0}) = x^{T}(t)Px(t), & P =P^{T} \end{matrix}$$
  • Plugging the original cost function into the Hamilton-Jacobi equation, $l$, $f$, and $V^{*}$ are identified as follows:
    $$\begin{matrix} l=u^{T}Ru + x^{T}Qx & f=Ax+Bu & V^{*} = x^{T}Px \end{matrix}$$
  • ์ด๊ฒƒ์„ ์ ์šฉํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์€ ์‹์„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.
    $$\frac{\partial V^{*}}{\partial t} = 0 = -\underset{u(t)}{min} [u^{T}Ru+x^{T}Qx + 2x^{T} P(Ax + Bu)]$$
  • ์ด ๋‚ด์šฉ์„ quadratic form์œผ๋กœ ์ •๋ฆฌํ•œ๋‹ค.
    $$u^{T}Ru+x^{T}Qx+2x^{T}PBu + 2x^{T}PAx=0$$
  • ๋ญ์•ผ์ด๊ฒŒ
    ์ž˜ ์ •๋ฆฌํ•˜๋ฉด
    $$0=-[(u+R^{-1}B^{T}Px)R(u+R^{-1}B^{T}Px) + x^{T}(Q + PA + A^{T}P - PBR^{-1}B^{T}P)x]$$
  • ๋”ฐ๋ผ์„œ ์œ„ ์‹์„ ์ตœ์†Œํ™”ํ•˜๋Š” $u^{*}$๋Š”
    $$u^{*}= -R^{-1}B^{T}Px$$
  • and the $P$ satisfying the condition below
    $$Q + PA + A^{T}P - PBR^{-1}B^{T}P = 0$$
  • completes the solution; this equation is the Algebraic Riccati Equation (ARE). A numerical sketch of solving it follows this list.
    • $Q$ and $R$ are design factors, and $P$ is the value obtained from them.