[MPC] 4. Optimal Control (1) - LQR and Taylor Series

2024. 3. 6. 16:27 · 🐬 ML & Data/📮 Reinforcement Learning
  • Optimal control basics: LQR (Linear Quadratic Regulator)
  • LQR is the foundation, so we start with it.

 

  • system : $\dot x = f(x, u, t), x(t_{0}) = x_{0}$
  • cost function :
    $$V(x(t_{0}), u, t_{0}) = \int_{t_{0}}^{T} l[x(\tau), u(\tau), \tau]d\tau + m(x(T))$$
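To make the cost functional concrete, here is a minimal Python sketch that approximates $V(x(t_{0}), u, t_{0})$ for a fixed input $u(t)$ and a scalar state. The helper name `evaluate_cost`, the forward-Euler / Riemann-sum discretization, and the example system $\dot x = -x + u$ with $l = x^{2} + u^{2}$ and $m = 0$ are illustrative assumptions, not part of the original post.

```python
import math
from typing import Callable

def evaluate_cost(f: Callable[[float, float, float], float],
                  l: Callable[[float, float, float], float],
                  m: Callable[[float], float],
                  u: Callable[[float], float],
                  x0: float, t0: float, T: float,
                  n_steps: int = 10_000) -> float:
    """Approximate V(x(t0), u, t0) = ∫_{t0}^{T} l dτ + m(x(T)) for a scalar state.

    Hypothetical helper: the state is propagated with forward Euler and the
    running cost is accumulated with a left Riemann sum on the same time grid.
    """
    dt = (T - t0) / n_steps
    x, t, running = x0, t0, 0.0
    for _ in range(n_steps):
        ut = u(t)
        running += l(x, ut, t) * dt   # area of one thin rectangle of the integral
        x += f(x, ut, t) * dt         # Euler step of x' = f(x, u, t)
        t += dt
    return running + m(x)             # add the terminal cost m(x(T))

# Assumed example: x' = -x + u, l = x^2 + u^2, m = 0, u(t) = 0, x(0) = 1.
V = evaluate_cost(f=lambda x, u, t: -x + u,
                  l=lambda x, u, t: x * x + u * u,
                  m=lambda x: 0.0,
                  u=lambda t: 0.0,
                  x0=1.0, t0=0.0, T=5.0)
exact = 0.5 * (1.0 - math.exp(-10.0))  # ∫_0^5 e^{-2τ} dτ, since x(t) = e^{-t}
print(V, exact)
```

With $u \equiv 0$ the state is $x(t) = e^{-t}$, so the exact cost is $\frac{1}{2}(1 - e^{-10})$ and the two printed values should agree to roughly three decimal places. Optimal control asks which $u$ makes this number as small as possible.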

 

  • The goal of optimal control is to find the input $u^{*}(t)$, $t_{0}\le t \le T$, that minimizes the cost function above.
  • By the principle of optimality, if a solution is optimal, then its remaining part is also optimal for the sub-problem that starts at any later time from the state reached there.
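The principle of optimality can be checked by brute force on a tiny discrete-time problem. In this sketch the dynamics $x_{k+1} = x_k + u_k$, the input set $\{-1, 0, 1\}$, the stage cost $x^{2} + u^{2}$, and the terminal cost $10x^{2}$ are all assumptions chosen for illustration (the post itself works in continuous time), and the helper names are hypothetical.

```python
from itertools import product

CONTROLS = (-1, 0, 1)  # assumed finite input set

def step(x: int, u: int) -> int:
    """Toy discrete dynamics x_{k+1} = x_k + u_k (assumption)."""
    return x + u

def total_cost(x0: int, us: tuple[int, ...]) -> int:
    """Sum of stage costs x^2 + u^2 along the trajectory plus terminal cost 10 * x_N^2."""
    x, cost = x0, 0
    for u in us:
        cost += x * x + u * u
        x = step(x, u)
    return cost + 10 * x * x

def optimal_cost(x0: int, horizon: int) -> tuple[int, tuple[int, ...]]:
    """Brute force: evaluate every input sequence of the given length."""
    best = min(product(CONTROLS, repeat=horizon), key=lambda us: total_cost(x0, us))
    return total_cost(x0, best), best

def tail_checks(x0: int, horizon: int) -> list[tuple[int, int]]:
    """For each split point j, pair the cost of the optimal sequence's tail
    with the optimal cost of the sub-problem starting where that tail starts."""
    _, us_star = optimal_cost(x0, horizon)
    x = x0
    checks = []
    for j in range(horizon):
        tail_cost = total_cost(x, us_star[j:])
        sub_cost, _ = optimal_cost(x, horizon - j)
        checks.append((tail_cost, sub_cost))
        x = step(x, us_star[j])   # move to the state at step j + 1
    return checks

print(tail_checks(x0=3, horizon=5))
```

Every printed pair should be equal: the tail of the optimal input sequence is itself optimal for the sub-problem it starts, which is what the principle of optimality states. Brute force needs $3^{N}$ evaluations, which is why the derivation turns the idea into a recursion instead.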

 

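Derivation: introduce an intermediate time $t_{1}$ with $t_{0} < t < t_{1} < T$, and let $V^{*}(x(t), t)$ denote the minimum cost from time $t$ onward, starting from state $x(t)$.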

 

$$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\int_{t}^{t_{1}}l[x(\tau), u(\tau), \tau]\,d\tau + \min_{u[t_{1}, T]}\left\{\int_{t_{1}}^{T}l[x(\tau), u(\tau), \tau]\,d\tau + m(x(T))\right\}\right\}$$

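  • The inner term is, by definition, the optimal cost-to-go from $t_{1}$:
    $$\min_{u[t_{1}, T]}\left\{\int_{t_{1}}^{T}l[x(\tau), u(\tau), \tau]\,d\tau + m(x(T))\right\} = V^{*}(x(t_{1}), t_{1})$$

  • Substituting it back gives the final expression:
    $$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\int_{t}^{t_{1}}l[x(\tau), u(\tau), \tau]\,d\tau + V^{*}(x(t_{1}), t_{1})\right\}$$
  • Since $V^{*}(x(t_{1}), t_{1})$ already accounts for the best choice of $u$ on $[t_{1}, T]$, the outer minimization in effect only chooses $u$ on $[t, t_{1}]$, which still affects the second term through $x(t_{1})$.
  • Now let $t_{1} = t + \Delta t$, where $\Delta t$ is small, so $t_{1}$ is a moment just after $t$.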

 

  • Substituting $t_{1} = t + \Delta t$:
    $$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\int_{t}^{t + \Delta t}l[x(\tau), u(\tau), \tau]\,d\tau + V^{*}(x(t + \Delta t), t + \Delta t)\right\}$$
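A discrete-time analog of this recursion can be run directly. The sketch assumes a scalar linear system $\dot x = a x + b u$ with $l = q x^{2} + r u^{2}$ and $m = s x^{2}$; these choices, the Euler discretization with step $\Delta t$, the left-endpoint approximation $\int_{t}^{t+\Delta t} l\,d\tau \approx \Delta t \cdot l(t)$, and the name `bellman_backward` are assumptions for illustration. Guessing $V^{*}(x, t_k) = p_k x^{2}$, the minimization inside the braces is a quadratic in $u$ and can be solved in closed form, so the recursion only updates $p_k$ backward from $p_N = s$.

```python
import math

def bellman_backward(a: float, b: float, q: float, r: float, s: float,
                     T: float, dt: float) -> tuple[float, list[float]]:
    """Backward recursion V_k(x) = min_u [dt*(q x^2 + r u^2) + V_{k+1}(x_{k+1})]
    for the Euler-discretized system x_{k+1} = (1 + a dt) x_k + (b dt) u_k.

    With V_k(x) = p_k x^2, the minimizing input is u_k = -g_k x_k.
    Returns p_0 and the gains [g_0, ..., g_{N-1}].
    """
    A = 1.0 + a * dt
    B = b * dt
    p = s                                   # p_N from the terminal cost m(x) = s x^2
    gains = []
    for _ in range(int(round(T / dt))):
        denom = dt * r + B * B * p
        gains.append(A * B * p / denom)     # argmin of the quadratic bracket (uses p_{k+1})
        p = dt * q + A * A * p - (A * B * p) ** 2 / denom  # minimum value gives p_k
    gains.reverse()                         # gains were computed from k = N-1 down to 0
    return p, gains

p0, gains = bellman_backward(a=-1.0, b=1.0, q=1.0, r=1.0, s=0.0, T=10.0, dt=1e-3)
print(p0, gains[0], math.sqrt(2.0) - 1.0)
```

For these parameters, $p_0$ and the first gain both come out close to $\sqrt{2} - 1 \approx 0.414$, the value continuous-time LQR gives for this system over a long horizon; the small gap shrinks as $\Delta t \to 0$. Taking that limit properly is what the Taylor expansion of the second term is for.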

 

  • The integral is the area under $l$ over $[t, t+\Delta t]$, so it can be written as width × height: the width is $\Delta t$, and the height is the value of $l$ at some point between $t$ and $t+\Delta t$ (the mean value theorem for integrals, assuming $l$ is continuous along the trajectory).
    • Writing that point as $t + \alpha \Delta t$ with $0 \le \alpha \le 1$, the expression becomes
      $$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\Delta t \cdot l[x(t+\alpha \Delta t), u(t+\alpha \Delta t), t+\alpha \Delta t] + V^{*}(x(t + \Delta t), t + \Delta t)\right\}$$
  • Next, expand the second term, $V^{*}(x(t + \Delta t), t + \Delta t)$, with a Taylor series.
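As a small numerical illustration of the width × height step, the sketch below takes an assumed running cost that depends only on time, $l(\tau) = \tau^{2}$, and solves for the $\alpha$ that makes the rectangle's area equal the integral exactly (`mvt_alpha` is a hypothetical helper).

```python
import math

def mvt_alpha(t: float, dt: float) -> float:
    """Return α with ∫_t^{t+dt} τ² dτ = dt · (t + α dt)², for the assumed
    running cost l(τ) = τ² and t >= 0 (so l is increasing on the interval)."""
    integral = ((t + dt) ** 3 - t ** 3) / 3.0   # exact area under τ²
    height = integral / dt                       # height of the equal-area rectangle
    return (math.sqrt(height) - t) / dt          # solve (t + α dt)² = height for α

for dt in (1.0, 0.1, 0.01, 0.001):
    alpha = mvt_alpha(1.0, dt)
    print(f"dt={dt:<6} alpha={alpha:.6f}")
```

Every $\alpha$ lands in $[0, 1]$, as the mean value theorem guarantees, and for this smooth example it approaches $1/2$ as $\Delta t$ shrinks. Its exact value does not matter for the derivation, because $t + \alpha\Delta t \to t$ as $\Delta t \to 0$ whatever $\alpha$ is.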

 

Taylor series

https://darkpgmr.tistory.com/59
https://sine-qua-none.tistory.com/28

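  • The degree-$n$ Taylor polynomial of $f$ around $x = a$ is
    $$P_{n}(x) = f(a) + f^{\prime}(a)(x-a) + \frac{f^{\prime \prime}(a)}{2!}(x-a)^{2}+ \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^{n}= \sum\limits_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$$
    and the Taylor series is its limit $P_{\infty}(x) = \sum\limits_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k$; for well-behaved (analytic) $f$, $f(x) = P_{\infty}(x)$ near $a$.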

 

  • For finite $n$, $f(x) \approx P_{n}(x)$ is not an equality for every $x$: the approximation is more accurate the closer $x$ is to $a$ and the more higher-order terms are included.
  • The idea is to approximate $f$ by a polynomial whose derivatives at $x=a$ (up to order $n$) match those of $f$.
  • As the degree increases, the polynomial stays close to $f$ over a wider interval; with a low degree it is close only near $x=a$.
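Both effects are easy to see by comparing $f(x)$ with $P_{n}(x)$ for $f(x) = e^{x}$, an assumed example chosen because every derivative equals $e^{x}$, at a point near $a$ and a point farther away (`taylor_exp` is a hypothetical helper).

```python
import math

def taylor_exp(x: float, a: float, n: int) -> float:
    """Degree-n Taylor polynomial P_n(x) of f(x) = e^x around x = a.
    Every derivative of e^x is e^x, so f^(k)(a) = e^a for all k."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

a = 0.0
for n in (1, 2, 4, 8):
    near = abs(math.exp(a + 0.1) - taylor_exp(a + 0.1, a, n))
    far = abs(math.exp(a + 2.0) - taylor_exp(a + 2.0, a, n))
    print(f"n={n}: error at x=a+0.1 -> {near:.2e}, at x=a+2 -> {far:.2e}")
```

For each $n$ the error at $x = a + 0.1$ is far smaller than at $x = a + 2$, and the error at the far point keeps shrinking as $n$ grows.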

 

Multivariable functions

$$f(x+v) = f(x) + f^{\prime}(x)v+ \frac12 f^{\prime \prime }(x)v^{2} + \cdots$$

  • This is the basic form of the Taylor series, written as an expansion around $x$ with step $v$.

 

  • Let $f: \mathbb{R}^{n} \to \mathbb{R}$, $x = (x_{1}, x_{2}, \cdots, x_{n})$, and $v=(v_{1}, v_{2}, \cdots, v_{n})$. Then
    • $f^{\prime}(x) = \nabla f(x)$ : the gradient
    • $f^{\prime \prime}(x) = H_{f}$ : the Hessian matrix

 

  • When $x$ and $v$ are vectors, taking the partial derivative with respect to each variable turns the Taylor series above into
    $$f'(x)v = \nabla f(x)^{T} v$$ $$f''(x)v^{2}= v^{T}H_{f}v$$ $$\nabla f(x) = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}},\cdots, \frac{\partial f}{\partial x_{n}}\right)^T$$
    $$H_{f}= \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{1}\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2}f}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{n}\partial x_{n}} \end{bmatrix}$$
    $$f(x+v) = f(x) + \nabla f(x)^{T}v + \frac12 v^{T}H_{f}v + \cdots$$
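The vector formula can be checked numerically. This NumPy sketch estimates $\nabla f$ and $H_{f}$ by central finite differences and compares first- and second-order approximations of $f(x+v)$; the example function $f: \mathbb{R}^{3} \to \mathbb{R}$, the step sizes, and the helper names are all assumptions for illustration.

```python
import numpy as np

def f(x: np.ndarray) -> float:
    """Example f: R^3 -> R (an assumption for illustration)."""
    return np.exp(x[0]) * np.sin(x[1]) + x[1] * x[2] ** 2

def gradient(func, x: np.ndarray, h: float = 1e-5) -> np.ndarray:
    """Central-difference estimate of ∇f(x)."""
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

def hessian(func, x: np.ndarray, h: float = 1e-4) -> np.ndarray:
    """Central-difference estimate of H_f(x); entry (i, j) approximates ∂²f/∂x_i∂x_j."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = h
            ej = np.zeros(n)
            ej[j] = h
            H[i, j] = (func(x + ei + ej) - func(x + ei - ej)
                       - func(x - ei + ej) + func(x - ei - ej)) / (4 * h * h)
    return H

def taylor(func, x: np.ndarray, v: np.ndarray, order: int = 2) -> float:
    """f(x) + ∇f(x)^T v, plus 1/2 v^T H_f v when order == 2."""
    approx = func(x) + gradient(func, x) @ v
    if order == 2:
        approx += 0.5 * v @ hessian(func, x) @ v
    return approx

x = np.array([0.3, 0.5, -0.2])
d = np.array([1.0, -0.5, 2.0])
for s in (0.1, 0.01):
    v = s * d
    err1 = abs(f(x + v) - taylor(f, x, v, order=1))
    err2 = abs(f(x + v) - taylor(f, x, v, order=2))
    print(f"scale {s}: 1st-order error {err1:.2e}, 2nd-order error {err2:.2e}")
```

Shrinking $v$ by a factor of 10 should shrink the second-order error by roughly $10^{3}$, since the first neglected term is cubic in $v$, while the first-order error only shrinks by roughly $10^{2}$.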

 

Two-variable functions

  • If $x=(x_1, x_2)$ and $v=(v_{1}, v_{2})$, then
    $$\nabla f(x) = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}\right)^T$$ $$H_{f}= \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} \\ \frac{\partial^{2}f}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{2}\partial x_{2}} \end{bmatrix}$$

 

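  • Using these derivatives, $f(x_{1}+ v_{1}, x_{2} + v_{2})$ expands to
    $$f(x_{1}+v_{1}, x_{2}+ v_{2}) = f(x_{1} , x_{2}) + \frac{\partial f(x)}{\partial x_{1}}v_{1} + \frac{\partial f(x)}{\partial x_{2}}v_{2} + \frac12 \frac{\partial^{2} f(x)}{\partial x_{1}^{2}}v_{1}^{2} + \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{2}}v_{1}v_{2} + \frac12 \frac{\partial^{2} f(x)}{\partial x_{2}^{2}}v_{2}^{2} + \cdots$$
    The mixed term has coefficient 1 because, for continuous second partial derivatives, $H_{f}$ is symmetric, so its two off-diagonal entries each contribute $\frac12 \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{2}}v_{1}v_{2}$.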
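To confirm that the expanded two-variable formula matches the vector form, the sketch below writes both out by hand for an assumed example $f(x_{1}, x_{2}) = e^{x_{1}}\sin x_{2}$ (the example and helper names are hypothetical) and compares them with the exact value for a small step.

```python
import math

# Assumed example: f(x1, x2) = e^{x1} sin(x2), with partial derivatives written out by hand.
def f(x1: float, x2: float) -> float:
    return math.exp(x1) * math.sin(x2)

def partials(x1: float, x2: float) -> tuple[float, float, float, float, float]:
    """Return ∂f/∂x1, ∂f/∂x2, ∂²f/∂x1², ∂²f/∂x1∂x2, ∂²f/∂x2²."""
    e = math.exp(x1)
    return e * math.sin(x2), e * math.cos(x2), e * math.sin(x2), e * math.cos(x2), -e * math.sin(x2)

def taylor2_explicit(x1: float, x2: float, v1: float, v2: float) -> float:
    """Scalar formula: f + f_x1 v1 + f_x2 v2 + 1/2 f_x1x1 v1^2 + f_x1x2 v1 v2 + 1/2 f_x2x2 v2^2."""
    fx1, fx2, fx1x1, fx1x2, fx2x2 = partials(x1, x2)
    return (f(x1, x2) + fx1 * v1 + fx2 * v2
            + 0.5 * fx1x1 * v1 ** 2 + fx1x2 * v1 * v2 + 0.5 * fx2x2 * v2 ** 2)

def taylor2_matrix(x1: float, x2: float, v1: float, v2: float) -> float:
    """Vector formula: f + ∇f^T v + 1/2 v^T H_f v, with the symmetric 2x2 Hessian."""
    fx1, fx2, fx1x1, fx1x2, fx2x2 = partials(x1, x2)
    grad_v = fx1 * v1 + fx2 * v2
    Hv = (fx1x1 * v1 + fx1x2 * v2, fx1x2 * v1 + fx2x2 * v2)  # H_f v
    return f(x1, x2) + grad_v + 0.5 * (v1 * Hv[0] + v2 * Hv[1])

x1, x2, v1, v2 = 0.2, 0.7, 0.01, -0.02
print(f(x1 + v1, x2 + v2), taylor2_explicit(x1, x2, v1, v2), taylor2_matrix(x1, x2, v1, v2))
```

The two expansions agree up to floating-point rounding, and both are close to the exact value for this small $v$.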

 

Applying this to LQR, that is, rewriting the expression with the Taylor series and converting it into a quadratic form, is covered in the next post.

License: Attribution-NonCommercial-NoDerivatives

darly213