
[MPC] 4. Optimal Control (1) - LQR and Taylor Series

darly213 2024. 3. 6. 16:27
  • Optimal control basics - LQR (Linear Quadratic Regulator)
  • LQR is the foundational case, so we start with it

  • system : $\dot x = f(x, u, t), x(t_{0}) = x_{0}$
  • cost function :
    $$V(x(t_{0}), u, t_{0}) = \int_{t_{0}}^{T} l[x(\tau), u(\tau), \tau]d\tau + m(x(T))$$

 

  • Goal of optimal control: find the input $u^{*}(t), t_{0}\le t \le T$ that minimizes the cost function above
  • By the principle of optimality, if a solution is optimal for the whole problem, the part of it that remains after any intermediate time is also optimal for the corresponding sub-problem.

Derive the recursion by introducing an intermediate time $t_{1}$ with $t_{0} < t < t_{1} < T$

$$V^{*}(x(t), t) = \min_{u[t, T]} \left\{ \int_{t}^{t_{1}}l[x(\tau), u(\tau), \tau]d\tau + \min_{u[t_{1}, T]} \left\{ \int_{t_{1}}^{T}l[x(\tau), u(\tau), \tau]d\tau + m(x(T)) \right\} \right\}$$

  • Of these, the latter term can be rewritten as
    $$\min_{u[t_{1}, T]} \left\{ \int_{t_{1}}^{T}l[x(\tau), u(\tau), \tau]d\tau + m(x(T)) \right\} = V^{*}(x(t_{1}), t_{1})$$
  • Therefore the final expression is
    $$V^{*}(x(t), t) = \min_{u[t, T]} \left\{ \int_{t}^{t_{1}}l[x(\tau), u(\tau), \tau]d\tau + V^{*}(x(t_{1}), t_{1}) \right\}$$
  • Here, define $t_{1} = t + \Delta t$. $\Delta t$ is a small value, so $t_{1}$ is the time just slightly after $t$

  • Substituting for $t_{1}$ gives
    $$V^{*}(x(t), t) = \min_{u[t, T]} \left\{ \int_{t}^{t + \Delta t}l[x(\tau), u(\tau), \tau]d\tau + V^{*}(x(t + \Delta t), t + \Delta t) \right\}$$

  • The integral is the area under a function, so it can be computed as width * height: the width is $\Delta t$, and the height is the function value at some point between $t$ and $t+\Delta t$ (the mean value theorem for integrals, assuming the integrand is continuous)
    • Writing this point as $t + \alpha \Delta t$ with $0 \le \alpha \le 1$, the expression becomes
      $$V^{*}(x(t), t) = \min_{u[t, T]} \left\{ \Delta t \cdot l[x(t+\alpha \Delta t), u(t+\alpha \Delta t), t+\alpha \Delta t] + V^{*}(x(t + \Delta t), t + \Delta t) \right\}$$
  • Next, the latter term ($V^{*}(x(t + \Delta t), t + \Delta t)$) is expanded with a Taylor series

Taylor series

https://darkpgmr.tistory.com/59
https://sine-qua-none.tistory.com/28

  • Writing $f(x) = P_{\infty}(x)$, where
    $$P_{n}(x) = f(a) + f^{\prime}(a)(x-a) + \frac{f^{\prime \prime}(a)}{2!}(x-a)^{2}+ \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^{n}= \sum\limits_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$$

  • The two sides are not equal for every $x$ (for example, when the series is truncated at a finite $n$ or $x$ lies outside the region of convergence). The approximation is more accurate the closer $x$ is to $a$ and the more higher-order terms are included.
  • The idea is to approximate $f(x)$ by a polynomial that has the same derivatives as $f(x)$ at $x=a$
  • A higher degree matches the original function over a longer interval, while a lower degree matches it only near $x=a$

Multivariable functions

$$f(x+v) = f(x) + f^{\prime}(x)v+ \frac12 f^{\prime \prime }(x)v^{2} + \cdots$$

  • The expression above is the basic form of the Taylor series

  • If $f: R^{n} \to R$, $x = (x_{1}, x_{2}, x_{3}, \cdots)$, and $v=(v_{1}, v_{2}, v_{3}, \cdots)$, then
    • $f^{\prime}(x) = \nabla f(x)$ : gradient
    • $f^{\prime \prime}(x) = H_{f}$ : Hessian matrix

  • When $x$ and $v$ are vectors (column matrices), applying the partial derivative with respect to each variable lets us rewrite the Taylor series above as
    $$f'(x)v = \nabla f(x)^{T} v$$
    $$f''(x)v^{2}= v^{T}H_{f}v$$
    $$\nabla f(x) = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}},\cdots, \frac{\partial f}{\partial x_{n}}\right)^{T}$$
    $$H_{f}= \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{1}\partial x_{n}} \\ \vdots & & & \vdots \\ \frac{\partial^{2}f}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{n}\partial x_{n}} \end{bmatrix}$$
    $$f(x+v) = f(x) + \nabla f(x)^{T}v + \frac12 v^{T}H_{f}v + \cdots$$

Two-variable functions

  • If $x=(x_1, x_2)$ and $v=(v_{1}, v_{2})$, then
    $$\nabla f(x) = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}\right)^{T}$$
    $$H_{f}= \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} \\ \frac{\partial^{2}f}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{2}\partial x_{2}} \end{bmatrix}$$

  • Using these derivatives to expand $f(x_{1}+ v_{1}, x_{2} + v_{2})$ gives (the cross term has no $\frac12$ because the two equal mixed partials add up)
    $$f(x_{1}+v_{1}, x_{2}+ v_{2}) = f(x_{1} , x_{2}) + \frac{\partial f(x)}{\partial x_{1}}v_{1} + \frac{\partial f(x)}{\partial x_{2}}v_{2} + \frac12 \frac{\partial^{2} f(x)}{\partial x_{1}^{2}}v_{1}^{2} + \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{2}}v_{1}v_{2} + \frac12 \frac{\partial^{2} f(x)}{\partial x_{2}^{2}}v_{2}^{2} + \cdots$$

Applying this to LQR, rewriting the expression with the Taylor series, and converting it to quadratic form are covered in the next post.
