- Optimal control basics: LQR (Linear Quadratic Regulator)
- LQR is the most basic optimal control problem, so the derivation starts from it.
- system : $\dot x = f(x, u, t), x(t_{0}) = x_{0}$
- cost function :
$$V(x(t_{0}), u, t_{0}) = \int_{t_{0}}^{T} l[x(\tau), u(\tau), \tau]d\tau + m(x(T))$$
- The goal of optimal control is to find the input $u^{*}(t)$, $t_{0}\le t \le T$, that minimizes the cost function above.
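To make the cost function concrete, here is a minimal sketch under assumptions that are not in the original post: a scalar linear system $\dot x = ax + bu$, running cost $l = qx^{2} + ru^{2}$, terminal cost $m = sx^{2}$, and a fixed feedback input $u = -kx$ (the names `lq_cost`, `lq_cost_exact`, `k`, `q`, `r`, `s` are all hypothetical). It approximates $V$ for one given input by forward-Euler integration and compares it with the closed form for this special case.

```python
import math

def lq_cost(a, b, k, q, r, s, x0, T, dt=1e-4):
    """Approximate V(x0, u, 0) = ∫_0^T (q x^2 + r u^2) dτ + s x(T)^2 for the
    hypothetical system dx/dt = a x + b u under the hypothetical input u = -k x."""
    x, cost = x0, 0.0
    for _ in range(int(round(T / dt))):
        u = -k * x                            # input at this instant
        cost += (q * x * x + r * u * u) * dt  # running cost l[x, u, τ] dτ
        x += (a * x + b * u) * dt             # forward-Euler step of dx/dt = f(x, u, t)
    return cost + s * x * x                   # add terminal cost m(x(T))

def lq_cost_exact(a, b, k, q, r, s, x0, T):
    """Closed form for the same case: x(t) = x0 * exp((a - b k) t), assuming a - b k != 0."""
    c = a - b * k
    growth = math.exp(2 * c * T)
    return (q + r * k * k) * x0 * x0 * (growth - 1) / (2 * c) + s * x0 * x0 * growth

if __name__ == "__main__":
    for k in (2.0, 3.0, 5.0):                 # different inputs give different costs
        print(k, lq_cost(1.0, 1.0, k, 1.0, 0.5, 2.0, 1.0, 2.0),
              lq_cost_exact(1.0, 1.0, k, 1.0, 0.5, 2.0, 1.0, 2.0))
```

Each choice of `k` is one candidate input; optimal control asks which input (not necessarily of this feedback form) makes the number smallest.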
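- By the principle of optimality, if a solution is optimal, then its remaining part is also optimal for the subproblem that starts from any intermediate time and state along it.
Introduce an intermediate time $t_{1}$ with $t_{0} < t < t_{1} < T$ and use it in the derivation: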
$$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\int_{t}^{t_{1}}l[x(\tau), u(\tau), \tau]d\tau + \min_{u[t_{1}, T]}\left\{\int_{t_{1}}^{T}l[x(\tau), u(\tau), \tau]d\tau + m(x(T))\right\}\right\}$$
- The inner minimization over $[t_{1}, T]$ is exactly the optimal cost starting from $x(t_{1})$ at time $t_{1}$:
$$\min_{u[t_{1}, T]}\left\{\int_{t_{1}}^{T}l[x(\tau), u(\tau), \tau]d\tau + m(x(T))\right\} = V^{*}(x(t_{1}), t_{1})$$
- Therefore the final equation is
$$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\int_{t}^{t_{1}}l[x(\tau), u(\tau), \tau]d\tau + V^{*}(x(t_{1}), t_{1})\right\}$$
- Now let $t_{1} = t + \Delta t$, where $\Delta t$ is small, so $t_{1}$ is the instant just slightly after $t$.
- Substituting for $t_{1}$:
$$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\int_{t}^{t + \Delta t}l[x(\tau), u(\tau), \tau]d\tau + V^{*}(x(t + \Delta t), t + \Delta t)\right\}$$
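The recursion above can be checked on a small discrete problem. This is a hypothetical illustration, not the original author's example: an integer state with $x(t+\Delta t) = x(t) + u\Delta t$, inputs restricted to $\{-1, 0, 1\}$, stage cost $(x^{2}+u^{2})\Delta t$, and terminal cost $10x^{2}$. `v_star` applies the recursion backward one step at a time, while `brute_force` minimizes over every complete input sequence; by the principle of optimality the two should agree.

```python
from functools import lru_cache
from itertools import product

# Hypothetical discretized problem (an illustration, not from the original post)
CONTROLS = (-1, 0, 1)   # admissible inputs u
N = 4                   # number of steps, T = t0 + N * DT
DT = 1                  # step length Δt

def stage_cost(x, u):
    return (x * x + u * u) * DT          # l[x, u] * Δt

def terminal_cost(x):
    return 10 * x * x                    # m(x(T))

def step(x, u):
    return x + u * DT                    # x(t + Δt) for dx/dt = u

@lru_cache(maxsize=None)
def v_star(x, k):
    """Optimal cost-to-go V*(x, t_k) from V* = min_u { l Δt + V*(x(t + Δt), t + Δt) }."""
    if k == N:
        return terminal_cost(x)
    return min(stage_cost(x, u) + v_star(step(x, u), k + 1) for u in CONTROLS)

def brute_force(x0):
    """Minimize the total cost directly over every input sequence u[t0, T]."""
    best = None
    for seq in product(CONTROLS, repeat=N):
        x, cost = x0, 0
        for u in seq:
            cost += stage_cost(x, u)
            x = step(x, u)
        cost += terminal_cost(x)
        if best is None or cost < best:
            best = cost
    return best

if __name__ == "__main__":
    for x0 in range(-3, 4):
        print(x0, v_star(x0, 0), brute_force(x0))
```

The brute-force search grows as $3^{N}$, while the recursion only evaluates each (state, step) pair once, which is why the cost-to-go form is the useful one.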
- The integral is the area under $l$ over $[t, t+\Delta t]$, so it can be written as width × height: the width is $\Delta t$, and the height is the value of $l$ at some point between $t$ and $t+\Delta t$ (the mean value theorem for integrals, assuming the integrand is continuous).
- Writing this point as $t + \alpha \Delta t$ with $0 \le \alpha \le 1$, the equation becomes
$$V^{*}(x(t), t) = \min_{u[t, T]}\left\{\Delta t \cdot l[x(t+\alpha \Delta t), u(t+\alpha \Delta t), t+\alpha \Delta t] + V^{*}(x(t + \Delta t), t + \Delta t)\right\}$$
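The width × height step can be checked numerically. This sketch (the function name `mean_value_alpha` and the test integrand $l(\tau) = \tau^{2}$ are hypothetical) computes the integral over $[t, t+\Delta t]$ with a midpoint rule, then finds by bisection the $\alpha \in [0, 1]$ at which $\Delta t \cdot l(t+\alpha\Delta t)$ reproduces it. It assumes $l$ is continuous and monotonic on the interval, so that a sign change exists for bisection.

```python
def mean_value_alpha(l, t, dt, n=10_000, tol=1e-12):
    """Return (alpha, integral) with dt * l(t + alpha*dt) == ∫_t^{t+dt} l(τ) dτ.

    Assumes l is continuous and monotonic on [t, t + dt], so that
    l(t + a*dt) - (integral / dt) changes sign on [0, 1] and bisection applies."""
    h = dt / n
    integral = h * sum(l(t + (i + 0.5) * h) for i in range(n))  # composite midpoint rule
    height = integral / dt                       # height of the equivalent rectangle

    def g(a):
        return l(t + a * dt) - height

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:                  # sign change lies in [lo, mid]
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi), integral

if __name__ == "__main__":
    alpha, area = mean_value_alpha(lambda tau: tau * tau, t=1.0, dt=0.1)
    print(alpha, area)
```

In the derivation $\alpha$ is only known to exist; as $\Delta t \to 0$ the point $t+\alpha\Delta t$ approaches $t$ regardless of its exact value.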
- Next, expand the last term, $V^{*}(x(t + \Delta t), t + \Delta t)$, as a Taylor series.
Taylor series
https://darkpgmr.tistory.com/59
https://sine-qua-none.tistory.com/28
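- The degree-$n$ Taylor polynomial of $f(x)$ around $x=a$ is
$$P_{n}(x) = f(a) + f^{\prime}(a)(x-a) + \frac{f^{\prime \prime}(a)}{2!}(x-a)^{2}+ \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^{n}= \sum\limits_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$$
and the Taylor series is its limit $P_{\infty}(x) = \sum\limits_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k$; where this series converges to $f$, $f(x) = P_{\infty}(x)$.
- $f(x) \approx P_{n}(x)$ does not hold with equality for every $x$. The approximation is more accurate the closer $x$ is to $a$ and the more higher-order terms are included.
- It approximates $f(x)$ by a polynomial whose derivatives at $x=a$ match those of $f(x)$.
- As the degree increases, the polynomial stays close to the original over a wider interval; at a low degree it is close only near $x=a$.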
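Both claims above (higher degree is more accurate, and points closer to $a$ are more accurate) can be seen with $f(x) = e^{x}$, chosen here only because every derivative of $e^{x}$ is $e^{x}$. The function name `taylor_exp` is hypothetical.

```python
import math

def taylor_exp(x, a, n):
    """Degree-n Taylor polynomial P_n(x) of f(x) = exp(x) around x = a.
    Every derivative of exp is exp, so f^(k)(a) = exp(a) for all k."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

if __name__ == "__main__":
    for x in (0.1, 1.0, 3.0):                    # distance from the expansion point a = 0
        errors = [abs(math.exp(x) - taylor_exp(x, 0.0, n)) for n in (1, 2, 4, 8)]
        print(x, errors)
```

Reading each printed row left to right shows the error shrinking with degree; comparing rows shows it growing with distance from $a = 0$.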
๋ค๋ณ์ํจ์
$$f(x+v) = f(x) + f^{\prime}(x)v+ \frac12 f^{\prime \prime }(x)v^{2} + \cdots$$
- The expression above is the basic (single-variable) Taylor series.
- If $f: R^{n} \to R$, $x = (x_{1}, x_{2}, \cdots, x_{n})$ and $v=(v_{1}, v_{2}, \cdots, v_{n})$, then
- $f^{\prime}(x) = \nabla f(x)$ : the gradient
- $f^{\prime \prime}(x) = H_{f}$ : the Hessian matrix
- Since $x$ and $v$ are vectors, taking partial derivatives with respect to each variable lets us rewrite the Taylor series as
$$f'(x)v = \nabla f(x)^{T} v$$
$$f''(x)v^{2}= v^{T}H_{f}v$$
$$\nabla f(x) = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}},\cdots, \frac{\partial f}{\partial x_{n}}\right)^T$$
$$H_{f}= \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{1}\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2}f}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{n}\partial x_{n}} \end{bmatrix}$$
$$f(x+v) = f(x) + \nabla f(x)^{T}v + \frac12 v^{T}H_{f}v + \cdots$$
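The vector form $f(x) + \nabla f(x)^{T}v + \frac12 v^{T}H_{f}v$ can be sketched with finite differences standing in for the exact gradient and Hessian (an assumption for illustration; the helper names are hypothetical). For a quadratic function the second-order expansion is exact, which is why it suits the quadratic costs in LQR; for a non-quadratic function it is only accurate for small $v$.

```python
import math

def _f_shifted(f, x, steps):
    """Evaluate f at x plus the given (index, offset) pairs."""
    y = list(x)
    for i, d in steps:
        y[i] += d
    return f(y)

def gradient(f, x, h=1e-5):
    """Central-difference approximation of ∇f(x)."""
    return [(_f_shifted(f, x, [(i, h)]) - _f_shifted(f, x, [(i, -h)])) / (2 * h)
            for i in range(len(x))]

def hessian(f, x, h=1e-3):
    """Central-difference approximation of the Hessian H_f(x)."""
    n = len(x)
    return [[(_f_shifted(f, x, [(i, h), (j, h)]) - _f_shifted(f, x, [(i, h), (j, -h)])
              - _f_shifted(f, x, [(i, -h), (j, h)]) + _f_shifted(f, x, [(i, -h), (j, -h)]))
             / (4 * h * h) for j in range(n)] for i in range(n)]

def taylor2(f, x, v):
    """Second-order approximation f(x) + ∇f(x)^T v + (1/2) v^T H_f v."""
    g, H, n = gradient(f, x), hessian(f, x), len(x)
    linear = sum(g[i] * v[i] for i in range(n))
    quadratic = sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))
    return f(x) + linear + 0.5 * quadratic

if __name__ == "__main__":
    quad = lambda y: y[0] ** 2 + 3 * y[0] * y[1] + 2 * y[1] ** 2 + y[0] - y[1]
    smooth = lambda y: math.exp(y[0]) * math.sin(y[1])
    x, v = [1.0, -2.0], [0.5, 0.25]
    print(quad([x[0] + v[0], x[1] + v[1]]), taylor2(quad, x, v))
    x, v = [0.2, 0.3], [0.01, -0.02]
    print(smooth([x[0] + v[0], x[1] + v[1]]), taylor2(smooth, x, v))
```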
Two-variable functions
- If $x=(x_1, x_2)$ and $v=(v_{1}, v_{2})$, then
$$\nabla f(x) = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}\right)^T$$ $$H_{f}= \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} \\ \frac{\partial^{2}f}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{2}\partial x_{2}} \end{bmatrix} $$
- Using these derivatives, $f(x_{1}+ v_{1}, x_{2} + v_{2})$ expands to
$$f(x_{1}+v_{1}, x_{2}+ v_{2}) = f(x_{1}, x_{2}) + \frac{\partial f(x)}{\partial x_{1}}v_{1} + \frac{\partial f(x)}{\partial x_{2}}v_{2} + \frac12 \frac{\partial^{2} f(x)}{\partial x_{1}^{2}}v_{1}^{2} + \frac{\partial^{2} f(x)}{\partial x_{1}\partial x_{2}}v_{1}v_{2} + \frac12 \frac{\partial^{2} f(x)}{\partial x_{2}^{2}}v_{2}^{2} + \cdots$$
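The term-by-term formula can be checked on the hypothetical example $f(x_{1}, x_{2}) = x_{1}^{2}x_{2}$, whose partial derivatives are easy to write by hand. The cross term carries no $\frac12$ because the two equal mixed terms $\frac12 f_{x_1x_2}v_1v_2$ and $\frac12 f_{x_2x_1}v_1v_2$ of $\frac12 v^{T}H_{f}v$ add up. Since $f$ is a cubic, the second-order expansion misses exactly the third-order term $v_{1}^{2}v_{2}$.

```python
def f(x1, x2):
    """Hypothetical example function f(x1, x2) = x1^2 * x2."""
    return x1 * x1 * x2

def taylor2_explicit(x1, x2, v1, v2):
    """Second-order expansion of f around (x1, x2), written term by term."""
    f1 = 2 * x1 * x2      # ∂f/∂x1
    f2 = x1 * x1          # ∂f/∂x2
    f11 = 2 * x2          # ∂²f/∂x1²
    f12 = 2 * x1          # ∂²f/∂x1∂x2 (equal to ∂²f/∂x2∂x1)
    f22 = 0.0             # ∂²f/∂x2²
    return (f(x1, x2) + f1 * v1 + f2 * v2
            + 0.5 * f11 * v1 ** 2 + f12 * v1 * v2 + 0.5 * f22 * v2 ** 2)

if __name__ == "__main__":
    x1, x2, v1, v2 = 1.5, -0.5, 0.2, 0.1
    gap = f(x1 + v1, x2 + v2) - taylor2_explicit(x1, x2, v1, v2)
    print(gap, v1 ** 2 * v2)   # the gap is the omitted third-order term v1^2 v2
```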
Applying this to LQR, rewriting the equation with a Taylor series and converting it into quadratic form, is covered in the next post.