๐Ÿฌ ML & Data

    [๋ผ์ดํŠธ ๋”ฅ๋Ÿฌ๋‹] n. Backpropagation ์ˆ˜์‹ ํ’€์ด ๋ฐ ๊ฒ€์ฆ

    Source: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ A Step by Step Backpropagation Example Background Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example… mattmazur.com Feed-forward calculation 1. Computing h1 $$net_{h1} = 0.05 * 0.15 + 0.1 * 0.2 + 0.35..
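    A minimal Python sketch of this feed-forward step, using the input, weight, and bias values quoted above from Mazur's example (inputs 0.05 and 0.10, weights 0.15 and 0.20, bias 0.35); the sigmoid activation is the one used in the source article.

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Values from the worked example
    i1, i2 = 0.05, 0.10   # inputs
    w1, w2 = 0.15, 0.20   # weights into hidden neuron h1
    b1 = 0.35             # hidden-layer bias

    net_h1 = w1 * i1 + w2 * i2 + b1   # = 0.3775
    out_h1 = sigmoid(net_h1)          # ≈ 0.5933

    print(net_h1, out_h1)
    ```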

    [MPC] 4. Optimal Control (2) - Applying the Taylor Series and Deriving the Algebraic Riccati Equation (ARE)

    Applying this to the LQR: $$V^{*}(x(t), t) = \underset{u[t, t+\Delta t]}{min} \{ \Delta t \cdot l[x(t + \alpha \Delta t), u(t + \alpha \Delta t), t + \alpha \Delta t] + V^{*}(x(t + \Delta t), t+\Delta t) \}$$ In this expression, let's expand the $V^{*}(x(t + \Delta t), t+\Delta t)$ term with the Taylor series above, in terms of both x and t. Think of it as $x = (x(t), t), v = \Delta t$. Rearranging gives the following. $$V^{*}(x + v) = V^{*}(x) + f'(x)v + \frac 12 f''(x)v^{2}+ \frac1..
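    Once the ARE is obtained, its solution can be sanity-checked numerically. Below is a small sketch (my own illustration, not code from the post) that uses scipy.linalg.solve_continuous_are on a hypothetical double-integrator system and verifies that the Riccati residual is essentially zero; the A, B, Q, R matrices are placeholders.

    ```python
    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Illustrative double integrator: x = [position, velocity], u = acceleration
    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    B = np.array([[0.0],
                  [1.0]])
    Q = np.eye(2)            # state cost
    R = np.array([[1.0]])    # input cost

    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.inv(R) @ B.T @ P     # optimal feedback gain, u = -K x

    # ARE residual: A'P + P A - P B R^{-1} B' P + Q should be ~0
    residual = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
    print(K, np.abs(residual).max())
    ```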

    [MPC] 4. Optimal Control (1) - LQR and the Taylor Series

    Optimal control basics - LQR (Linear Quadratic Regulator) Since LQR is the foundation, we start with it system : $\dot x = f(x, u, t), x(t_{0}) = x_{0}$ cost function : $$V(x(t_{0}), u, t_{0}) = \int_{t_{0}}^{T} l[x(\tau), u(\tau), \tau]d\tau + m(x(T))$$ Finding the input $u^{*}(t), t_{0}\le t \le T$ that minimizes the cost function above -> the goal of optimal control By the principle of optimality, if a solution is optimal, then the solutions of its subproblems are optimal as well. Add $t_{1}$ so that $t_{0} < t < t_{1} < T$..
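    As a bridge to part (2), a sketch of where this split leads (the standard Bellman principle, not quoted from the post): applying the principle of optimality at the intermediate time $t_{1}$ decomposes the optimal cost into the running cost over $[t_{0}, t_{1}]$ plus the optimal cost-to-go from $(x(t_{1}), t_{1})$; the expression in part (2) is this with $t_{1} = t + \Delta t$. $$V^{*}(x(t_{0}), t_{0}) = \underset{u[t_{0}, t_{1}]}{min} \left\{ \int_{t_{0}}^{t_{1}} l[x(\tau), u(\tau), \tau]d\tau + V^{*}(x(t_{1}), t_{1}) \right\}$$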

    [MPC] 3. Predicting the State and Output

    Input / Output summary $N_p$ : number of future outputs to predict $N_c$ : number of future control inputs to predict In the case of path tracking, $N_c$ control commands to track $N_p$ points... Control Input $\Delta u(k), \Delta u(k+1), \Delta u(k+2), \cdots, \Delta u(k + N_{c} - 1)$ Output $y(k), y(k+1), \cdots, y(k+N_{p})$ Since $y(k) = Cx(k)$, we can write $y(k+1) = Cx(k+1), y(k+2) = Cx(k+2), \cdots$, so it is enough to find the predicted states $x(k+1), x(k+2), \cdots, x(k+N_{p})$ Finding the state variables $..
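    A minimal numpy sketch of this prediction step (my own illustration, not code from the post): roll a discrete state-space model forward $N_p$ steps under the $N_c$ planned moves, holding later moves at zero. The matrices, horizons, and the use of $\Delta u$ as the input follow the convention of post 2 but are otherwise placeholders.

    ```python
    import numpy as np

    def predict_outputs(Ad, Bd, Cd, x_k, dU, Np):
        """Predict y(k+1)..y(k+Np): x(k+i+1) = Ad x(k+i) + Bd du(k+i), y = Cd x.
        dU holds the Nc planned control moves; moves beyond Nc are assumed zero."""
        x = x_k.copy()
        ys = []
        for i in range(Np):
            du = dU[i] if i < len(dU) else np.zeros(Bd.shape[1])
            x = Ad @ x + Bd @ du
            ys.append(Cd @ x)
        return np.array(ys)

    # Illustrative 2-state / 1-input / 1-output system, Np = 5, Nc = 2
    Ad = np.array([[1.0, 0.1], [0.0, 1.0]])
    Bd = np.array([[0.005], [0.1]])
    Cd = np.array([[1.0, 0.0]])
    dU = [np.array([1.0]), np.array([0.5])]
    print(predict_outputs(Ad, Bd, Cd, np.zeros(2), dU, Np=5))
    ```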

    [MPC] 2. Deriving the State-Space Equations

    MPC ์ƒํƒœ ๊ณต๊ฐ„ ๋ฐฉ์ •์‹ ์œ ๋„ ์ƒํƒœ๊ณต๊ฐ• ๋ฐฉ์ •์‹ + LTI(Linear TimeINvariant, ์„ ํ˜• ์‹œ๊ฐ„ ๋ถˆ๋ณ€ ์‹œ์Šคํ…œ)์˜ ๊ฒฝ์šฐ => Continuous-time state-space model ์ƒํƒœ ๋ฐฉ์ •์‹ : $$\bar{x} = Ax + Bu$$ ์ถœ๋ ฅ ๋ฐฉ์ •์‹ : $$y = Cx$$ MPC๋Š” discrete ํ•œ ํ™˜๊ฒฝ => Discrete-time state-space model ์ƒํƒœ ๋ฐฉ์ •์‹ : $$x(k+1) = A_{d}x(k) + B_{d}u(k)$$ ์ถœ๋ ฅ ๋ฐฉ์ •์‹ : $$y(k) = C_{d}x(k)$$ MPC ๊ธฐ๋ณธ ๋ชจ๋ธ์€ Discrete-time aumented state-space model ์ƒํƒœ ๋ณ€์ˆ˜ ๋Œ€์‹  ์ƒํƒœ ๋ณ€์ˆ˜์˜ ๋ณ€ํ™”๋Ÿ‰ $\Delta x$ ์‚ฌ์šฉ ์ƒํƒœ ๋ฐฉ์ •์‹ $${x(k+1) - x(k) ..

    [MPC] 1. Model Predictive Control Intro

    YouTube https://www.youtube.com/watch?v=zU9DxmNZ1ng&list=PLSAJDR2d_AUtkWiO_U-p-4VpnXGIorrO-&index=1 Blog https://sunggoo.tistory.com/65 Based on the materials above, I plan to lightly organize what I have studied. There will be plenty of equation proofs, and after that I will add more by reading papers or code implementations depending on the goal. The concept of MPC (Model Predictive Control) the machine's state dynamics + factors from the surrounding environment => cost function Control engineering Targets non-linear, non-convex problems While studying it, I got a bit of a reinforcement learning vibe Flow Based on the state variables at time k-1, predict k+1 ~ ..

    [Math] Mathematics for Machine Learning 2. Linear Algebra

    Lately I have really been feeling the need to study math, so I started working through MML, which is something of a bible of machine learning math... For starters it is in English(!), there is an enormous amount of terminology(!), and the content itself is hard, so I am struggling quite a bit. I thought I had somehow understood it, but then I looked at the exercises and, well, I was lost all over again... Even with the answer key there are many parts that are hard to follow, so I suspect I will need to work through it carefully two or three times, following a guide, before it sinks in. But it is really hard, haha... Even though I took a linear algebra course, this seems to cover more than the course I took did. Anyway, the links below are the sites I referenced. Many thanks to 준별님 for the Korean translation... I am reading it side by side... Textbook - free PDF (https://mml-book.github.io/book/mml-boo..

    [๋ผ์ดํŠธ ๋”ฅ๋Ÿฌ๋‹] 1. ๋„“์€ ์‹œ๊ฐ์œผ๋กœ ๋ณด๋Š” ๋จธ์‹ ๋Ÿฌ๋‹ ๊ฐœ๊ด„

    With ChatGPT becoming very widely known to the general public in November 2022, it feels these days as if the AI market, which had been slowly gathering momentum, has truly hit its prime. Beyond LLMs (Large Language Models), in CV (Computer Vision), copyright issues are being raised, yet models trained on photos and art styles produce new images rendered in those styles, and in speech synthesis, AI-powered TTS can even be made to sing. Even setting aside such visible, user-facing services, deep learning is being used in a vast range of areas whose applications even I do not fully know yet, such as AI-based anomaly detection solutions and game bots built with reinforcement learning. In this post, having decided to study AI, ..

    [Reinforcement Learning] Dealing with Sparse Reward Environments - Learning in Environments with Sparse Rewards

    ※ These are notes I organized while studying the content at the link below. Reinforcement Learning: Dealing with Sparse Reward Environments Reinforcement Learning (RL) is a method of machine learning in which an agent learns a strategy through interactions with its environment… medium.com 1. Sparse Reward Sparse reward: the case where the agent receives a positive reward only when it has gotten close to the goal state - the same as my current experiment setup Curiosity-Driven method: motivates the agent to explore parts of the environment outside its current interest Curric..
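    To make the curiosity idea concrete, a small hypothetical sketch of the usual recipe (an ICM-style bonus, not something taken from the article): the sparse extrinsic reward is augmented with an intrinsic bonus equal to the prediction error of a learned forward model, so poorly predicted (novel) states earn extra reward. The function names and the scaling factor beta are placeholders.

    ```python
    import numpy as np

    def curiosity_bonus(forward_model, state, action, next_state, beta=0.01):
        """Intrinsic reward: scaled prediction error of a learned forward model.
        forward_model(state, action) should return the predicted next state."""
        predicted_next = forward_model(state, action)
        return beta * float(np.sum((predicted_next - next_state) ** 2))

    # Reward actually used for training = sparse extrinsic reward + curiosity bonus:
    # r_total = r_ext + curiosity_bonus(forward_model, s, a, s_next)
    ```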

    [Reinforcement Learning] DDPG (Deep Deterministic Policy Gradient)

    DQN์˜ ์ฐจ์›์˜ ์ €์ฃผ ๋ฌธ์ œ(๊ณ ์ฐจ์› action์„ ๋‹ค๋ฃจ๋Š” ๊ฒฝ์šฐ ์—ฐ์‚ฐ ์†๋„๊ฐ€ ๋Š๋ ค์ง€๊ณ  memory space๋ฅผ ๋งŽ์ด ์š”ํ•จ)๋ฅผ off-policy actor critic ๋ฐฉ์‹์œผ๋กœ ํ’€์–ด๋‚ธ๋‹ค. ๊ธฐ์กด DQN ๋ฐฉ์‹์˜ insight๋“ค์— batch normalization replay buffer target Q network Actor-critic ํŒŒ๋ผ๋ฏธํ„ฐํ™” ๋œ actor function์„ ๊ฐ€์ง actor function : state์—์„œ ํŠน์ • action์œผ๋กœ mappingํ•˜์—ฌ ํ˜„์žฌ policy๋ฅผ ์ง€์ • policy gradient ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต ์—ฌ๊ธฐ์—์„œ J๊ฐ€ Objective Function(๋ชฉํ‘œํ•จ์ˆ˜) actor function์ด ๋ชฉํ‘œ ํ•จ์ˆ˜๋ฅผ gradient asent๋กœ ์ตœ๋Œ€ํ™”→ ์ด ๋•Œ์˜ policy parameter..