[Paper Review] Transformer - Attention is All You Need
๐Ÿฌ ML & Data/๐Ÿ“˜ ๋…ผ๋ฌธ & ๋ชจ๋ธ ๋ฆฌ๋ทฐ
Transformer was already in the spotlight when I first got into deep learning, and it is still in the spotlight today. The world keeps changing... It is certainly a model with high applicability and strong performance. I had studied it before, but now that I have built up some experience I went through it again to solidify my knowledge. Typing everything out felt tedious, so I substituted photos of my handwritten notes. 1. Understanding the concept of Attention 2. What is Self-Attention? 3. Transformer architecture 4. Computing the Query, Key, and Value in a Transformer 5. Multi-head Attention 6. How Self-Attention operates in the Encoder and Decoder 7. Masked Self-Attention 8. Other concepts and techniques 8.1. Feed Forward 8.2..
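Since the table of contents covers Query, Key, Value and multi-head attention, here is a minimal NumPy sketch of scaled dot-product attention, the core operation those sections build on (shapes and weight names are illustrative, not taken from the post):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row-wise softmax
    return weights @ V                                       # weighted sum of value vectors

# toy example: 3 tokens, model dimension 4
x = np.random.randn(3, 4)
Wq, Wk, Wv = np.random.randn(4, 4), np.random.randn(4, 4), np.random.randn(4, 4)
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)   # shape (3, 4)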
[Paper Review] Mamba - Linear Time Sequence Modeling with Selective State Spaces 2
๐Ÿฌ ML & Data/๐Ÿ“˜ ๋…ผ๋ฌธ & ๋ชจ๋ธ ๋ฆฌ๋ทฐ
3. Selective State Space Models 3.1 Selection as a Means of Compression. Two running examples of synthetic tasks: Selective Copying: modifies the Copying task by varying the positions of the tokens to be memorized; memorizing the relevant tokens while filtering out the irrelevant ones requires content-aware reasoning. Induction Heads: knowing when to produce the right output in the right context also requires content-aware reasoning; this is the mechanism most often cited to explain how LLMs behave. These tasks expose the failure mode of LTI models: from a recurrent perspective, constant dynamics (here $\bar{A}, \bar{B}$) in the context..
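For context, the (discretized) SSM recurrence the truncated sentence refers to is usually written as

$$h_{t} = \bar{A} h_{t-1} + \bar{B} x_{t}, \qquad y_{t} = C h_{t}$$

and the failure-mode argument is that when $\bar{A}, \bar{B}$ are constant over time (LTI), the state update cannot depend on the content of the current token, which is exactly what Selective Copying and Induction Heads require. This summary is my paraphrase of the paper's argument, not a quote.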
[Paper Review] Mamba - Linear Time Sequence Modeling with Selective State Spaces 1
๐Ÿฌ ML & Data/๐Ÿ“˜ ๋…ผ๋ฌธ & ๋ชจ๋ธ ๋ฆฌ๋ทฐ
It has been out for more than a year already, but it feels like it has been a million years since I last reviewed a recent paper, so I am finally reading Mamba... This may not be a very accurate review. It is honestly closer to a translation, and I will revise the content as I understand it better. 1. Introduction Recently, Structured State Space Sequence Models (SSMs) have emerged as a promising class of sequence modeling architectures. Inspired by classical state space models, they can be interpreted as a combination of CNNs and RNNs. Mamba proposes a new class of selective state space models. To match the modeling power of Transformers while scaling linearly in sequence length, several axes (..
[Reinforcement Learning] TRPO (Trust Region Policy Optimization) Paper Notes
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
I meant to study PPO, but I heard that this paper should come first, so I gave it a light read. I am not yet used to reading reinforcement learning papers, so it took quite a while. My mathematical background is thin, so I organized it as carefully as I could for my own understanding, and I am posting it in the hope that it helps others as well. TRPO (Trust Region Policy Optimization): https://arxiv.org/abs/1502.05477 title: "Trust Region Policy Optimization" description: "We describe an iterative procedure for optimizing policies, with guaranteed mono..
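For reference, the constrained surrogate problem that the paper's iterative procedure solves at each step is commonly written as follows (my summary in standard notation, with $\delta$ the trust-region size):

$$\max_{\theta} \; \mathbb{E}\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} A_{\pi_{\theta_{\text{old}}}}(s, a) \right] \quad \text{s.t.} \quad \mathbb{E}\left[ D_{\mathrm{KL}}\!\left(\pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s)\right) \right] \le \delta$$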
[Model Compression] Model Quantization with Tensorflow
๐Ÿฌ ML & Data/โ” Q & etc.
Making a deep learning model lightweight is a necessary step after training, when a deep learning solution is applied to a real problem, in order to reduce execution time and the resources consumed per prediction. There are (as far as I know) three approaches to model compression: model quantization (reducing the number of bits), model pruning (discarding unimportant parts), and simply designing the model well in the first place. Among these, I decided to start with quantization, the easiest option for an already-trained model. Here I record an example of quantizing a Tensorflow-based model: a model built and trained with Tensorflow, with its weights saved as an .h5 file. 1. Model quantization a. Load the model import tensorflow as tf model = your_model(parameter) model.load_wei..
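As a sketch of where the truncated code is headed, post-training quantization of a loaded Keras model typically goes through the TFLite converter; the model constructor and file names below are placeholders, not from the post:

import tensorflow as tf

# Placeholder: rebuild the architecture and load the trained .h5 weights (as in the excerpt)
model = your_model(parameter)
model.load_weights("weights.h5")

# Post-training (dynamic-range) quantization via the TFLite converter
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the quantized model to disk
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)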
[๋ผ์ดํŠธ ๋”ฅ๋Ÿฌ๋‹] n. Backpropagation ์ˆ˜์‹ ํ’€์ด ๋ฐ ๊ฒ€์ฆ
๐Ÿฌ ML & Data/๐Ÿฆ„ ๋ผ์ดํŠธ ๋”ฅ๋Ÿฌ๋‹
Source: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ A Step by Step Backpropagation Example: "Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example…" (mattmazur.com) Feed-forward computation 1. Computing h1 $$net_{h1} = 0.05 * 0.15 + 0.1 * 0.2 + 0.35..$$
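Completing the truncated line with the standard values used in the linked example ($w_1=0.15$, $w_2=0.20$, $i_1=0.05$, $i_2=0.10$, $b_1=0.35$):

$$net_{h1} = 0.15 \cdot 0.05 + 0.20 \cdot 0.10 + 0.35 \cdot 1 = 0.3775, \qquad out_{h1} = \frac{1}{1 + e^{-0.3775}} \approx 0.5933$$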
[MPC] 4. Optimal Control (2) - Applying the Taylor Series and Deriving the Algebraic Riccati Equation (ARE)
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
Applying this to LQR: $$V^{*}(x(t), t) = \underset{u[t, t+\Delta t]}{\min} \{ \Delta t \cdot l[x(t + \alpha \Delta t), u(t + \alpha \Delta t), t + \alpha \Delta t] + V^{*}(x(t + \Delta t), t+\Delta t) \}$$ In this expression, let us expand the term $V^{*}(x(t + \Delta t), t+\Delta t)$ with the Taylor series above, with respect to x and t. Think of $x = (x(t), t)$ and $v = \Delta t$. Expanding gives the following: $$V^{*}(x + v) = V^{*}(x) + f'(x)v + \frac 12 f''(x)v^{2}+ \frac1..$$
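As a sketch of the step being set up here, the first-order multivariable expansion (in both the state and time arguments) that leads on to the HJB equation is

$$V^{*}(x(t+\Delta t), t+\Delta t) \approx V^{*}(x(t), t) + \frac{\partial V^{*}}{\partial x}\,\dot{x}\,\Delta t + \frac{\partial V^{*}}{\partial t}\,\Delta t$$

with $\dot{x} = f(x, u, t)$; substituting this back into the minimization and letting $\Delta t \to 0$ is what eventually yields the Riccati equation for the LQR cost.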
[MPC] 4. Optimal Control (1) - LQR and the Taylor Series
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
Basics of optimal control: LQR (Linear Quadratic Regulator). Since LQR is the foundation, we start with it. System: $\dot x = f(x, u, t), x(t_{0}) = x_{0}$ Cost function: $$V(x(t_{0}), u, t_{0}) = \int_{t_{0}}^{T} l[x(\tau), u(\tau), \tau]d\tau + m(x(T))$$ Finding the input $u^{*}(t), t_{0}\le t \le T$ that minimizes the cost function above -> the goal of optimal control. By the principle of optimality, if a solution is optimal, then the solutions of its sub-problems are optimal as well. Adding $t_{1}$ with $t_{0} < t < t_{1} < T$..
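Written out, that principle-of-optimality split at the added intermediate time $t_{1}$ is

$$V^{*}(x(t_{0}), t_{0}) = \min_{u[t_{0}, t_{1}]} \left\{ \int_{t_{0}}^{t_{1}} l[x(\tau), u(\tau), \tau]\,d\tau + V^{*}(x(t_{1}), t_{1}) \right\}$$

i.e. the optimal cost-to-go from $t_{0}$ equals the running cost accumulated up to $t_{1}$ plus the optimal cost-to-go from $t_{1}$ onward.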
[MPC] 3. Predicting the State and Output
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
Input / Output summary: $N_p$ : number of future outputs to predict. $N_c$ : number of future control inputs to predict. For path tracking, this means $N_c$ control commands for tracking $N_p$ points... Control input: $\Delta u(k), \Delta u(k+1), \Delta u(k+2), \cdots, \Delta u(k + N_{c} - 1)$ Output: $y(k), y(k+1), \cdots, y(k+N_{p})$ Since $y(k) = Cx(k)$, we can write $y(k+1) = Cx(k+1), y(k+2) = Cx(k+2), \cdots$, so it suffices to obtain the predicted states $x(k+1), x(k+2), \cdots, x(k+N_{p})$. Obtaining the state variables $..
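A sketch of the recursion this leads to, assuming the augmented matrices $A, B, C$ from the previous post and $\Delta u(k+i) = 0$ for $i \ge N_{c}$:

$$x(k+1) = A x(k) + B \Delta u(k), \qquad x(k+2) = A^{2} x(k) + A B \Delta u(k) + B \Delta u(k+1), \qquad \cdots$$

$$x(k+N_{p}) = A^{N_{p}} x(k) + \sum_{i=0}^{N_{c}-1} A^{N_{p}-1-i} B\, \Delta u(k+i)$$

so every predicted output $y(k+j) = C x(k+j)$ is a linear function of the current state and the future control increments.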
[MPC] 2. Deriving the State-Space Equations
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
MPC ์ƒํƒœ ๊ณต๊ฐ„ ๋ฐฉ์ •์‹ ์œ ๋„ ์ƒํƒœ๊ณต๊ฐ• ๋ฐฉ์ •์‹ + LTI(Linear TimeINvariant, ์„ ํ˜• ์‹œ๊ฐ„ ๋ถˆ๋ณ€ ์‹œ์Šคํ…œ)์˜ ๊ฒฝ์šฐ => Continuous-time state-space model ์ƒํƒœ ๋ฐฉ์ •์‹ : $$\bar{x} = Ax + Bu$$ ์ถœ๋ ฅ ๋ฐฉ์ •์‹ : $$y = Cx$$ MPC๋Š” discrete ํ•œ ํ™˜๊ฒฝ => Discrete-time state-space model ์ƒํƒœ ๋ฐฉ์ •์‹ : $$x(k+1) = A_{d}x(k) + B_{d}u(k)$$ ์ถœ๋ ฅ ๋ฐฉ์ •์‹ : $$y(k) = C_{d}x(k)$$ MPC ๊ธฐ๋ณธ ๋ชจ๋ธ์€ Discrete-time aumented state-space model ์ƒํƒœ ๋ณ€์ˆ˜ ๋Œ€์‹  ์ƒํƒœ ๋ณ€์ˆ˜์˜ ๋ณ€ํ™”๋Ÿ‰ $\Delta x$ ์‚ฌ์šฉ ์ƒํƒœ ๋ฐฉ์ •์‹ $${x(k+1) - x(k) ..
[MPC] 1. Model Predictive Control Intro
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
YouTube: https://www.youtube.com/watch?v=zU9DxmNZ1ng&list=PLSAJDR2d_AUtkWiO_U-p-4VpnXGIorrO-&index=1 Blog: https://sunggoo.tistory.com/65 I plan to lightly organize what I studied based on the materials above. It will mostly be equation derivations, and after that I will add more while looking at papers or code implementations, depending on the goal. The concept of MPC (Model Predictive Control): the device's state dynamics + factors from the surrounding environment => cost function. A control engineering approach that targets non-linear / non-convex problems. While studying it, it has a bit of the flavor of reinforcement learning. Flow: based on the state variables at step k-1, k+1 ~ ..
[Math] Mathematics for Machine Learning 2. Linear Algebra
๐Ÿฌ ML & Data/โ” Q & etc.
Lately I have really been feeling the need to study math, so I started working through MML, a book that is something of a bible for machine learning mathematics... For a start it is in English (!), it is packed with terminology (!), and the content is hard, so I am having a rough time. I thought I had more or less understood, but then I looked at the exercises and was completely lost again... Even with the solutions there is a lot I find hard to follow, so I suspect I will need to work through it carefully two or three times, following a guide, before it really sinks in. It is just so hard, haha... I have taken a linear algebra course, yet this seems to cover more than the course I took. Anyway, the links below are the sites I referenced. Many thanks to 준별님 for the Korean translation... I am reading it side by side... Textbook - free PDF (https://mml-book.github.io/book/mml-boo..
[๋ผ์ดํŠธ ๋”ฅ๋Ÿฌ๋‹] 1. ๋„“์€ ์‹œ๊ฐ์œผ๋กœ ๋ณด๋Š” ๋จธ์‹ ๋Ÿฌ๋‹ ๊ฐœ๊ด„
๐Ÿฌ ML & Data/๐Ÿฆ„ ๋ผ์ดํŠธ ๋”ฅ๋Ÿฌ๋‹
With ChatGPT becoming very widely known to the general public in November 2022, these days it feels as though the AI market, which had slowly been building momentum, has truly entered its prime. Beyond LLMs (Large Language Models), in CV (Computer Vision), even as copyright issues come to the fore, models are trained on photos and art styles to create new images in those styles, and in speech synthesis AI-powered TTS is even being made to sing. Apart from these visible, user-facing services, deep learning is used in countless areas whose applications even I do not fully know yet, such as anomaly detection solutions and game bots built with reinforcement learning. In this post, having decided to study AI..
[Reinforcement Learning] Dealing with Sparse Reward Environments - Learning with Sparse Rewards
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
※ These are notes I wrote up in Korean while studying the content at the link below. Reinforcement Learning: Dealing with Sparse Reward Environments: "Reinforcement Learning (RL) is a method of machine learning in which an agent learns a strategy through interactions with its environment…" (medium.com) 1. Sparse Reward Sparse reward: the case where the agent receives a positive reward only when it gets close to the goal state; this matches my current experimental setup. Curiosity-Driven method: so that the agent is also motivated to explore parts of the environment outside its current interest. Curric..
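A minimal sketch of what a sparse reward looks like in code (the goal-distance threshold and names are illustrative, not from the article):

import numpy as np

def sparse_reward(state, goal, tol=0.05):
    # Positive reward only when the agent is within `tol` of the goal; zero everywhere else
    return 1.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < tol else 0.0

# Almost every transition returns 0, so the agent gets very little learning signal
print(sparse_reward([0.9, 0.1], [1.0, 0.0]))    # 0.0 (still too far from the goal)
print(sparse_reward([0.99, 0.01], [1.0, 0.0]))  # 1.0 (within tolerance)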
[Reinforcement Learning] DDPG (Deep Deterministic Policy Gradient)
๐Ÿฌ ML & Data/๐Ÿ“ฎ Reinforcement Learning
DQN์˜ ์ฐจ์›์˜ ์ €์ฃผ ๋ฌธ์ œ(๊ณ ์ฐจ์› action์„ ๋‹ค๋ฃจ๋Š” ๊ฒฝ์šฐ ์—ฐ์‚ฐ ์†๋„๊ฐ€ ๋Š๋ ค์ง€๊ณ  memory space๋ฅผ ๋งŽ์ด ์š”ํ•จ)๋ฅผ off-policy actor critic ๋ฐฉ์‹์œผ๋กœ ํ’€์–ด๋‚ธ๋‹ค. ๊ธฐ์กด DQN ๋ฐฉ์‹์˜ insight๋“ค์— batch normalization replay buffer target Q network Actor-critic ํŒŒ๋ผ๋ฏธํ„ฐํ™” ๋œ actor function์„ ๊ฐ€์ง actor function : state์—์„œ ํŠน์ • action์œผ๋กœ mappingํ•˜์—ฌ ํ˜„์žฌ policy๋ฅผ ์ง€์ • policy gradient ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต ์—ฌ๊ธฐ์—์„œ J๊ฐ€ Objective Function(๋ชฉํ‘œํ•จ์ˆ˜) actor function์ด ๋ชฉํ‘œ ํ•จ์ˆ˜๋ฅผ gradient asent๋กœ ์ตœ๋Œ€ํ™”→ ์ด ๋•Œ์˜ policy parameter..