[Paper Review] Transformer - Attention is All You Need
·
ML & Data/📘 Paper & Model Reviews
Transformer was already in the spotlight when I first got into deep learning, and it still is today. The world keeps changing, yet... it really is a highly adaptable, high-performing model. I have studied it before, but now that I have more experience I put these notes together to solidify my understanding. Typing everything out was a hassle, so I am using photos of my handwritten notes instead. 1. Understanding the concept of Attention 2. What is Self-Attention? 3. Transformer architecture 4. Computing the Transformer's Query, Key, and Value 5. Multi-head Attention 6. How Self-Attention works in the Encoder and Decoder 7. Masked Self Attention 8. Other concepts and techniques 8.1. Feed Forward 8.2..
[Paper Review] Mamba - Linear Time Sequence Modeling with Selective State Spaces 2
3. Selective State Space Models 3.1 Selection as a Means of Compression Two running examples of synthetic tasks. Selective Copying: modifies the Copying task by varying the positions of the tokens to memorize. Memorizing the relevant tokens and filtering out the irrelevant ones requires content-aware reasoning. Induction Heads: knowing when to produce the output in the appropriate context requires content-aware reasoning. It is the mechanism most often used to explain how LLMs work. These tasks reveal the failure mode of LTI models. From the recurrent view, the constant dynamics (here, $\bar{A}, \bar{B}$) ... context..
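As a rough illustration of the Selective Copying setup described above, the sketch below builds one input/target pair: the tokens to memorize are scattered at random positions among filler tokens, and the target is those tokens in order. The function name, the use of 0 as the filler token, and all parameters are assumptions made for this example, not taken from the paper's code.

```python
import random


def make_selective_copying_example(seq_len, n_memorize, vocab, noise_token=0, seed=None):
    """Build one (inputs, target) pair for a toy Selective Copying task.

    `vocab` must not contain `noise_token` (hypothetical convention for this sketch).
    """
    rng = random.Random(seed)
    # Random positions are what make the task "selective":
    # the model cannot rely on fixed time offsets.
    positions = sorted(rng.sample(range(seq_len), n_memorize))
    tokens = [rng.choice(vocab) for _ in range(n_memorize)]

    inputs = [noise_token] * seq_len
    for pos, tok in zip(positions, tokens):
        inputs[pos] = tok

    # The target is the memorized tokens in their original order.
    return inputs, tokens


inputs, target = make_selective_copying_example(seq_len=16, n_memorize=4, vocab=[1, 2, 3], seed=0)
print(inputs)
print(target)
```

Solving this requires deciding per token whether to keep or ignore it, which is the content-dependent behavior the excerpt says constant dynamics cannot provide.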
[Paper Review] Mamba - Linear Time Sequence Modeling with Selective State Spaces 1
It has already been more than a year since it came out, but it feels like a million years since I last reviewed a recent paper, so here I am giving Mamba a read... This may not be a very accurate review. It is really closer to a translation, and I will revise it as I understand more. 1. Introduction Recently, Structured State Space Sequence Models (SSMs) have emerged as a promising class of architectures for sequence modeling. Inspired by classical state space models, they can be interpreted as a combination of CNNs and RNNs. Mamba proposes a new class of selective state space models. To match the modeling power of Transformers while scaling linearly in sequence length, several axes (..
[Paper Review] Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning
* I wrote this as light personal reading notes, so it is rough and may not be accurate. Thanks for your understanding :D Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning Cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for DC cooling system is a challenging task. The prevailing approaches often rely on approximating system models that are built upon the knowled..
[Model Review] TadGAN(Time series Anomaly Detection GAN)
For a recent project on fault diagnosis, I wanted to try a more recent model than an LSTM AE or a CNN, so I picked TadGAN. I am not sure I fully understand it yet, but I will write down what I have learned so far. TadGAN (Time series Anomaly Detection GAN) TadGAN is a model published in 2020 and, as the name says, a GAN for anomaly detection in time series data. GANs specialize in reconstruction, image generation, and similar tasks. TadGAN exploits this: like an LSTM Auto Encoder, it learns by reconstructing patterns, and then, when predicting on new incoming data, flags the regions with large error as anomalies. Structure of TadGAN TadGAN consists of 2 Generators and 2 Critics. Gene..
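The last step in the excerpt above (flagging regions where the error is large) can be sketched as follows. This is a simplified illustration, not TadGAN's actual scoring: the function name, the point-wise absolute error, and the mean plus k standard deviations threshold are all assumptions made for this example.

```python
from statistics import mean, pstdev


def flag_anomalies(actual, reconstructed, k=3.0):
    """Return indices whose reconstruction error exceeds mean + k * std.

    `actual` and `reconstructed` are equal-length sequences of floats;
    `k` is a hypothetical sensitivity parameter.
    """
    # Point-wise absolute reconstruction error.
    errors = [abs(a - r) for a, r in zip(actual, reconstructed)]
    # Simple global threshold over the whole window.
    threshold = mean(errors) + k * pstdev(errors)
    return [i for i, e in enumerate(errors) if e > threshold]


actual = [0.0] * 20
actual[7] = 5.0               # one injected spike
reconstructed = [0.0] * 20    # the model reproduces only the normal pattern
print(flag_anomalies(actual, reconstructed))
```

A fixed global threshold is the simplest choice; a windowed or adaptive threshold would be the natural next step for non-stationary signals.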
[Model Review] YOLOv5 + Roboflow Annotation
! Caution ! This post contains a brief summary of YOLOv5, a short usage guide, and my personal opinions on Roboflow annotation. 1. YOLOv5 Summary You Only Look Once - a one-stage detection model. Unlike R-CNN or Faster R-CNN, it looks at the image only once, without splitting it. Integrates the preprocessing model and the neural network. Real-time object detection. Backbone : input image → feature map CSP-Darknet https://keyog.tistory.com/30 Head : predict classes / bounding boxes Dense Prediction : One stage detector(predict classes + b..
[Model Review] MobileNet SSD Quick Paper Review
The quality is not high. Caution! Mobile Object Detection model - based on VGG-16 https://arxiv.org/abs/1704.04861 1. Summary A base model built on VGG-16. A standard 3x3x3 convolution with 3 output channels, as used in the original VGG-16, has 3x3x3 x 3 = 81 parameters in total. To run on mobile devices, this model combines depthwise convolution and pointwise convolution, reducing that to 3x3x1 x 3 + 1x1x3 x 3 = 27 + 9 = 36 parameters. → This is called depthwise separable convolution. 2. Architectur..
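The parameter arithmetic above can be checked with a short sketch. The function names are made up for this example, and bias terms are ignored, as in the excerpt's count.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * 1 * c_in   # one kernel x kernel x 1 filter per input channel
    pointwise = 1 * 1 * c_in * c_out         # one 1 x 1 x c_in filter per output channel
    return depthwise + pointwise


print(standard_conv_params(3, 3, 3))        # 3*3*3*3 = 81
print(depthwise_separable_params(3, 3, 3))  # 27 + 9 = 36
```

With only 3 channels the saving is modest (81 to 36); the gap widens quickly as the channel counts grow, since the standard cost scales with c_in x c_out times the kernel area.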