๐Ÿฌ ML & Data/๐Ÿ“˜ ๋…ผ๋ฌธ & ๋ชจ๋ธ ๋ฆฌ๋ทฐ

[Paper Review] Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

darly213 2023. 8. 7. 15:18

* ๊ฐœ์ธ์ ์œผ๋กœ ์ฝ๊ณ  ๊ฐ€๋ณ๊ฒŒ ์ •๋ฆฌํ•ด๋ณด๋Š” ์šฉ๋„๋กœ ์ž‘์„ฑํ•œ ๊ธ€์ด๋ผ ๋ฏธ์ˆ™ํ•˜๊ณ  ์ •ํ™•ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์–‘ํ•ด ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค :D

 

Paper: Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning (arxiv.org)

  • Simulation via EnergyPlus
  • Cold aisles are placed between the server racks, and the cooling of the whole server room is controlled through them

 

Simulation System model

Data center model

  • Zones of different sizes and locations, each with its own independent cooling system (DX, i.e. direct expansion / chiller)
  • Heat generated by sources such as the IT equipment plus illumination
  • The ITE load is the product of a fixed per-square-meter load density L (lighting, etc.) and a time-varying load fraction a (formalized in the sketch after this list)
  • Zone 1 load density is 4 kW, zone 2 is 2 kW
  • The workload and temperature are combined into a tuple and used as the state
  • PUE and the IT equipment outlet temperature are provided as the reward
    • Minimize PUE while keeping the ITE temperature within a set bound
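
Tying the bullets above together, a rough formalization in my own notation (not verbatim from the paper; the reward shape in particular is hedged):

$$H_{ITE}(t) = L \cdot a(t), \qquad s_t = \big(H_{ITE}(t),\; T_{amb}(t)\big), \qquad r_t = f\big(\mathrm{PUE}_t,\; T_{out,t}\big)$$

where $T_{out,t}$ is the ITE outlet temperature and $f$ penalizes both a high PUE and overheating.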

Cooling system model

  • Action space
    • ๊ฐ™์€ ์ˆ˜๋ƒ‰ ๊ธฐ๋ฐ˜ / ๋ฌผ์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹์ด ๋‹ค๋ฆ„

 

Problem statement

  • ์˜จ๋„ `\T_{amb}\` ์™€ ๋ถ€ํ•˜ `\H_{ite}`\ , ์‹œ๊ฐ„์— ๋”ฐ๋ผ ๋ณ€ํ™”ํ•˜๋Š” tuple ์ œ๊ณต
  • ๋ƒ‰๊ฐ์ˆ˜์˜ 5๊ฐ€์ง€ input์„ ์ œ์–ดํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ(์œ„ ๊ทธ๋ฆผ์˜ Txx - DEC outlet temp, IEC outlet temp, chilled water loop outlet temp, DX cooling coil outlet temp, chiller cooling coil outlet temp)
  • PUE์˜ ์ตœ์†Œํ™”์™€ ์„œ๋ฒ„ ๊ณผ์—ด์˜ ํŒจ๋„ํ‹ฐ
    • ๋‘ ๊ฐœ์˜ ๋ชฉ์  ํ•จ์ˆ˜
      • penalty function(์ตœ์†Œํ™”)
        • λ - penalty ๊ณ„์ˆ˜
        • Tzi - zone i ์— ๋Œ€ํ•œ ํ‰๊ท  ITE ์˜จ๋„
        • φ - ๊ณผ์—ด ๊ธฐ์ค€ threshold
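
A plausible reconstruction of the combined objective from the symbols above (the paper's exact penalty shape may differ, e.g. it could be quadratic; this hinge form just ties the listed symbols together):

$$\min_{a}\;\; \mathrm{PUE} \;+\; \lambda \sum_{i} \max\!\big(0,\; T_{z_i} - \varphi\big)$$

so the controller trades energy efficiency against the overheating penalty, with $\lambda$ setting the exchange rate.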

 

Neural end-to-end cooling control algorithm (CCA)

Batch learning (offline learning): on-policy vs. off-policy

  • Feeding real-time data into training would mean accepting operational risk, so offline learning (batch learning) is used here
  • Batch learning itself comes in two flavors: on-policy and off-policy
    • Since the cost of simulation time is high, off-policy learning is used
  • From the paper: "Off-policy algorithms generally employ a separate behavior policy, which is independent of the policy being estimated, to generate the training trace; while on-policy directly uses control policy being estimated (in the real control practice or more likely in a simulator) to generate training data traces"

 

CCA with offline trace

  • Unlike the usual reinforcement learning setup, where estimated future rewards also enter the evaluation,
  • here no future reward data is used; the workload and weather data define the system transitions
  • Since an action takes time to show its effect, the outcome observed at this step is credited to the next step, i.e. a one-step time shift (sketched below)
  • All of the data is a time series spanning N hours

  • Q-Network
    • Outputs the cost of taking action a in the current state s
    • Decisions are made recursively → previous states and actions are also taken into account
    • Trained with an MSE loss
  • Policy Network
    • Outputs the action a to take in the current state s, chosen to minimize the Q-estimated cost
    • A small validation error at the start is not a bug; it is simply because the network has already been trained at that point

Neural Network Design

  • Q-Network
    • Two hidden layers with tanh activations
    • A linear output layer
    • Outputs the negative reward, i.e. a cost
    • The objective is to close the gap between the actual data y and the predicted value ŷ
  • Policy Network
    • Two hidden layers using linear and tanh activations
    • Outputs the next control action a
    • Optimized through the Q-Network's loss function
  • All data is normalized to the (-1, 1) range to match the tanh activations, and denormalized whenever the actual energy and temperature values are needed (see the sketch after this list)
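
A minimal PyTorch sketch of the two networks and the (-1, 1) scaling as I understand them; the layer widths are my guesses, and where the post mixes linear and tanh hidden activations I use tanh throughout:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """(state history, action) -> scalar cost; two tanh hidden layers and
    a linear output, trained with an MSE loss against the observed y."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # linear output: negative reward (cost)
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class PolicyNetwork(nn.Module):
    """state history -> next control action a, kept in (-1, 1) by the
    final tanh so it matches the normalized action range."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

def to_unit(x, lo, hi):
    """Scale raw sensor values into (-1, 1) for the tanh networks."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def from_unit(z, lo, hi):
    """Invert the scaling to recover physical energy/temperature values."""
    return (z + 1.0) / 2.0 * (hi - lo) + lo
```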

 

  1. Data
    • Requires the state data series, actions, and rewards
    • Q-NN input
    • Policy network input
    • PUE and temperature data for computing the loss target y
  2. Initialize
    • Create the Q network and the policy network
    • Randomly initialize the weight parameters
  3. For each epoch / mini-batch
    • Optimize the Q-NN parameters
    • Optimize the policy network parameters
    • Swap / evaluation
  4. Return
    • The Q network and policy network set to the best weight parameters (condensed into the loop sketch below)
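
Putting steps 1-4 together, a condensed training-loop sketch assuming the networks and dataset helper above; the swap / evaluation step is simplified away, and the alternating-update schedule is my reading rather than the paper's exact procedure:

```python
import torch

def train_cca(q_net, policy_net, state_x, action_x, y,
              epochs=100, batch=64, lr=1e-3):
    """Alternate Q-network regression and policy improvement over the
    offline trace (tensors state_x, action_x, y from the dataset helper)."""
    q_opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    pi_opt = torch.optim.Adam(policy_net.parameters(), lr=lr)
    n = len(y)
    for _ in range(epochs):
        perm = torch.randperm(n)
        for i in range(0, n, batch):
            idx = perm[i:i + batch]
            s, a, target = state_x[idx], action_x[idx], y[idx]
            # 3a. fit Q to the observed cost with an MSE loss
            q_loss = ((q_net(s, a).squeeze(-1) - target) ** 2).mean()
            q_opt.zero_grad(); q_loss.backward(); q_opt.step()
            # 3b. improve the policy by descending the Q-estimated cost
            pi_loss = q_net(s, policy_net(s)).mean()
            pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    return q_net, policy_net  # 4. networks with the learned weights
```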