[Model Compression] Model Quantization (Model Optimization) with TensorFlow


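Making a deep learning model lightweight is a necessary step when applying a trained model to a real problem, because it reduces both the execution time and the resources consumed at prediction time.
As far as I know, there are three ways to make a model lighter:

Model quantization (reducing the number of bits)
Model pruning (discarding the unimportant parts)
Simply designing the model well

Of these, I decided to start with quantization, which is the easiest to apply to a model that has already been trained. This post records an example of quantizing a TensorFlow-based model.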
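To make "reducing the number of bits" concrete, here is a minimal sketch of 8-bit affine quantization in plain Python. It only illustrates the idea of mapping floats to integer codes with a scale and a zero point. It is not TensorFlow Lite's internal implementation, and the function names and sample weights are made up for the illustration.

```python
# Sketch of 8-bit affine quantization: real_value ≈ scale * (q - zero_point).
# Illustrative only; the names and the sample weights are hypothetical.

def quantize(values, num_bits=8):
    """Map floats to signed integer codes in [-2^(b-1), 2^(b-1) - 1]."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The represented range must contain 0 so that 0.0 maps to an exact code.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (x - zero_point) for x in q]


weights = [-1.2, -0.4, 0.0, 0.35, 0.9, 2.1]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", q)
print("scale:", scale, "zero_point:", zero_point)
print("largest round-trip error:", max_error)
```

Each weight now needs 8 bits instead of 32, at the cost of a rounding error of at most about half of `scale` per value. That rounding error is where the small accuracy drop after quantization comes from.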

Source: https://medium.com/@jan_marcel_kezmann/master-the-art-of-quantization-a-practical-guide-e74d7aad24f9

Starting point: a model built and trained with TensorFlow, with its weights saved in an .h5 file

1. Model Quantization

a. Loading the model

import tensorflow as tf

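# Rebuild the architecture first, because only the weights are loaded from the file.
# your_model and parameter are placeholders for your own model-building code.
model = your_model(parameter)
model.load_weights('YOUR_MODEL_PATH')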

b. Quantization

converter = tf.lite.TFLiteConverter.from_keras_model(model) # use a different constructor (e.g. from_saved_model) if the model was not built with Keras
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

# convert() returns bytes, so write the file in binary mode
with open("SAVE_PATH.tflite", "wb") as f:
    f.write(tflite_quant_model)

If the model uses a layer such as LSTM (an operator that is not supported by default), add the following before calling convert():

converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]

2. Loading the Quantized Model

a. Defining the interpreter

import tensorflow as tf
interpreter = tf.lite.Interpreter('YOUR_MODEL_PATH')
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()
output_detail = interpreter.get_output_details()

b. Test

# data must be a NumPy array whose shape and dtype match input_detail[0]
interpreter.set_tensor(input_detail[0]['index'], data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_detail[0]['index'])
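set_tensor rejects input whose dtype or shape differs from what the model expects, so it can help to normalize the data first. The helper below is a small sketch of that step. `prepare_input` is a hypothetical name, and `example_detail` is a hand-written stand-in for one entry of `interpreter.get_input_details()`, using only its 'shape' and 'dtype' keys.

```python
import numpy as np


def prepare_input(data, detail):
    """Cast and reshape data to what one input tensor expects.

    detail is one entry of interpreter.get_input_details(): a dict whose
    'shape' and 'dtype' keys describe the tensor.
    """
    shape = tuple(int(d) for d in detail['shape'])
    array = np.asarray(data, dtype=detail['dtype'])
    if array.shape == shape[1:]:
        # A single sample without the batch axis: add it.
        array = array[np.newaxis, ...]
    if array.shape != shape:
        raise ValueError(f"expected shape {shape}, got {array.shape}")
    return array


# Stand-in for input_detail[0]; a real one comes from the interpreter.
example_detail = {'index': 0, 'shape': np.array([1, 4]), 'dtype': np.float32}
sample = prepare_input([0.1, 0.2, 0.3, 0.4], example_detail)
print(sample.shape, sample.dtype)
```

With a real interpreter, the call would then look like `interpreter.set_tensor(input_detail[0]['index'], prepare_input(data, input_detail[0]))`.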

File size before quantization

File size after quantization
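The two sizes can also be read programmatically. The sketch below compares two files with os.path.getsize. `size_report` is a hypothetical helper, and the demo uses throwaway files with made-up sizes, not the files from this post. Note also that the .h5 file here holds only weights while the .tflite file holds the whole model, so the ratio is a rough indication, not an exact like-for-like comparison.

```python
import os
import tempfile


def size_report(original_path, quantized_path):
    """Return (bytes_before, bytes_after, after/before ratio)."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(quantized_path)
    return before, after, after / before


# Demo with dummy files; replace the paths with your .h5 and .tflite files.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "model.h5")
    quantized = os.path.join(tmp, "model.tflite")
    with open(original, "wb") as f:
        f.write(b"\x00" * 4000)  # pretend float32 weights
    with open(quantized, "wb") as f:
        f.write(b"\x00" * 1000)  # pretend int8 weights
    before, after, ratio = size_report(original, quantized)

print(f"before: {before} B, after: {after} B, ratio: {ratio:.2f}")
```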

Performance Before and After Quantization

Before quantization
- F1 Score = 0.9703125 / Accuracy = 97.031 %

After quantization
- F1 Score = 0.9671875 / Accuracy = 96.719 %
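For reference, a comparison like this can be produced by running the same test set through both models and scoring the two sets of predictions. The sketch below does that in plain Python. The post does not say how its F1 score was averaged, so macro averaging here is an assumption, and the labels and predictions are toy values, not the data behind the numbers above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: class ids predicted by the original and the quantized model.
labels     = [0, 0, 1, 1, 1, 0, 1, 0]
pred_float = [0, 0, 1, 1, 1, 0, 1, 1]
pred_quant = [0, 1, 1, 1, 1, 0, 1, 1]

for name, pred in [("before", pred_float), ("after", pred_quant)]:
    print(f"{name}: F1 = {macro_f1(labels, pred):.4f} / "
          f"Accuracy = {accuracy(labels, pred) * 100:.3f} %")
```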

Reference

Master the Art of Quantization: A Practical Guide. Exploring and Implementing Quantization Methods with TensorFlow and PyTorch (medium.com)
https://medium.com/@jan_marcel_kezmann/master-the-art-of-quantization-a-practical-guide-e74d7aad24f9


License: Attribution, NonCommercial, NoDerivatives
