[๋ผ์ดํŠธ ๋จธ์‹ ๋Ÿฌ๋‹] Session 17. ํ•™์Šต๊ณผ ๊ฒ€์ฆ ๊ณก์„ , ๊ทธ๋ฆฌ๊ณ  ๊ทธ๋ฆฌ๋“œ ์„œ์น˜

2020. 2. 28. 02:14ยท๐Ÿฌ ML & Data/๐ŸŽซ ๋ผ์ดํŠธ ๋จธ์‹ ๋Ÿฌ๋‹
๋ชฉ์ฐจ
  1. A. ํ•™์Šต ๊ณก์„ ๊ณผ ๊ฒ€์ฆ ๊ณก์„ ์„ ์‚ฌ์šฉํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋””๋ฒ„๊น…
  2. 1. ํ•™์Šต ๊ณก์„ ์œผ๋กœ ํŽธํ–ฅ๊ณผ ๋ถ„์‚ฐ ๋ฌธ์ œ ๋ถ„์„
  3. 2. ๊ฒ€์ฆ ๊ณก์„ ์œผ๋กœ ๊ณผ๋Œ€์ ํ•ฉ๊ณผ ๊ณผ์†Œ์ ํ•ฉ ์กฐ์‚ฌ
  4. B. ๊ทธ๋ฆฌ๋“œ ์„œ์น˜๋ฅผ ์‚ฌ์šฉํ•œ ๋จธ์‹  ๋Ÿฌ๋‹ ๋ชจ๋ธ ์„ธ๋ถ€ ํŠœ๋‹
  5. 1. ๊ทธ๋ฆฌ๋“œ ์„œ์น˜๋ฅผ ์‚ฌ์šฉํ•œ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹
  6. 2. ์ค‘์ฒฉ ๊ต์ฐจ ๊ฒ€์ฆ์„ ์‚ฌ์šฉํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์„ ํƒ

In this session we will look at two things: how to debug an algorithm with two kinds of curves so that we end up with a better model, and how to tune hyperparameters with grid search. The topic follows on from the previous session to some extent, so feel free to read Session 16 first!

A. Debugging algorithms with learning curves and validation curves

1. Diagnosing bias and variance problems with learning curves

A learning curve is a graph of a model's training accuracy and validation accuracy against the number of training samples. It makes it easy to check whether the model suffers from high variance or from high bias, and then to fix it. Collecting more data is quite often not feasible, so the curve also helps you decide whether gathering more data is really necessary.

The upper-left graph shows a model with high bias. Both its training accuracy and its cross-validation accuracy are low, which tells us the model is underfitting. In this case it is a good idea to increase the number of model parameters, for example by collecting more features or by decreasing the regularization strength of an SVM or logistic regression classifier.

The upper-right graph shows a model with high variance. Can you see the gap between the training accuracy and the cross-validation accuracy? This is overfitting. It can be addressed by collecting more training data or by strengthening the regularization. For models without regularization, feature selection or feature extraction can be used instead.
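The two cases above can be condensed into a rough rule of thumb. The sketch below is only an illustration: the function name `diagnose_fit` and the thresholds `min_acc` and `max_gap` are assumptions made up for this example, not part of scikit-learn, and sensible values depend on your own problem.

```python
def diagnose_fit(train_acc, val_acc, min_acc=0.90, max_gap=0.05):
    """Roughly classify a model from its training and validation accuracy.

    min_acc and max_gap are hypothetical thresholds chosen for illustration.
    """
    gap = train_acc - val_acc
    if train_acc < min_acc and val_acc < min_acc:
        # Both accuracies are low: the model is too simple (underfitting).
        return 'high bias'
    if gap > max_gap:
        # Training accuracy is far above validation accuracy (overfitting).
        return 'high variance'
    return 'ok'


print(diagnose_fit(0.78, 0.76))  # both accuracies low
print(diagnose_fit(0.99, 0.88))  # large gap between the two
print(diagnose_fit(0.97, 0.96))  # high accuracy, small gap
```

In practice you would read these two numbers off the right-hand end of the learning curve rather than from a single fit.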

Now, shall we try scikit-learn's learning curve function?

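import matplotlib.pyplot as plt
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X_train and y_train are assumed to be the training split prepared earlier (see Session 16).

# Standardize the features, then fit an L2-regularized logistic regression.
pipe_lr = make_pipeline(StandardScaler(),
                        LogisticRegression(solver='liblinear',
                                           penalty='l2',
                                           random_state=1))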

train_sizes, train_scores, test_scores =\
                learning_curve(estimator=pipe_lr,
                               X=X_train,
                               y=y_train,
                               train_sizes=np.linspace(0.1, 1.0, 10),
                               cv=10,
                               n_jobs=1)

train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

plt.plot(train_sizes, train_mean,
         color='blue', marker='o',
         markersize=5, label='training accuracy')

plt.fill_between(train_sizes,
                 train_mean + train_std,
                 train_mean - train_std,
                 alpha=0.15, color='blue')

plt.plot(train_sizes, test_mean,
         color='green', linestyle='--',
         marker='s', markersize=5,
         label='validation accuracy')

plt.fill_between(train_sizes,
                 test_mean + test_std,
                 test_mean - test_std,
                 alpha=0.15, color='green')

plt.grid()
plt.xlabel('Number of training samples')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.ylim([0.8, 1.03])
plt.tight_layout()
plt.show()

learning_curveํ•จ์ˆ˜์˜ train_sizes ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ด์šฉํ•˜๋ฉด ํ•™์Šต๊ณก์„  ์ƒ์„ฑ์— ํ•„์š”ํ•œ ํ›ˆ๋ จ ์ƒ˜ํ”Œ์˜ ๊ฐœ์ˆ˜์™€ ๋น„์œจ์„ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” ์ผ์ • ๊ฐ„๊ฒฉ์œผ๋กœ ํ›ˆ๋ จ ์„ธํŠธ์˜ ๋น„์œจ 10๊ฐœ๋ฅผ ์„ค์ •ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

For a classifier, the learning_curve function computes the cross-validation accuracy with stratified k-fold cross-validation. Here k=10.
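"Stratified" means that every fold keeps the class proportions of the full dataset. A small sketch with made-up, imbalanced labels (the arrays `X_demo` and `y_demo` are hypothetical) shows this using `StratifiedKFold` directly:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant for the split

skf = StratifiedKFold(n_splits=10)
fold_counts = [np.bincount(y_demo[val_idx], minlength=2)
               for _, val_idx in skf.split(X_demo, y_demo)]

# Every validation fold keeps the 80:20 class ratio of the full data.
for k, counts in enumerate(fold_counts, start=1):
    print('fold %2d: class counts %s' % (k, counts))
```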

์ด ๊ทธ๋ž˜ํ”„์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ 250๊ฐœ ์ด์ƒ์˜ ์ƒ˜ํ”Œ์„ ์‚ฌ์šฉํ•  ๋•Œ ๋ชจ๋ธ์ด ์ž˜ ์ž‘๋™ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๊ฒƒ๋ณด๋‹ค ์ ์œผ๋ฉด ๋” ๋งŽ์ด ์˜ค๋ฒ„ํ”ผํŒ… ๋œ๋‹ค๋Š” ์ฆ๊ฑฐ์ด์ฃ .

2. ๊ฒ€์ฆ ๊ณก์„ ์œผ๋กœ ๊ณผ๋Œ€์ ํ•ฉ๊ณผ ๊ณผ์†Œ์ ํ•ฉ ์กฐ์‚ฌ

๊ฒ€์ฆ ๊ณก์„ ์€ ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋Š” ๋„๊ตฌ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’์„ ํ•จ์ˆ˜๋กœ ๊ทธ๋ ค์ฃผ์ฃ . ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€์˜ ๊ฒฝ์šฐ์—๋Š” ๊ทœ์ œ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜ C๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ์‚ฌ์ดํ‚ท๋Ÿฐ์œผ๋กœ ๊ทธ๋ฆฌ๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณผ๊นŒ์š”?



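from sklearn.model_selection import validation_curve

# Candidate values of C, spaced on a log scale.
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]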
train_scores, test_scores = validation_curve(
                estimator=pipe_lr, 
                X=X_train, 
                y=y_train, 
                param_name='logisticregression__C', 
                param_range=param_range,
                cv=10)

train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

plt.plot(param_range, train_mean, 
         color='blue', marker='o', 
         markersize=5, label='training accuracy')

plt.fill_between(param_range, train_mean + train_std,
                 train_mean - train_std, alpha=0.15,
                 color='blue')

plt.plot(param_range, test_mean, 
         color='green', linestyle='--', 
         marker='s', markersize=5, 
         label='validation accuracy')
plt.fill_between(param_range, 
                 test_mean + test_std,
                 test_mean - test_std, 
                 alpha=0.15, color='green')

plt.grid()
plt.xscale('log')
plt.legend(loc='lower right')
plt.xlabel('Parameter C')
plt.ylabel('Accuracy')
plt.ylim([0.8, 1.00])
plt.tight_layout()
plt.show()

Like the learning_curve function, the validation_curve function by default estimates a classifier's performance with stratified k-fold cross-validation. Inside the function we name the parameter we want to evaluate. In this case it is C of the logistic regression, which is addressed inside the pipeline as 'logisticregression__C'.

As the name suggests, param_range specifies the range of values to try. As with the learning curve earlier, the plot shows the training accuracy, the cross-validation accuracy, and their standard deviations.

The differences in accuracy across C are subtle, but we can see that increasing the regularization strength (a smaller C) makes the model underfit slightly, while decreasing the regularization strength (a larger C) makes it overfit slightly.

B. Fine-tuning machine learning models with grid search

1. Tuning hyperparameters with grid search

Grid search is a popular hyperparameter optimization method. It is an exhaustive search: we list several candidate values for each hyperparameter, evaluate the model's performance for every combination of those values, and pick the best combination. Shall we look at the code?

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

pipe_svc = make_pipeline(StandardScaler(),
                         SVC(random_state=1))

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]

param_grid = [{'svc__C': param_range, 
               'svc__kernel': ['linear']},
              {'svc__C': param_range, 
               'svc__gamma': param_range, 
               'svc__kernel': ['rbf']}]

gs = GridSearchCV(estimator=pipe_svc, 
                  param_grid=param_grid, 
                  scoring='accuracy', 
                  cv=10,
                  n_jobs=-1)
gs = gs.fit(X_train, y_train)
print(gs.best_score_)
print(gs.best_params_)

์ด ์ฝ”๋“œ์—์„œ๋Š” GridSearchCV ํด๋ž˜์Šค์˜ ๊ฐ์ฒด๋ฅผ ๋งŒ๋“ค๊ณ  SVM์„ ์œ„ํ•ด ํŒŒ์ดํ”„๋ผ์ธ์„ ํ›ˆ๋ จํ•˜๊ณ  ํŠœ๋‹ํ•ด์ค๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  param_grid์— ํŠœ๋‹ํ•˜๋ ค๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ง€์ •ํ•ด์ค๋‹ˆ๋‹ค. RBF ์ปค๋„ SVM์—์„œ๋Š” svc__C์™€ svc__gamma ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ํŠœ๋‹ํ•ด์ค๋‹ˆ๋‹ค.

๊ทธ๋ฆฌ๋“œ ์„œ์น˜๋ฅผ ์ˆ˜ํ–‰ํ•œ ํ›„ ์ตœ๊ณ ์ ์ˆ˜๋Š” best_score_ ์†์„ฑ์—์„œ ์–ป์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด ๋ชจ๋ธ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” best_params_ ์†์„ฑ์—์„œ ํ™•์ธ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

๋งˆ์ง€๋ง‰์œผ๋กœ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋ฅผ ์‚ฌ์šฉํ•ด ์„ฑ๋Šฅ์„ ์ถ”์ •ํ•ด์ค๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ best_estimator_์—์„œ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


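clf = gs.best_estimator_
# GridSearchCV already refits the best model on the whole training set
# by default (refit=True), so this explicit fit is optional.
clf.fit(X_train, y_train)
print('Test accuracy: %.3f' % clf.score(X_test, y_test))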
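Since GridSearchCV uses `refit=True` by default, the fitted search object can also be scored directly. The following is a minimal sketch on synthetic data with a deliberately tiny grid; the data, the split, and names such as `demo_gs` are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data and a tiny grid (both hypothetical).
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=1)

demo_gs = GridSearchCV(
    estimator=make_pipeline(StandardScaler(), SVC(random_state=1)),
    param_grid={'svc__C': [0.1, 1.0, 10.0]},
    scoring='accuracy',
    cv=5,
    refit=True)  # default: refit the best candidate on all of X_tr
demo_gs.fit(X_tr, y_tr)

# The search object forwards score() to the refitted best_estimator_.
score_via_search = demo_gs.score(X_te, y_te)
score_via_best = demo_gs.best_estimator_.score(X_te, y_te)
print(score_via_search, score_via_best)
```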

2. Algorithm selection with nested cross-validation

Using grid search together with k-fold cross-validation makes it easy to fine-tune the performance of a machine learning model. To compare different kinds of algorithms with each other, however, nested cross-validation is the recommended approach.

In nested cross-validation, an outer k-fold cross-validation loop splits the data into training and test folds, and an inner loop runs k-fold cross-validation on the training fold to select the model. Once a model has been selected, it is evaluated on the test fold. Shall we look at a figure first?

This figure illustrates the idea with five folds in the outer loop and two folds in the inner loop. This setup is useful for large datasets, where computational cost matters, and this particular configuration is also called 5x2 cross-validation.

In scikit-learn it looks like this.

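from sklearn.model_selection import cross_val_score

# Inner loop: 2-fold grid search that selects the hyperparameters.
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=2)

# Outer loop: 5-fold cross-validation that scores the tuned model.
scores = cross_val_score(gs, X_train, y_train, 
                         scoring='accuracy', cv=5)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores),
                                      np.std(scores)))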

๋ฐ˜ํ™˜๋œ ํ‰๊ท  ๊ต์ฐจ ๊ฒ€์ฆ ์ ์ˆ˜๋Š” ๋ชจ๋ธ์„ ํŠœ๋‹ํ–ˆ์„ ๋•Œ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์—์„œ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์žˆ๋Š” ์ •ํ™•๋„๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ค‘์ฒฉ ๊ต์ฐจ ๊ฒ€์ฆ์˜ ์˜ˆ๋กœ, ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋น„๊ต๋ฅผ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋ฉด SVM๊ณผ ๋‹จ์ผ ๊ฒฐ์ • ํŠธ๋ฆฌ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

from sklearn.tree import DecisionTreeClassifier
gs = GridSearchCV(estimator=DecisionTreeClassifier(random_state=0),
                  param_grid=[{'max_depth': [1, 2, 3, 4, 5, 6, 7, None]}],
                  scoring='accuracy',
                  cv=2)

scores = cross_val_score(gs, X_train, y_train, 
                         scoring='accuracy', cv=5)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), 
                                      np.std(scores)))

๊ฒฐ๊ณผ์—์„œ ํ™•์ธ ๊ฐ€๋Šฅ ํ•˜๋“ฏ์ด SVM ๋ชจ๋ธ์˜ ๊ฒ€์ฆ ์„ฑ๋Šฅ์€ ๊ฒฐ์ •ํŠธ๋ฆฌ์˜ ์„ฑ๋Šฅ๋ณด๋‹ค ๋” ๋›ฐ์–ด๋‚ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๋“ฏ ์ค‘์ฒฉ ๊ต์ฐจ ๊ฒ€์ฆ์„ ํ†ตํ•ด ๋” ๋‚˜์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์„ ํƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!

 


์ž, ์—ฌ๊ธฐ๊นŒ์ง€ ๋ชจ๋ธ์„ ๋” ์ข‹๊ฒŒ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด์„œ ์–ด๋–ค ๋ถ€๋ถ„์„ ๋ฐ”๊พธ๊ณ  ์ถ”๊ฐ€ํ•ด์•ผํ•˜๋Š”์ง€, ๋˜ ์–ด๋–ค ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋” ์ ํ•ฉํ•œ์ง€๋ฅผ ์‰ฝ๊ฒŒ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ฃผ๋Š” ๋„๊ตฌ๋“ค์„ ์‚ดํŽด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ์„ธ์…˜ 16๋ถ€ํ„ฐ 18๊นŒ์ง€๋Š” ์ญ‰ ๋ชจ๋ธ์„ ํ‰๊ฐ€ํ•˜๊ฑฐ๋‚˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํŠœ๋‹ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ด์ฃผ๊ณ  ์žˆ์œผ๋‚˜ ์„œ๋กœ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœํ•˜๊ณ  ์žˆ์œผ๋‹ˆ ์—ฐ๋‹ฌ์•„ ์ฝ์„ ํ•„์š”๋Š” ์—†์Šต๋‹ˆ๋‹ค! ํ•˜์ง€๋งŒ ๊ฐ™์ด ์•Œ์•„๋‘๋ฉด ์ข‹๊ฒ ์ฃ ? 

๋‹ค์Œ ์„ธ์…˜์—์„œ๋Š” ์„ฑ๋Šฅ ํ‰๊ฐ€์˜ ๋‹ค์–‘ํ•œ ์ง€ํ‘œ๋“ค๊ณผ, ํด๋ž˜์Šค์— ์žˆ๋Š” ์ƒ˜ํ”Œ์ด ๋ถˆ๊ท ํ˜•ํ•œ ๊ฒฝ์šฐ์— ์–ด๋–ป๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ์–ด์•ผํ•˜๋Š”์ง€์— ๋Œ€ํ•ด์„œ ์ด์•ผ๊ธฐํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์ˆ˜๊ณ ํ•˜์…จ์Šต๋‹ˆ๋‹ค!

 
