[Light Machine Learning] Session 14. Supervised Data Compression with LDA

2020. 2. 21. 01:07 · 🍬 ML & Data/🎫 Light Machine Learning

Linear Discriminant Analysis (LDA) is a feature extraction technique used to improve computational efficiency and to reduce the degree of overfitting in models without regularization.

The idea behind LDA is quite similar to PCA. Where PCA looks for the component axes along which the variance of the dataset is largest, LDA looks for the feature subspace that best separates the classes.

1. Principal Component Analysis vs Linear Discriminant Analysis

PCA and LDA are both linear transformation techniques that reduce the number of dimensions in a dataset, but they differ in that PCA is unsupervised while LDA is supervised. Since LDA searches for a feature subspace specifically to separate the classes, you might expect it to always be the better choice for classification. That is not actually the case.

Martinez reported that preprocessing with PCA gives better classification results in certain image recognition tasks. For example, suppose we have two class labels.

In the figure, the linear discriminant on the x-axis (LD 1) separates the two classes well. The linear discriminant on the y-axis (LD 2), on the other hand, captures a lot of variance but carries no class-discriminatory information, so it is not a good linear discriminant.
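The contrast between the two axes can be reproduced on toy data. The following is a minimal sketch, not the data behind the figure: the class centers, spreads, and sample counts are made up for illustration. It compares the direction of maximum variance (PCA) with the two-class discriminant direction, which is proportional to the inverse of S(W) times the difference of the class means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # samples per class (illustrative)

# Two classes separated along x, with a large shared variance along y
X0 = rng.normal(loc=[-2.0, 0.0], scale=[0.5, 5.0], size=(n, 2))
X1 = rng.normal(loc=[2.0, 0.0], scale=[0.5, 5.0], size=(n, 2))
X = np.vstack([X0, X1])

# PCA direction: eigenvector of the covariance matrix with the largest eigenvalue
vals, vecs = np.linalg.eigh(np.cov(X.T))
pca_dir = vecs[:, np.argmax(vals)]

# LDA direction for two classes: S_W^{-1} (m1 - m0), normalized to unit length
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
lda_dir = np.linalg.solve(S_W, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

print('PCA direction:', np.round(pca_dir, 3))
print('LDA direction:', np.round(lda_dir, 3))
```

On data like this, the PCA direction should point almost entirely along y (where the variance is) and the LDA direction almost entirely along x (where the class separation is).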

LDA assumes that the data is normally distributed. It also assumes that the classes share the same covariance matrix and that the samples are statistically independent of each other. Even when these assumptions are violated to some degree, LDA still tends to work reasonably well for dimensionality reduction.

2. How linear discriminant analysis works internally

As we did for PCA, let's summarize the steps of LDA.

  1. Standardize the d-dimensional dataset. (d = number of features)

  2. For each class, compute the d-dimensional mean vector.

  3. Construct the between-class scatter matrix S(B) and the within-class scatter matrix S(W).

  4. Compute the eigenvectors and eigenvalues of the product of the inverse of S(W) and S(B).

  5. Sort the eigenvalues in descending order to rank the eigenvectors.

  6. Choose the k eigenvectors with the largest eigenvalues to build the d x k transformation matrix W.

  7. Project the samples onto the new feature subspace using the transformation matrix W.

    Looks very familiar, doesn't it? LDA is very similar to PCA in that it decomposes a matrix into eigenvalues and eigenvectors to build a new lower-dimensional feature space. The difference is that LDA uses the class label information when building the mean vectors in step 2: the data is split by label and a mean is computed for each group.
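The seven steps above can be sketched as one small NumPy function. This is an illustrative outline under assumptions, not the code used in this series: the function name `lda_project` and the synthetic three-class data are made up, and the within-class scatter uses the covariance-scaled form.

```python
import numpy as np

def lda_project(X, y, k):
    """Project X (n_samples x d) onto its top-k linear discriminants."""
    # Step 1: standardize each feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    d = X_std.shape[1]
    overall_mean = X_std.mean(axis=0)

    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X_std[y == label]
        # Step 2: class mean vector
        m_c = X_c.mean(axis=0)
        # Step 3: within-class (covariance-scaled) and between-class scatter
        S_W += np.cov(X_c.T, bias=True)
        diff = (m_c - overall_mean).reshape(d, 1)
        S_B += X_c.shape[0] * diff @ diff.T

    # Step 4: eigendecomposition of S_W^{-1} S_B
    eigen_vals, eigen_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    # Steps 5-6: sort by eigenvalue magnitude and keep the top k eigenvectors
    order = np.argsort(np.abs(eigen_vals))[::-1]
    W = eigen_vecs[:, order[:k]].real
    # Step 7: project onto the new subspace
    return X_std @ W, W

# Synthetic stand-in data (hypothetical): 3 classes, 5 features
rng = np.random.default_rng(1)
y = np.repeat([1, 2, 3], 40)
X = rng.normal(size=(120, 5)) + y[:, None] * np.array([1.0, 0.5, 0.0, -0.5, 2.0])
X_lda, W = lda_project(X, y, k=2)
print(X_lda.shape, W.shape)
```

The rest of this post walks through the same steps one at a time on the Wine dataset.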

3. Computing the scatter matrices

Step 1 (standardization) was already done in the previous PCA session, so we start with the mean vectors. The mean vectors are then used to build the between-class scatter matrix and the within-class scatter matrix. Each mean vector m(i) stores the mean value μ of every feature, computed over the samples of class i.
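Written out, the mean vector described above could look as follows. The notation is an assumption made for this sketch: D_i is the set of samples belonging to class i, n_i is its size, and d is the number of features.

```latex
\mathbf{m}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{x},
\qquad
\mathbf{m}_i =
\begin{bmatrix} \mu_{i,1} \\ \mu_{i,2} \\ \vdots \\ \mu_{i,d} \end{bmatrix},
\qquad i \in \{1, 2, 3\}
```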

This gives us three mean vectors. (The Wine dataset has 3 classes!)

    import numpy as np

    # X_train_std and y_train are the standardized Wine training data from the PCA session
    np.set_printoptions(precision=4)

    mean_vecs = []
    for label in range(1, 4):
        # mean of every feature over the samples of this class
        mean_vecs.append(np.mean(X_train_std[y_train == label], axis=0))
        print('MV %s: %s\n' % (label, mean_vecs[label - 1]))

Using the mean vectors, we can compute the within-class scatter matrix S(W).

It is obtained by summing the scatter matrices S(i) of the individual classes i.
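In formula form, the two definitions could be written as below. The symbols follow the assumptions of this sketch: c is the number of classes, D_i is the set of samples in class i, and S_W and S_i correspond to S(W) and S(i) in the text.

```latex
S_W = \sum_{i=1}^{c} S_i,
\qquad
S_i = \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^{T}
```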

    d = 13  # number of features
    S_W = np.zeros((d, d))
    for label, mv in zip(range(1, 4), mean_vecs):
        class_scatter = np.zeros((d, d))  # scatter matrix for each class
        mv = mv.reshape(d, 1)  # make column vector
        for row in X_train_std[y_train == label]:
            row = row.reshape(d, 1)  # make column vector
            class_scatter += (row - mv).dot((row - mv).T)
        S_W += class_scatter  # sum class scatter matrices

    print('Within-class scatter matrix: %sx%s' % (S_W.shape[0], S_W.shape[1]))
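The inner loop over samples adds one outer product at a time. As a side note, the same class scatter matrix can be obtained with a single matrix product on the centered data. The sketch below checks this on random stand-in data (the array `X_c` is hypothetical, not the Wine data).

```python
import numpy as np

rng = np.random.default_rng(0)
X_c = rng.normal(size=(30, 4))   # stand-in for the samples of one class
mv = X_c.mean(axis=0)            # class mean vector

# Loop version: sum of outer products, one sample at a time
S_loop = np.zeros((4, 4))
for row in X_c:
    diff = (row - mv).reshape(4, 1)
    S_loop += diff @ diff.T

# Vectorized version: center the whole block, then one matrix product
centered = X_c - mv
S_vec = centered.T @ centered

print(np.allclose(S_loop, S_vec))
```

Both versions compute the same matrix; the vectorized form just avoids the Python-level loop.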

When computing the scatter matrices, we assume that the class labels in the training set are uniformly distributed. If we print the number of samples per class label, however, we can see that this assumption does not hold.

    print('Class label distribution: %s' % np.bincount(y_train)[1:])

So the individual scatter matrices need to be scaled before they are added to the total. If we divide a scatter matrix by the number of samples in the class, computing the scatter matrix turns out to be the same as computing the covariance matrix. In other words, the covariance matrix is a normalized version of the scatter matrix!
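This relationship is easy to check numerically. The following sketch uses random stand-in data (`X_c` is hypothetical) and compares the scaled scatter matrix with `np.cov`, both with `bias=True` (division by n_i) and with the default (division by n_i - 1).

```python
import numpy as np

rng = np.random.default_rng(0)
X_c = rng.normal(size=(50, 3))   # stand-in for the samples of one class
n_i = X_c.shape[0]

centered = X_c - X_c.mean(axis=0)
S_i = centered.T @ centered      # unscaled scatter matrix

cov_biased = np.cov(X_c.T, bias=True)   # divides by n_i
cov_default = np.cov(X_c.T)             # divides by n_i - 1

print(np.allclose(S_i / n_i, cov_biased))
print(np.allclose(S_i / (n_i - 1), cov_default))
```

Which divisor is used only changes each class matrix by a constant factor; the point is that scaling by class size keeps a large class from dominating S(W).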

    d = 13  # number of features
    S_W = np.zeros((d, d))
    for label, mv in zip(range(1, 4), mean_vecs):
        # bias=True divides by n_i, the number of samples in the class
        class_scatter = np.cov(X_train_std[y_train == label].T, bias=True)
        S_W += class_scatter
    print('Scaled within-class scatter matrix: %sx%s' % (S_W.shape[0], S_W.shape[1]))

With the within-class scatter matrix in hand, the next step is to compute the between-class scatter matrix S(B). Here, m is the overall mean, computed over the samples of all classes.
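As a formula, the between-class scatter matrix could be written as below, with c the number of classes, n_i the number of samples in class i, m_i the class mean vector, and m the overall mean (notation assumed for this sketch, matching the code that follows).

```latex
S_B = \sum_{i=1}^{c} n_i \, (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^{T}
```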

    d = 13  # number of features
    mean_overall = np.mean(X_train_std, axis=0)
    mean_overall = mean_overall.reshape(d, 1)  # make column vector
    S_B = np.zeros((d, d))
    for i, mean_vec in enumerate(mean_vecs):
        n = X_train_std[y_train == i + 1, :].shape[0]  # number of samples in class i + 1
        mean_vec = mean_vec.reshape(d, 1)  # make column vector
        S_B += n * (mean_vec - mean_overall).dot((mean_vec - mean_overall).T)

    print('Between-class scatter matrix: %sx%s' % (S_B.shape[0], S_B.shape[1]))

4. Selecting linear discriminants for the new feature subspace

The remaining steps are similar to PCA. Instead of performing the eigendecomposition on the covariance matrix, we compute the eigenvalues and eigenvectors of the product of the inverse of S(W) and S(B). Then we sort the eigenvalues in descending order.

    eigen_vals, eigen_vecs = np.linalg.eig(np.linalg.inv(S_W).dot(S_B))

...

    # Make a list of (eigenvalue, eigenvector) tuples
    eigen_pairs = [(np.abs(eigen_vals[i]), eigen_vecs[:, i])
                   for i in range(len(eigen_vals))]

    # Sort the (eigenvalue, eigenvector) tuples from high to low
    eigen_pairs = sorted(eigen_pairs, key=lambda k: k[0], reverse=True)

    # Check that the eigenvalues are correctly sorted in descending order
    print('Eigenvalues in descending order:\n')
    for eigen_val in eigen_pairs:
        print(eigen_val[0])

In LDA, there are at most c - 1 linear discriminants, where c is the number of class labels.
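The c - 1 limit follows from the structure of S(B): it is a sum of c rank-one matrices, and the class-size-weighted deviations of the class means from the overall mean add up to zero, so its rank is at most c - 1. A small numerical check on random stand-in data (the sizes and data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, n_per_class = 3, 13, 40   # illustrative sizes
X = rng.normal(size=(c * n_per_class, d))
y = np.repeat(np.arange(1, c + 1), n_per_class)

m = X.mean(axis=0)              # overall mean
S_B = np.zeros((d, d))
for label in np.unique(y):
    X_c = X[y == label]
    diff = (X_c.mean(axis=0) - m).reshape(d, 1)
    S_B += X_c.shape[0] * diff @ diff.T

rank = np.linalg.matrix_rank(S_B)
print('rank of S_B:', rank)
```

Because the rank of S(B) does not exceed c - 1, at most c - 1 eigenvalues of the product matrix can be nonzero. With floating-point arithmetic the remaining ones usually show up as tiny values rather than exact zeros.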

To measure how much class-discriminatory information the linear discriminants capture, let's plot them in order of decreasing eigenvalue.

    import matplotlib.pyplot as plt

    tot = sum(eigen_vals.real)
    discr = [(i / tot) for i in sorted(eigen_vals.real, reverse=True)]
    cum_discr = np.cumsum(discr)

    plt.bar(range(1, 14), discr, alpha=0.5, align='center',
            label='individual "discriminability"')
    plt.step(range(1, 14), cum_discr, where='mid',
             label='cumulative "discriminability"')
    plt.ylabel('"discriminability" ratio')
    plt.xlabel('Linear Discriminants')
    plt.ylim([-0.1, 1.1])
    plt.legend(loc='best')
    plt.tight_layout()
    plt.show()

As the plot shows, the first two linear discriminants account for 100% of the class-discriminatory information in the Wine training set.

Now we stack the two discriminant eigenvectors as columns to build the transformation matrix W.

    w = np.hstack((eigen_pairs[0][1][:, np.newaxis].real,
                   eigen_pairs[1][1][:, np.newaxis].real))
    print('Matrix W:\n', w)

5. Projecting onto the new feature space

Now we transform the data by multiplying the training set by the transformation matrix W.

    X_train_lda = X_train_std.dot(w)
    colors = ['r', 'b', 'g']
    markers = ['s', 'x', 'o']

    for l, c, m in zip(np.unique(y_train), colors, markers):
        plt.scatter(X_train_lda[y_train == l, 0],
                    X_train_lda[y_train == l, 1] * (-1),
                    c=c, label=l, marker=m)

    plt.xlabel('LD 1')
    plt.ylabel('LD 2')
    plt.legend(loc='lower right')
    plt.tight_layout()
    plt.show()

As the plot shows, the three wine classes have been projected onto the new subspace. They can now be separated linearly!
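The `* (-1)` in the plotting code only mirrors the LD 2 axis. That is harmless because an eigenvector is defined only up to its sign: if v is an eigenvector, so is -v, with the same eigenvalue. A minimal check on an arbitrary 2x2 matrix (chosen purely for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigen_vals, eigen_vecs = np.linalg.eig(A)
lam, v = eigen_vals[0], eigen_vecs[:, 0]

# Both v and -v satisfy A v = lambda v
print(np.allclose(A @ v, lam * v))
print(np.allclose(A @ (-v), lam * (-v)))
```

Flipping an axis changes how the scatter plot is oriented, not how well the classes can be separated.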

6. LDA in scikit-learn

Now that we have implemented LDA step by step, let's look at the LDA class that scikit-learn provides.

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

    lda = LDA(n_components=2)
    X_train_lda = lda.fit_transform(X_train_std, y_train)
...

    from sklearn.linear_model import LogisticRegression

    lr = LogisticRegression(solver='liblinear', multi_class='auto')
    lr = lr.fit(X_train_lda, y_train)

    # plot_decision_regions is a plotting helper that is not defined in this post
    plot_decision_regions(X_train_lda, y_train, classifier=lr)
    plt.xlabel('LD 1')
    plt.ylabel('LD 2')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()

Running the classifier on the lower-dimensional training set produced by LDA, we can see that one sample from class 2 is misclassified. We could lower the regularization strength a little so that every training sample is classified correctly, but first, shall we look at the results on the test set?

    X_test_lda = lda.transform(X_test_std)

    plot_decision_regions(X_test_lda, y_test, classifier=lr)
    plt.xlabel('LD 1')
    plt.ylabel('LD 2')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()

As the plot shows, the classifier classifies the test set perfectly. In this case there is no need to weaken the regularization and take on a higher risk of overfitting.
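Reading accuracy off a plot is approximate, so it can help to compute it as a number. The sketch below is a self-contained variant rather than the exact setup of this series, and several parts are assumptions: it loads Wine through `load_wine` (labels 0 to 2 instead of 1 to 3), uses a 70/30 stratified split with `random_state=0`, and uses `LogisticRegression` with its default solver instead of `liblinear`.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize, reduce to 2 linear discriminants, then classify
clf = make_pipeline(StandardScaler(),
                    LinearDiscriminantAnalysis(n_components=2),
                    LogisticRegression())
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print('train accuracy: %.3f' % train_acc)
print('test accuracy: %.3f' % test_acc)
```

Wrapping the steps in a pipeline also guarantees that the scaler and the LDA transform are fitted on the training data only and merely applied to the test data.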

 

To wrap up, let's go over the main points once more. LDA is quite similar to PCA. The difference is that LDA aims to find the axes that maximize class separation, and because it is supervised it makes use of the class labels. The scatter matrix we met along the way is the covariance matrix before normalization, and the number of linear discriminants cannot exceed the number of class labels minus 1. As in PCA, we build a transformation matrix and transform the data with a matrix multiplication.


That's it for linear discriminant analysis (LDA). Really similar to PCA, isn't it? If you haven't read it yet, I strongly recommend going through the previous session on PCA. Session 15 will cover kernel PCA, which also builds on PCA, so I recommend reading sessions 13, 14, and 15 together. See you in the next session!
