[AIB] Intermediate Linear Algebra

minzeros 2021. 12. 28. 01:02

💡 Variance

Variance measures how spread out the data is.

It is the mean of the squared differences between each value and the mean.

So computing the variance generally starts with computing the mean.

 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random

# Create two variables, each holding 50 random integers.
variance_one = []
variance_two = []

for x in range(50):
  variance_one.append(random.randint(25,75))
  variance_two.append(random.randint(0,100))
  
variance_data = {'v1': variance_one, 'v2': variance_two}

variance_df = pd.DataFrame(variance_data)
variance_df['zeros'] = pd.Series(list(np.zeros(50)))

variance_df.head()

output :

 

# scatter plot

plt.scatter(variance_df.v1, variance_df.zeros)
plt.xlim(0,100)
plt.title("Plot 1")
plt.show()

plt.scatter(variance_df.v2, variance_df.zeros)
plt.xlim(0,100)
plt.title("Plot 2")
plt.show()

output :

The two plots above make the difference in spread easy to see.

Variance is commonly written as a lowercase v, or as σ^2 where appropriate.

# Mean
v1_mean = variance_df.v1.mean()
print("v1 mean: ", v1_mean)
v2_mean = variance_df.v2.mean()
print("v2 mean: ", v2_mean)

# Distance from each data point to the mean
variance_df['v1_distance'] = variance_df.v1 - v1_mean
variance_df['v2_distance'] = variance_df.v2 - v2_mean

variance_df.head()

output :

# Square the distances
variance_df['v1_squared_distance'] = variance_df.v1_distance ** 2
variance_df['v2_squared_distance'] = variance_df.v2_distance ** 2

# Squaring turns negative distances into positive values.
variance_df.head()

output :

# Sum the squared distances and divide by the number of observations
observations = len(variance_df)
print("Number of Observations: ", observations)

Variance_One = variance_df.v1_squared_distance.sum() / observations
Variance_Two = variance_df.v2_squared_distance.sum() / observations

print("Variance One: ", Variance_One)
print("Variance Two: ", Variance_Two)

output : 

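When generating the random numbers, v1 was drawn from the range 25~75 and v2 from the range 0~100, so the ranges differ by a factor of about 2,

but the variances differ by much more than a factor of 2. Variance grows with the square of the spread, so a range twice as wide gives roughly four times the variance.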
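The gap between the two variances can also be checked without random numbers. The sketch below assumes the values are uniformly distributed integers with both endpoints included (which is how `random.randint` behaves) and uses the standard formula for the variance of a discrete uniform distribution. The helper name `discrete_uniform_variance` is made up for this illustration.

```python
def discrete_uniform_variance(low, high):
    """Theoretical variance of integers drawn uniformly from low..high (inclusive)."""
    n = high - low + 1          # number of possible values
    return (n ** 2 - 1) / 12

var_v1 = discrete_uniform_variance(25, 75)    # 51 possible values
var_v2 = discrete_uniform_variance(0, 100)    # 101 possible values
ratio = var_v2 / var_v1

print(var_v1, var_v2, ratio)
```

The ratio comes out a little under 4. With only 50 random draws the observed ratio will fluctuate around that value.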

 

 

✨ Computing variance easily with the built-in pandas method

print(variance_df.v1.var(ddof = 1))
print(variance_df.v2.var(ddof = 1))

# ddof parameter
# Delta Degrees of Freedom: the divisor used is N - ddof

output :

 

Note that the result above differs slightly from the value computed by hand earlier.

This is because the variance formula depends on whether the data is treated as a population or as a sample.

In general, a sample variance is computed by dividing by N-1.

The manual calculation above divided by N, which is the population variance.

So setting ddof to 0 gives the same value as the manual calculation.
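As a minimal sketch of the N versus N-1 difference, the snippet below computes both variances by hand and compares them with `Series.var`. The values in `values` are made up for this illustration and are not the random data used above.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only so the arithmetic is easy to follow.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
n = len(values)

squared_distances = (values - values.mean()) ** 2

population_var = squared_distances.sum() / n        # divide by N
sample_var = squared_distances.sum() / (n - 1)      # divide by N - 1

print(population_var, values.var(ddof=0))   # both are the population variance
print(sample_var, values.var(ddof=1))       # both are the sample variance
print(np.var(values.to_numpy()))            # NumPy defaults to ddof=0
```

Note the different defaults: pandas `var()` uses ddof=1, while NumPy `np.var()` uses ddof=0.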

 

 

💡 Standard Deviation

The standard deviation is the square root (√) of the variance.

Computing the variance involves summing squared values, so its scale is larger than that of the mean (it is expressed in squared units).

The standard deviation addresses this by bringing the squared scale back down to the original units of the data.

print(variance_df.v1.std(ddof = 0))
print(variance_df.v2.std(ddof = 0))

output :
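A small sketch of the relationship between the two quantities, using made-up values rather than the random data above:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical values

variance = data.var()    # population variance (ddof=0), in squared units
std = data.std()         # population standard deviation, in the original units

print(variance, std)
print(np.isclose(std, np.sqrt(variance)))
```

Because the standard deviation is in the same units as the data, it can be compared directly with the mean, which the variance cannot.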

 

💡 Covariance

Covariance measures how one variable changes in relation to changes in another variable.

(For example, whether one tends to increase when the other increases.)

  • In the first graph, x is low when y is high. This is described as a negative covariance.
  • In the second graph, no relationship between the highs and lows of the two variables is visible. A pattern like this has a covariance close to 0.
  • In the third graph, x is low when y is low, and both rise together. Here a positive covariance can be expected.

A covariance with a large magnitude suggests a strong association between the two variables.

However, if the variables are on different scales, the covariance is affected by that scale regardless of how strongly the variables are actually related.

Two weakly related variables on a large scale can have a larger covariance than two strongly related variables on a small scale.

 

a = b = np.arange(5, 50, 5)
c = d = np.arange(10, 100, 10)

fake_data = {"a" : a, "b" : b, "c" : c, "d" : d}

df = pd.DataFrame(fake_data)

plt.scatter(df.a, df.b)
plt.xlim(0, 100)
plt.ylim(0, 100)
plt.show()

plt.scatter(df.c, df.d)
plt.xlim(0, 100)
plt.ylim(0, 100)
plt.show()

output :

 

💡 Variance-covariance matrix

df.cov()	# compute covariance

output :

This is the covariance computed for the data above. A matrix like this is called a variance-covariance matrix,

and its diagonal entries are variances rather than covariances.
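To make the matrix entries less opaque, here is a sketch that computes one covariance by hand and compares it with `DataFrame.cov`. It rebuilds two columns with the same values as `a` and `c` above so that it runs on its own. `sample_covariance` is a helper written for this illustration, and it divides by N-1 to match the pandas default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5, 50, 5), "c": np.arange(10, 100, 10)})

def sample_covariance(x, y):
    """Sum of the products of deviations from the means, divided by N - 1."""
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

cov_ac = sample_covariance(df.a, df.c)
cov_matrix = df.cov()

print(cov_ac)
print(cov_matrix)
```

The off-diagonal entry of `cov_matrix` matches `cov_ac`, and each diagonal entry equals the sample variance of that column.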

 

 

💡 Correlation coefficient

Just as the standard deviation was used to adjust the scale of the variance, the scale of the covariance can be adjusted too.

Dividing the covariance by the standard deviations of both variables removes the scale, and the result is called the correlation coefficient.

The correlation coefficient only takes values from -1 to 1, and it is close to 0 when there is no linear relationship.

In most cases the correlation coefficient is a better measure than the covariance, for the following reasons.

  • Covariance can in theory take any value, while the correlation coefficient is bounded between -1 and 1, which makes comparison easy.
  • Covariance always carries the scale and units of the variables, while the correlation coefficient is unaffected by them.
  • The correlation coefficient is not affected by the mean of the data or by the size of its variance.

The correlation coefficient is usually written as a lowercase r.

df.corr()	# compute correlation coefficients

A correlation coefficient of 1 means that one variable has a perfect positive linear relationship with the other.
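The definition of the correlation coefficient (covariance divided by both standard deviations) can be checked with a short sketch. It rebuilds two columns with the same values as `a` and `c` so that it runs on its own, and the factor of 1000 is an arbitrary choice to show the effect of scale.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5, 50, 5), "c": np.arange(10, 100, 10)})

# Correlation from its definition: cov(a, c) / (std(a) * std(c))
cov_ac = df.a.cov(df.c)
r_manual = cov_ac / (df.a.std() * df.c.std())
r_pandas = df.a.corr(df.c)

# Rescaling one variable changes the covariance but not the correlation.
scaled_c = df.c * 1000
cov_scaled = df.a.cov(scaled_c)
r_scaled = df.a.corr(scaled_c)

print(cov_ac, r_manual, r_pandas)
print(cov_scaled, r_scaled)
```

The covariance grows by the same factor of 1000, while r stays at 1 because the two columns are perfectly linearly related.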

 

🔥 Spearman correlation

The correlation coefficient covered above is called the Pearson correlation, and it can be used when statistics such as the variance can be computed from the data. If the data is not numeric but categorical with a meaningful order (ordinal), the Spearman correlation coefficient should be used instead. The Spearman correlation coefficient assigns an order, or rank, to the values and measures the correlation on those ranks, which makes it a non-parametric method.

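### Pearson correlation
import numpy as np
import scipy.stats

# x and y are not defined in the original post. The sample arrays below are an
# assumption, chosen because they are consistent with the outputs shown here.
x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r, p = scipy.stats.pearsonr(x, y)

r	# coefficient
>>> 0.7586402890911869

p	# pvalue
>>> 0.010964341301680829

np.corrcoef(x, y)
>>> array([[1.        , 0.75864029],
           [0.75864029, 1.        ]])

### Spearman correlation
import scipy.stats

result = scipy.stats.spearmanr(x, y)
result
>>> SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)

result.correlation
>>> 0.9757575757575757

result.pvalue
>>> 1.4675461874042197e-06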
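Since Spearman works on ranks, one way to see what it does is to rank the values first and then compute an ordinary Pearson correlation on the ranks. The sketch below does that with `scipy.stats.rankdata`. The sample arrays `x` and `y` are assumptions made for illustration.

```python
import numpy as np
import scipy.stats

# Hypothetical sample data: y rises with x, but not in a straight line.
x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank (1 = smallest).
x_rank = scipy.stats.rankdata(x)
y_rank = scipy.stats.rankdata(y)

# Pearson correlation of the ranks ...
rho_from_ranks = np.corrcoef(x_rank, y_rank)[0, 1]

# ... agrees with SciPy's Spearman coefficient.
rho_scipy, p_value = scipy.stats.spearmanr(x, y)

print(rho_from_ranks, rho_scipy)
```

Because only the ordering matters, the large value 96 affects Spearman far less than it affects Pearson.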

 


💡 Unit Vectors

In linear algebra, a unit vector is any vector with unit length (1).
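Any nonzero vector can be turned into a unit vector by dividing it by its own length. A minimal sketch, with an arbitrary example vector:

```python
import numpy as np

v = np.array([3.0, 4.0])       # example vector, chosen arbitrarily

length = np.linalg.norm(v)     # Euclidean length: sqrt(3^2 + 4^2) = 5.0
unit_v = v / length            # same direction, length 1

print(unit_v, np.linalg.norm(unit_v))
```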

 

💡 Span

The span is the set of all vectors that can be produced as linear combinations (scaled sums and differences) of the two given vectors.

1. Linearly Dependent Vectors

If two vectors lie on the same line, they are said to be linearly dependent.

That is, no combination of these two vectors can produce a new vector off that line.

The span of such vectors is not a plane but is restricted to the line the vectors already lie on.

2. Linearly Independent Vectors

Conversely, vectors that do not lie on the same line are said to be linearly independent, and every vector in the given space (the R2 plane in the case of two vectors) can be produced by combining them.

 

 

✨ Rank

  • The rank of a matrix is the dimension of the space spanned by its column vectors.
  • It can differ from the dimensions of the matrix, because some of the row or column vectors may be linearly dependent on each other.
  • There are several ways to determine the rank; one of them is Gaussian Elimination.
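For a quick check, NumPy provides `np.linalg.matrix_rank`, which estimates the rank numerically (through a singular value decomposition rather than Gaussian Elimination). The matrices below are examples picked for this sketch.

```python
import numpy as np

# In A, row 3 = 2 * row 2 - row 1, so the rows are linearly dependent.
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

rank_A = np.linalg.matrix_rank(A)                # 2, although A is 3x3
rank_identity = np.linalg.matrix_rank(np.eye(3)) # 3, full rank

print(rank_A, rank_identity)
```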

 

💡 Gaussian Elimination

Gaussian Elimination is the procedure that converts a given matrix into "Row Echelon form".

In Row Echelon form, the leading (leftmost nonzero) entry of each nonzero row is 1, and every entry below a leading entry is 0.

Such a matrix generally has an upper-triangular shape.

When the last row becomes [0, 0, 0], the 3 rows are linearly dependent.

Therefore the rank of the original matrix is 2: although it is a 3x3 matrix, its vectors span only a two-dimensional plane rather than all of R3.

Conditions for Row Echelon form

all zeros row : a row whose values are all 0     ex) [0, 0, 0]
nonzero row : a row with at least one nonzero value     ex) [0, 1, 2]
leading entry : the leftmost nonzero entry in a nonzero row     ex) the leading entry of [0, 1, 2] is 1
  1. Every nonzero row must be above every all zeros row.
  2. The leading entry of each row lies to the right of the leading entry of the row above it.
  3. The leading entry must be 1. (Some textbooks omit this requirement.)
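The conditions above can be turned into a small routine. This is only a sketch of Gaussian Elimination, not a production implementation: `row_echelon_form` and its `tol` parameter are names made up here, the example matrix is hypothetical, and the routine swaps rows to pick the largest available pivot, which is a numerical-stability choice rather than part of the definition.

```python
import numpy as np

def row_echelon_form(matrix, tol=1e-12):
    """Return a row echelon form of `matrix` with every leading entry scaled to 1."""
    A = np.array(matrix, dtype=float)
    n_rows, n_cols = A.shape
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row >= n_rows:
            break
        # Among the rows not yet used, pick the largest entry in this column.
        best = pivot_row + np.argmax(np.abs(A[pivot_row:, col]))
        if abs(A[best, col]) < tol:
            continue                                   # no pivot in this column
        A[[pivot_row, best]] = A[[best, pivot_row]]    # move the pivot row up
        A[pivot_row] = A[pivot_row] / A[pivot_row, col]  # leading entry becomes 1
        for r in range(pivot_row + 1, n_rows):
            A[r] = A[r] - A[r, col] * A[pivot_row]     # zero out entries below it
        pivot_row += 1
    A[np.abs(A) < tol] = 0.0                           # clean up rounding noise
    return A

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

ref = row_echelon_form(A)
rank = int(np.sum(np.any(ref != 0, axis=1)))   # number of nonzero rows

print(ref)
print(rank)
```

The last row of `ref` is all zeros, so two nonzero rows remain and the rank is 2. A row echelon form is not unique, so a hand calculation may give different nonzero rows while still producing the same rank.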

reference.

https://www.youtube.com/watch?v=2GKESu5atVQ 

 
