[OSS] Python Meets Mathematics

▼ Calculus
SciPy
Matplotlib.pyplot
Differentiation
Integration
▼ Linear Algebra
Vector
Matrix
NumPy
▼ Optimization
Optimization
Nonlinear Optimization
scipy.optimize
Line fitting with minimizing Algebraic and Geometric distance
▼ Probability

▼ Calculus

SciPy

SciPy is an open-source scientific computing library for Python.

Matplotlib.pyplot

Plotting an equation: y = 0.1x^3 - 0.8x^2 - 1.5x + 5.4 (integer x values only)

import matplotlib.pyplot as plt

xs = [x for x in range(-4, 10)]
ys = [0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4 for x in xs]

plt.plot(xs, ys, 'r-')
plt.show()

plot([x], y, [fmt], …)
: Plots y versus x as lines and/or markers. The 'fmt' string defines basic formatting such as color, marker style, and line style.
hist(x, bins=None, range=None, …)
: Draws a histogram. It takes an array x and splits the data into bins; properties such as the number of bins and the range can be specified.
contour([X, Y,] Z, [levels], …)
: Draws contour lines of the data Z. Useful for drawing 3D data on a 2D plot.
( X and Y define the grid! )
imshow(X, cmap=None, …)
: Displays data as an image on a 2D regular raster; the colormap can be set with cmap.
( raster: an image composed of many pixels )
title(label, …)
: Sets a title for the axes.
axis(…)
: Gets or sets axis properties (e.g., 'on'/'off', 'equal', axis limits).
legend(…)
: Places a legend (labels that distinguish the plotted lines) on the axes.
grid(…)
: Configures the grid lines.
xlabel(label, …), xlim(left, right), …
: Set the label and the limits of the x-axis.
figure(num=None, …)
: Creates a new figure or activates an existing one.
show(…)
: Displays all open figures.
savefig(filename, …)
: Saves the current figure to a file.
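A minimal sketch exercising several of the functions listed above (hist, contour, imshow, title, axis, savefig) in one figure. The random sample data, the surface Z = exp(-(x^2 + y^2)), the file name `pyplot_demo.png`, and the non-interactive 'Agg' backend are all assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend (assumption): lets the script run without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # Hypothetical sample data for the histogram

# A grid (X, Y) and a surface Z for contour() and imshow()
xs = np.linspace(-3, 3, 61)
X, Y = np.meshgrid(xs, xs)
Z = np.exp(-(X**2 + Y**2))

plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.hist(data, bins=30, range=(-4, 4))  # 30 bins over [-4, 4]
plt.title('hist')

plt.subplot(1, 3, 2)
plt.contour(X, Y, Z, levels=8)  # 8 contour levels; X and Y define the grid
plt.axis('equal')
plt.title('contour')

plt.subplot(1, 3, 3)
plt.imshow(Z, cmap='viridis')  # Z drawn as a raster image
plt.title('imshow')

plt.savefig('pyplot_demo.png')  # Write the current figure to a file
```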

Plotting the same equation with finer x steps, a title, axis labels, and a grid

import matplotlib.pyplot as plt

scale = 10
xs = [x/scale for x in range(-4*scale, 10*scale)]
ys = [0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4 for x in xs]

plt.title('$y = 0.1x^3 - 0.8x^2 - 1.5x + 5.4$') # $ -> LaTeX style
plt.plot(xs, ys, 'r-')
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.axis('equal')
plt.show()

Differentiation

The process of finding the derivative (slope) of a given function or, equivalently, its linear approximation (the tangent line) at a point.

import matplotlib.pyplot as plt
scale = 10
xs = [x/scale for x in range(-4*scale, 10*scale)]
ys = [0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4 for x in xs]
yt = [-3.5*x + 7 for x in xs]
plt.plot(xs, ys, 'r-', label='y')
plt.plot(xs, yt, 'b--', label='tangent line at x=2')
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.legend()
plt.show()
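The tangent line y = -3.5x + 7 used above does not have to be hard-coded: its slope is f'(2) and it passes through (2, f(2)). A minimal sketch with the derivative written out by hand; the helper name `tangent_line` is hypothetical.

```python
def f(x):
    return 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4

def fd(x):
    # Derivative of f, differentiated by hand
    return 0.3*x**2 - 1.6*x - 1.5

def tangent_line(x0):
    """Return (slope, intercept) of the tangent line y = f'(x0)(x - x0) + f(x0)."""
    slope = fd(x0)
    intercept = f(x0) - slope*x0
    return slope, intercept

slope, intercept = tangent_line(2)
print(f'y = {slope:.1f}x + {intercept:.1f}')  # y = -3.5x + 7.0
```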

SymPy - Computing the derivative

import sympy as sp

x = sp.symbols('x')
y = 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4

yd = sp.diff(y, x)  # dy/dx: differentiate y with respect to x
print(yd) # 0.3*x**2 - 1.6*x - 1.5
print(float(yd.subs({x: 2}))) # -3.5 / substitute x = 2 and convert to a Python float

SymPy - Finding x where y = 0

roots = sp.solveset(y, x)   # y: the expression, x: the unknown to solve for
print(roots)                # FiniteSet(-3.0, 2.0, 9.0)
r0 = float(roots.args[0])   # -3.0 / Casting from a SymPy Float object to a Python float
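As a numerical cross-check (not part of the original notes), NumPy's `np.roots` finds polynomial roots directly from the coefficients, listed from the highest power down. A small sketch:

```python
import numpy as np

# Coefficients of 0.1x^3 - 0.8x^2 - 1.5x + 5.4, highest power first
coeffs = [0.1, -0.8, -1.5, 5.4]

# All three roots are real here, so keeping the real parts is safe
np_roots = np.sort(np.roots(coeffs).real)
print(np_roots)
```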

Factorization

sp.pprint(sp.factor(y)) # 5.4·(0.11...·x - 1.0)·(0.33...·x + 1.0)·(0.5·x - 1.0)

y = (x**3 - 8*x**2 - 15*x + 54)/10 # Rewrite with integer coefficients (exact rational form)
sp.pprint(sp.factor(y)) # (x - 9)·(x - 2)·(x + 3)
                        # ───────────────────────
                        #           10
print(sp.factor(y))     # (x - 9)*(x - 2)*(x + 3)/10

If the function is differentiable, its slope can easily be approximated numerically with finite differences, as below.

y = lambda x: 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4
x0 = 2
h = 0.001

d1 = (y(x0+h) - y(x0)) / h # Forward difference: -3.5002
d2 = (y(x0+h/2) - y(x0-h/2)) / h # Central difference: -3.5000
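To see why the central difference is more accurate, a small sketch comparing both errors against the exact derivative y'(2) = -3.5 for a few step sizes h (the list of h values is an arbitrary choice). The forward-difference error shrinks roughly in proportion to h, the central-difference error roughly in proportion to h^2.

```python
def y(x):
    return 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4

x0 = 2
exact = -3.5  # y'(2) = 0.3*4 - 1.6*2 - 1.5

errors = []
for h in [1e-1, 1e-2, 1e-3]:
    forward = (y(x0 + h) - y(x0)) / h           # Error of order h
    central = (y(x0 + h/2) - y(x0 - h/2)) / h   # Error of order h^2
    errors.append((h, abs(forward - exact), abs(central - exact)))
    print(f'h={h:g}: forward error={errors[-1][1]:.2e}, central error={errors[-1][2]:.2e}')
```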

Integration

Not used much!
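Even if integration is rarely needed here, for reference a minimal sketch computing the definite integral of the same cubic over [0, 2] in two ways: symbolically with SymPy's `sp.integrate` and numerically with SciPy's `scipy.integrate.quad`. The interval [0, 2] is an arbitrary choice; both results should be 91/15 (about 6.0667).

```python
import sympy as sp
from scipy.integrate import quad

# Symbolic integration with exact rational coefficients
x = sp.symbols('x')
y = (x**3 - 8*x**2 - 15*x + 54) / 10
exact = sp.integrate(y, (x, 0, 2))
print(exact)  # 91/15

# Numerical integration of the same function
f = lambda t: 0.1*t**3 - 0.8*t**2 - 1.5*t + 5.4
value, abs_err = quad(f, 0, 2)  # quad returns (estimate, estimated absolute error)
print(value)
```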

▼ Linear Algebra

Vector

Used to represent physical quantities that have both magnitude and direction (e.g., velocity)!
Used to represent a point [1, 4], a line 2x + 3y − 14 = 0 → [2, 3, −14], and its normal vector [2, 3] or, normalized, [0.554 … , 0.832 …]!
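These representations can be checked with NumPy: in homogeneous coordinates the point (1, 4) becomes [1, 4, 1], and it lies on the line exactly when its dot product with [2, 3, -14] is zero. Dividing the normal vector by its length gives the unit normal. A small sketch:

```python
import numpy as np

p = np.array([1, 4, 1])        # Point (1, 4) in homogeneous coordinates
line = np.array([2, 3, -14])   # 2x + 3y - 14 = 0
n = line[:2]                   # Normal vector [2, 3]

print(p @ line)                 # 0 -> the point lies on the line
unit_n = n / np.linalg.norm(n)  # Normalize to length 1
print(unit_n)                   # approximately [0.5547 0.8321]
```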

Matrix

Used to represent physical quantities such as the moment of inertia!
Used to represent data and models!
ex) Images, multiple vectors
Used to represent geometric transformations!
Used to solve systems of linear equations!

Matrix multiplication
➜ Not commutative (AB ≠ BA in general)

import sympy as sp

a11, a12, a21, a22 = sp.symbols('a11 a12 a21 a22')
b11, b12, b21, b22 = sp.symbols('b11 b12 b21 b22')
A = sp.Matrix([[a11, a12], [a21, a22]])
B = sp.Matrix([[b11, b12], [b21, b22]])

print(A+B == B+A) # True
print(A*B == B*A) # False
print(A*B - B*A) # Check their difference

Matrix inverse (AA^-1 = A^-1A = I; the matrix counterpart of a reciprocal)
Pseudo-inverse (a generalization of the matrix inverse)
➜ The matrix does not need to be square
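A short NumPy sketch of both ideas: for an invertible square matrix, inv(A) works from both sides; for a tall matrix with linearly independent columns, pinv acts as a left inverse only, so pinv(B) @ B is the identity while B @ pinv(B) is not. The matrices are arbitrary examples.

```python
import numpy as np

# Square, invertible matrix
A = np.array([[1., 2.], [3., 4.]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True

# Tall 3x2 matrix: no ordinary inverse, but a pseudo-inverse exists
B = np.array([[1., 1.], [4., 1.], [7., 1.]])
B_pinv = np.linalg.pinv(B)                 # Shape (2, 3)
print(np.allclose(B_pinv @ B, np.eye(2)))  # True: left inverse
print(np.allclose(B @ B_pinv, np.eye(3)))  # False: not a right inverse
```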

NumPy

An essential library that provides functions for computing with multidimensional arrays and matrices!
numpy.array
➜ Holds elements of a homogeneous (same) data type (vs. list and tuple)

import numpy as np

# 1. Create an array from a composite data (list or tuple, not set and dictionary)
A = np.array([3, 29, 82])
B = np.array(((3., 29, 82), (10, 18, 84)))
C = np.array([[3, 29, 82], [10, 18, 84]], dtype=float)
D = np.array([3, 29, 'Shin'])
E = np.array([[3], [29], [82]])
# ndim: number of dimensions, size: number of elements, shape: (n, m), dtype: data type
print(A.ndim, A.size, A.shape, A.dtype) # 1 3 (3,) int32 (platform-dependent; int64 on most systems)
                                        # Note) np.array_equal(A, A.T) == True
                                        # Transposing a 1-D array leaves it unchanged.
print(B.ndim, B.size, B.shape, B.dtype) # 2 6 (2, 3) float64
print(C.ndim, C.size, C.shape, C.dtype) # 2 6 (2, 3) float64
print(D.ndim, D.size, D.shape, D.dtype) # 1 3 (3,) <U11 (Unicode string; <U21 where the default int is int64)
                                        # Note) array(['3', '29', 'Shin'])
print(E.ndim, E.size, E.shape, E.dtype) # 2 3 (3, 1) int32
                                        # Note) np.array_equal(E, E.T) == False
                                        # because E.T == array([[3, 29, 82]])

# 2. Create an array using initializers
F = np.zeros((3, 2))             # Create a 3x2 array filled with 0 (default: float64)
G = np.ones((3, 2))              # Create a 3x2 array filled with 1
H = np.eye(3, dtype=np.float32)  # Create a 3x3 identity matrix (single-precision)
I = np.empty((3, 2))             # Create a 3x2 array WITHOUT initializing it (values are arbitrary, not necessarily 0)
J = np.empty((0, 9))             # An empty array with shape (0, 9)
K = np.arange(0, 1, 0.2)         # Split by step size 0.2: array([0., 0.2, 0.4, 0.6, 0.8])
L = np.linspace(0, 1, 5)         # Split into 5 points: array([0., 0.25, 0.5, 0.75, 1.])
M = np.random.random((3, 2))     # == np.random.uniform(size=(3, 2))
                                 # Note) np.random.normal() draws from a normal distribution instead

Indexing & Slicing

import numpy as np
A = np.array(((3., 29, 82), (10, 18, 84)))

# 1. Indexing and slicing
A[1][1] 	# 18.0
A[1, 1] 	# 18.0 Note) list[1, 1] does not work!
A[1,1:2] 	# array([18.])
A[1,:] 		# Get a row: array([10., 18., 84.])
A[:,2] 		# Get a column: array([82., 84.])
A[0:2,0:2] 	# Get a submatrix: array([[3., 29.], [10., 18.]])

# 2. Logical indexing
A > 80 			# array([[False, False, True], [False, False, True]])
A[A > 80] 		# array([82., 84.])
A[A > 80] = 80 	# Masked operations are possible!

# 3. Fancy indexing
A[(1, 0, 0), (1, 0, 2)] # array([18., 3., 82.])
						# Get items at (1, 1), (0, 0), (0, 2)

Arithmetic operations (cf. different from list)

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# 1. Element-wise arithmetic operations
print(A + B) # np.add(A, B)
print(A - B) # np.subtract(A, B)
print(A * B) # np.multiply(A, B)
print(A / B) # np.divide(A, B)

# 2. Matrix operations
print(A.T)                      # A.transpose()
print(A @ B)                    # np.matmul(A, B) : Matrix multiplication
print(np.linalg.norm(A))        # 5.48 (default for a matrix: Frobenius norm, sqrt(1 + 4 + 9 + 16))
print(np.linalg.matrix_rank(A)) # 2, full rank
print(np.linalg.det(A))         # -2.0 (up to floating-point error), non-zero determinant
print(np.linalg.inv(A))         # Matrix inverse
print(np.linalg.pinv(A))        # Matrix pseudo-inverse. Note) For an invertible matrix, pinv(A) equals inv(A).

# 3. Broadcasting
print(A + 1) # [[2, 3], [4, 5]]
print(A + [0, -1]) # [[1, 1], [3, 3]]
print(A + [[1], [-1]]) # [[2, 3], [2, 3]]

Broadcasting concept
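The broadcasting rule as documented by NumPy: shapes are compared starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1; a size-1 (or missing) dimension is stretched to match. A small sketch of the resulting shapes for the three cases above, using `np.broadcast_shapes` (available in NumPy 1.20 and later):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])               # shape (2, 2)

print(np.broadcast_shapes(A.shape, ()))      # (2, 2): a scalar is stretched everywhere
print(np.broadcast_shapes(A.shape, (2,)))    # (2, 2): the row [0, -1] is added to each row
print(np.broadcast_shapes(A.shape, (2, 1)))  # (2, 2): the column [[1], [-1]] is added to each column

# Incompatible shapes raise an error
try:
    A + np.array([1, 2, 3])                  # shape (3,) vs (2, 2)
except ValueError as e:
    print('ValueError:', e)
```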

Example - Line fitting from two points, (1, 4) and (4, 2)

import numpy as np

A = np.array([[1., 1.], [4., 1.]])
b = np.array([[4.], [2.]])
A_inv = np.linalg.inv(A)
print(A_inv * b)    # [[-1.33333333  1.33333333]
                    #  [ 2.66666667 -0.66666667]] Note) element-wise product with broadcasting, NOT matrix multiplication
print(A_inv @ b)    # [[-0.66666667]
                    #  [ 4.66666667]] -> y = -0.667x + 4.667

Example - Line fitting from more than two points such as (1, 4), (4, 2) and (7,1)

import numpy as np

A = np.array([[1., 1.], [4., 1.], [7., 1.]])
b = np.array([[4.], [2.], [1.]])
A_inv = np.linalg.pinv(A) # Left inverse
print(A_inv @ b) 	# [[-0.5]
					# [ 4.33333333]]
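For a tall matrix with independent columns, the left inverse from `pinv` equals the normal-equation formula (A^T A)^-1 A^T, and `np.linalg.lstsq` solves the same least-squares problem directly. A sketch reusing the three points above:

```python
import numpy as np

A = np.array([[1., 1.], [4., 1.], [7., 1.]])
b = np.array([[4.], [2.], [1.]])

x_pinv = np.linalg.pinv(A) @ b
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b   # Left inverse via the normal equations
x_lstsq, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

print(x_pinv.ravel())  # slope and intercept, approximately [-0.5, 4.3333]
print(np.allclose(x_pinv, x_normal), np.allclose(x_pinv, x_lstsq))  # True True
```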

Example - Line fitting with more but noisy data
➜ The more data points there are, the more robust the fit becomes to noise.
➜ You can confirm that the line fitting still works well!
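A minimal sketch of this example under assumed settings: a hypothetical true line y = -0.5x + 4, Gaussian noise (std 0.5) added to y only, and a fixed random seed for reproducibility. Each fit uses `pinv` as above; with more points the estimate typically lands closer to the true line.

```python
import numpy as np

rng = np.random.default_rng(42)          # Fixed seed (assumption) for reproducibility
true_slope, true_intercept = -0.5, 4.0   # Hypothetical true line
noise_std = 0.5

def fit_line(n):
    """Fit y = a*x + b to n noisy samples by least squares (pinv)."""
    x = rng.uniform(0, 10, size=n)
    y = true_slope*x + true_intercept + rng.normal(scale=noise_std, size=n)
    A = np.vstack((x, np.ones(n))).T
    return np.linalg.pinv(A) @ y         # [a, b]

for n in [3, 30, 3000]:
    a, b = fit_line(n)
    print(f'n={n:5d}: a={a:.3f}, b={b:.3f}')
```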

Curve fitting (≒ Line fitting)

import numpy as np
import matplotlib.pyplot as plt

true_curve = lambda x: 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4
data_range = (-6, 12)
data_num = 100
noise_std = 0.5

# Generate the true data
x = np.random.uniform(data_range[0], data_range[1], size=data_num)
y = true_curve(x)

# Add Gaussian noise
xn = x + np.random.normal(scale=noise_std, size=x.shape)
yn = y + np.random.normal(scale=noise_std, size=y.shape)

# Solve the system of equations
A = np.vstack((xn**3, xn**2, xn, np.ones(xn.shape))).T
b = yn
curve = np.linalg.pinv(A) @ b

# Plot the data and result
plt.title(f'Curve: y={curve[0]:.3f}*$x^3$ + {curve[1]:.3f}*$x^2$ + {curve[2]:.3f}*$x$ + {curve[3]:.3f}')
xc = np.linspace(*data_range, 100)
plt.plot(xc, true_curve(xc), 'r-', label='The true curve')
plt.plot(xn, yn, 'b.', label='Noisy data')
plt.plot(xc, curve[0]*xc**3 + curve[1]*xc**2 + curve[2]*xc + curve[3], 'g-', label='Estimate')
plt.xlim(data_range)
plt.legend()
plt.show()

Curve fitting + Model selection

import numpy as np
import matplotlib.pyplot as plt

# The only addition: a helper function that builds the model matrix A for a polynomial of any order
def buildA(order, xs):
	A = np.empty((0, len(xs)))
	for i in range(order + 1):
		A = np.vstack((xs**i, A))
	return A.T
    
true_coeff = [0.1, -0.8, -1.5, 5.4]
poly_order = 3 # Try other integer (>= 0)
data_range = (-6, 12)
data_num = 100
noise_std = 1

# Generate the true data
x = np.random.uniform(data_range[0], data_range[1], size=data_num)
y = buildA(len(true_coeff) - 1, x) @ true_coeff

# Add Gaussian noise
xn = x + np.random.normal(scale=noise_std, size=x.shape)
yn = y + np.random.normal(scale=noise_std, size=y.shape)

# Solve the system of equations
A = buildA(poly_order, xn)
b = yn
coeff = np.linalg.pinv(A) @ b

# Plot the data and result
plt.title(f'Order: {poly_order}, Coeff: ' + np.array2string(coeff, precision=2, suppress_small=True))
xc = np.linspace(*data_range, 100)
plt.plot(xc, np.matmul(buildA(len(true_coeff) - 1, xc), true_coeff), 'k-', label='The true curve', alpha=0.2)
plt.plot(xn, yn, 'b.', label='Noisy data')
plt.plot(xc, np.matmul(buildA(poly_order, xc), coeff), 'g-', label='Estimate')
plt.xlim(data_range)
plt.legend()
plt.show()
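`buildA(order, xs)` builds the same matrix as NumPy's Vandermonde helper `np.vander(xs, order + 1)` (columns from the highest power down), and `np.polyfit` solves the same least-squares problem. A sketch checking both against the pinv approach; the sample data are arbitrary, and buildA is repeated so the block runs on its own.

```python
import numpy as np

def buildA(order, xs):
    # Same helper as above: columns are xs**order, ..., xs**1, xs**0
    A = np.empty((0, len(xs)))
    for i in range(order + 1):
        A = np.vstack((xs**i, A))
    return A.T

rng = np.random.default_rng(0)
xs = rng.uniform(-6, 12, size=50)
ys = 0.1*xs**3 - 0.8*xs**2 - 1.5*xs + 5.4 + rng.normal(size=xs.shape)

order = 3
print(np.allclose(buildA(order, xs), np.vander(xs, order + 1)))  # True

coeff_pinv = np.linalg.pinv(buildA(order, xs)) @ ys
coeff_polyfit = np.polyfit(xs, ys, order)        # Highest power first, like buildA
print(np.allclose(coeff_pinv, coeff_polyfit))    # True
```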

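When the inverse does not exist
➜ 'inv' raises an error (LinAlgError: Singular matrix). 'pinv' still returns a matrix, but it only gives a least-squares approximation, not an exact solution.
(Solution)
➜ Rewrite the problem as Av = 0 (e.g., a line ax + by + c = 0 with v = [a, b, c]). The set of all vectors v satisfying it is called the "null space"; finding such a null vector gives the solution.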

import numpy as np

A = np.array([[1., 1.], [1., 1.]])
b = np.array([[4.], [2.]])
A_inv = np.linalg.inv(A) # Error! (singular matrix)
print(A_inv @ b)

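# How to find the null space
# Each row is [x, y, 1] for the points (1, 4) and (1, 2); a null vector [a, b, c] gives the line ax + by + c = 0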
import numpy as np
from scipy import linalg

A = np.array([[1., 4., 1.], [1., 2., 1.]])
x = linalg.null_space(A)
print(x / x[0]) # [[1.], [0.], [-1.]] Note) Line: x - 1 = 0
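The same idea extends to more than two (possibly noisy) points: stack one row [x, y, 1] per point and take the right singular vector belonging to the smallest singular value; when the points lie exactly on a line, that vector spans the null space. A minimal sketch using `np.linalg.svd`; the third point (1, 7) is an added example.

```python
import numpy as np

# One row [x, y, 1] per point; all points lie on the vertical line x = 1
points = np.array([[1., 4.], [1., 2.], [1., 7.]])
A = np.hstack((points, np.ones((len(points), 1))))

# Rows of Vt are right singular vectors, sorted by decreasing singular value
U, S, Vt = np.linalg.svd(A)
line = Vt[-1]            # [a, b, c] of the line ax + by + c = 0
line = line / line[0]    # Scale so that a = 1 (valid here because a != 0)
print(line)              # approximately [1, 0, -1] -> x - 1 = 0
print(np.allclose(A @ line, 0))  # True: every point satisfies the equation
```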

▼ Optimization

Geometric distance

Optimization

Selecting the best candidate according to a criterion by maximizing or minimizing an 'objective function'.
➜ For example, performance can vary greatly depending on which loss function is chosen!
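A tiny illustration of how much the choice of objective matters: for the same data with one outlier, minimizing the sum of squared errors gives the mean, while minimizing the sum of absolute errors gives the median. A sketch with `scipy.optimize.minimize_scalar`; the data values are made up.

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1., 2., 3., 4., 100.])        # Hypothetical data with one outlier

l2_loss = lambda c: np.sum((data - c)**2)      # Sum of squared errors (L2)
l1_loss = lambda c: np.sum(np.abs(data - c))   # Sum of absolute errors (L1)

c_l2 = minimize_scalar(l2_loss).x
c_l1 = minimize_scalar(l1_loss, bounds=(0, 100), method='bounded').x  # L1 is not smooth, so use a bounded search

print(f'L2 minimizer: {c_l2:.3f} (mean = {data.mean():.3f})')
print(f'L1 minimizer: {c_l1:.3f} (median = {np.median(data):.3f})')
```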

Nonlinear Optimization

Gradient descent
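➜ Move in the direction in which the function decreases, i.e., opposite to the derivative (gradient)!
➜ An iterative method for finding a local minimum of a differentiable function: repeatedly compute the first-order (partial) derivatives and update x <- x - learn_rate * f'(x).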

import numpy as np
import matplotlib.pyplot as plt

f = lambda x: 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4
fd = lambda x: 0.3*x**2 - 1.6*x - 1.5
viz_range = np.array([-6, 12])
learn_rate = 0.1 # Step size. Try 0.001, 0.01, 0.5, and 0.6
max_iter = 100
min_tol = 1e-6
x_init = 12 # Starting point. Try -1: x runs off toward negative infinity.

# Prepare visualization
xs = np.linspace(*viz_range, 100)
plt.plot(xs, f(xs), 'r-', label='f(x)', linewidth=2)
plt.plot(x_init, f(x_init), 'b.', label='Each step', markersize=12)
plt.axis((*viz_range, *f(viz_range)))	# axis(): set xlim and ylim at once
plt.legend()

x = x_init
for i in range(max_iter):
	# Run the gradient descent
	xp = x
	x = x - learn_rate*fd(x)	# Step against the derivative to move toward a local minimum

	# Update visualization for each iteration
	print(f'Iter: {i}, x = {xp:.3f} to {x:.3f}, f(x) = {f(xp):.3f} to {f(x):.3f} (f\'(x) = {fd(xp):.3f})')
	lcolor = np.random.rand(3)
	approx = fd(xp)*(xs-xp) + f(xp)
	plt.plot(xs, approx, '-', linewidth=1, color=lcolor, alpha=0.5)
	plt.plot(x, f(x), '.', color=lcolor, markersize=12)
	
	# Check the terminal condition
	if abs(x - xp) < min_tol:
		break
plt.show()
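To compare the learning rates suggested in the comment without plotting, the loop can be wrapped in a function that reports whether it converged. A sketch; the function name `gradient_descent` and the divergence threshold `max_abs_x=1e6` are assumptions.

```python
fd = lambda x: 0.3*x**2 - 1.6*x - 1.5   # f'(x) of the same cubic

def gradient_descent(fd, x_init, learn_rate, max_iter=100, min_tol=1e-6, max_abs_x=1e6):
    """Run x <- x - learn_rate * f'(x); return (x, converged)."""
    x = x_init
    for _ in range(max_iter):
        xp = x
        x = x - learn_rate*fd(x)
        if abs(x) > max_abs_x:      # Treat a huge |x| as divergence (threshold is an assumption)
            return x, False
        if abs(x - xp) < min_tol:   # Same terminal condition as the loop above
            return x, True
    return x, False                 # Ran out of iterations before converging

for lr in [0.001, 0.01, 0.1, 0.5, 0.6]:
    x, converged = gradient_descent(fd, 12, lr)
    print(f'learn_rate={lr}: x={x:.4f}, converged={converged}')
```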

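Newton's method
➜ A technique for finding extrema (maxima or minima) of a differentiable function, or roots of an equation.
➜ No step size needs to be chosen: it is already determined by the second derivative.
➜ Step size = 1/fdd(x), i.e., x <- x - fd(x)/fdd(x)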

step size

import numpy as np
import matplotlib.pyplot as plt

f = lambda x: 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4
fd = lambda x: 0.3*x**2 - 1.6*x - 1.5
fdd = lambda x: 0.6*x - 1.6
viz_range = np.array([-6, 12])
max_iter = 100
min_tol = 1e-6
x_init = 12 # Try -2, 0, and 16/6 (where f''(x) = 0, an inflection point)

# Prepare visualization
xs = np.linspace(*viz_range, 100)
plt.plot(xs, f(xs), 'r-', label='f(x)', linewidth=2)
plt.plot(x_init, f(x_init), 'b.', label='Each step', markersize=12)
plt.axis((*viz_range, *f(viz_range)))
plt.legend()

x = x_init
for i in range(max_iter):
	# Run the Newton method
	xp = x
	x = x - fd(x) / fdd(x)
	# To avoid converging to a maximum (f''(x) < 0) or dividing by f''(x) close to 0,
	# try replacing the denominator with abs(fdd(x)) or (abs(fdd(x)) + 1)

	# Update visualization for each iteration
	print(f'Iter: {i}, x = {xp:.3f} to {x:.3f}, f(x) = {f(xp):.3f} to {f(x):.3f} (f\'(x) = {fd(xp):.3f}, f\'\'(x) = {fdd(xp):.3f})')
	lcolor = np.random.rand(3)
	approx = 0.5*fdd(xp)*(xs-xp)**2 + fd(xp)*(xs-xp) + f(xp)
	plt.plot(xs, approx, '-', linewidth=1, color=lcolor, alpha=0.8)
	plt.plot(x, f(x), '.', color=lcolor, markersize=12)

	# Check the terminal condition
	if abs(x - xp) < min_tol:
		break
plt.show()
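A headless sketch comparing the plain update with the abs(fdd(x)) variant mentioned in the loop comment, both started from x_init = 0, where f''(0) = -1.6 < 0. The function `newton` and its `modified` flag are hypothetical names. The plain update is pulled to the local maximum near x = -0.813, while the modified update moves downhill to the local minimum near x = 6.147.

```python
def fd(x):
    return 0.3*x**2 - 1.6*x - 1.5   # f'(x)

def fdd(x):
    return 0.6*x - 1.6              # f''(x)

def newton(x_init, modified=False, max_iter=100, min_tol=1e-6):
    """Newton's method on f'(x) = 0; with modified=True the denominator is abs(f''(x))."""
    x = x_init
    for _ in range(max_iter):
        xp = x
        denom = abs(fdd(x)) if modified else fdd(x)
        x = x - fd(x) / denom
        if abs(x - xp) < min_tol:
            break
    return x

x_plain = newton(0)
x_mod = newton(0, modified=True)
print(f"plain:    x = {x_plain:.3f}, f''(x) = {fdd(x_plain):.3f}")  # f'' < 0: a maximum
print(f"modified: x = {x_mod:.3f}, f''(x) = {fdd(x_mod):.3f}")      # f'' > 0: a minimum
```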

scipy.optimize

Provides functions for optimization problems and root finding.
Minimization ➜ No need to provide the derivative! (If it is not given, the gradient is approximated numerically.)
➜ Parameters such as tol and options can be adjusted.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

f = lambda x: 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4
viz_range = np.array([-6, 12])
max_iter = 100
min_tol = 1e-6
x_init = 12 # Try -2, 0, and 16/6

# Find the minimum by SciPy
result = minimize(f, x_init, tol=min_tol, options={'maxiter': max_iter, 'return_all': True})
print(result)

# Visualize all iterations
xs = np.linspace(*viz_range, 100)
plt.plot(xs, f(xs), 'r-', label='f(x)', linewidth=2)
xr = np.vstack(result.allvecs)
plt.plot(xr, f(xr), 'b.', label='Each step', markersize=12)
plt.legend()
plt.axis((*viz_range, *f(viz_range)))
plt.show()
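scipy.optimize also covers root finding. A sketch using `root_scalar` with Brent's method, which needs a bracket [a, b] where f changes sign; the brackets were picked by hand from the signs of f (f(-5) < 0 < f(0), f(5) < 0 < f(12)). It also shows passing the analytic derivative to `minimize` through `jac`, which replaces the finite-difference gradient approximation.

```python
import numpy as np
from scipy.optimize import minimize, root_scalar

f = lambda x: 0.1*x**3 - 0.8*x**2 - 1.5*x + 5.4
fd = lambda x: 0.3*x**2 - 1.6*x - 1.5

# Root finding: each bracket (a, b) contains exactly one sign change of f
brackets = [(-5, 0), (0, 5), (5, 12)]
roots = [root_scalar(f, bracket=b, method='brentq').root for b in brackets]
print(roots)  # approximately [-3.0, 2.0, 9.0]

# Minimization with an analytic gradient; minimize passes x as a 1-element array
result = minimize(lambda x: f(x[0]), x0=[12], jac=lambda x: np.array([fd(x[0])]))
print(result.x)  # approximately [6.147]
```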

Line fitting with minimizing Algebraic and Geometric distance

Algebraic distance

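Geometric distance becomes more useful the steeper the line is: the algebraic distance only measures the vertical residual |ax + b - y|, which for a steep line can be large even when a point lies close to the line.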

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

true_line = lambda x: -14/3*x + 14/3
data_range = np.array([-4, 12])
data_num = 100
noise_std = 1

# Generate the true data
x = np.random.uniform(data_range[0], data_range[1], size=data_num)
y = true_line(x)

# Add Gaussian noise
xn = x + np.random.normal(scale=noise_std, size=x.shape)
yn = y + np.random.normal(scale=noise_std, size=y.shape)

# Find a line minimizing algebraic distance
A = np.vstack((xn, np.ones(xn.shape))).T
b = yn
l_alg = np.linalg.pinv(A) @ b
e_alg = np.mean(np.abs(l_alg[0]*xn - yn + l_alg[1]) / np.sqrt(l_alg[0]**2 + 1))

# Find a line minimizing geometric distance
geo_dist2 = lambda x: np.sum((x[0]*xn - yn + x[1])**2) / (x[0]**2 + 1)
result = minimize(geo_dist2, [-1, 0]) # The initial value: y = -x
l_geo = result.x
e_geo = np.mean(np.abs(l_geo[0]*xn - yn + l_geo[1]) / np.sqrt(l_geo[0]**2 + 1))

# Plot the data and result
plt.plot(data_range, true_line(data_range), 'r-', label='The true line')
plt.plot(xn, yn, 'b.', label='Noisy data')
plt.plot(data_range, l_alg[0]*data_range + l_alg[1], 'g-', label=f'Solve Ax=b (GeoError: {e_alg:.3f})')
plt.plot(data_range, l_geo[0]*data_range + l_geo[1], 'm-', label=f'Optimization (GeoError: {e_geo:.3f})')
plt.legend()
plt.xlim(data_range)
plt.show()
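Minimizing the sum of squared geometric (perpendicular) distances also has a closed-form solution, usually called total least squares: center the points, and the line's unit normal is the right singular vector of the smallest singular value. This is a different technique from the `minimize`-based approach above, shown only as a sketch for comparison; the noise-free points on the steep line y = -14/3x + 14/3 are a hypothetical test case, and the function name `fit_line_tls` is hypothetical. It can also be called with the noisy `xn`, `yn` from the code above.

```python
import numpy as np

def fit_line_tls(xs, ys):
    """Fit ax + by + c = 0 minimizing the sum of squared perpendicular distances."""
    pts = np.column_stack((xs, ys))
    center = pts.mean(axis=0)
    # The last right singular vector of the centered data is the line's unit normal [a, b]
    _, _, Vt = np.linalg.svd(pts - center)
    a, b = Vt[-1]
    c = -(a*center[0] + b*center[1])  # The best-fit line passes through the centroid
    return a, b, c

# Points exactly on the steep line y = -14/3 x + 14/3 (no noise, for checking)
xs = np.linspace(-4, 12, 20)
ys = -14/3*xs + 14/3
a, b, c = fit_line_tls(xs, ys)
print(-a/b, -c/b)  # slope and intercept, approximately -4.6667 and 4.6667
```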

▼ Probability

It is probably better to refer to the lecture slides.

math04_probability.pdf (2.66MB)
