
[Andrew Ng] Neural Network and Deep Learning : 2. Basics of Neural Network Programming (1)

폴밴 2021. 10. 6. 13:45

Basics of Neural Network Programming

Binary Classification

Binary classification means sorting each input into one of two classes, labeled 0 or 1.

For example, looking at an image and labeling it as a cat (1) or not a cat (0) is binary classification.

Notation

Input $x$ with $n_x$ features, output $y$:

$x \in \mathbb{R}^{n_x}, \ y \in \{0, 1\}$

$m$ training examples: $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$

$X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix}, \quad X \in \mathbb{R}^{n_x \times m}$, i.e. `X.shape = (n_x, m)`

The input matrix $X$ is an $n_x \times m$ matrix whose $i$-th column is the feature vector $x^{(i)}$.

Some notations transpose $X$ (one example per row), but stacking the examples as columns, as above, makes the code simpler to write.

$Y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}, \quad Y \in \mathbb{R}^{1 \times m}$, i.e. `Y.shape = (1, m)`
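To make the column-stacking convention concrete, here is a minimal sketch in plain Python lists (no NumPy). The helper names `stack_columns` and `shape` and the toy data are hypothetical, chosen only for illustration.

```python
# Sketch of the column-stacking convention with plain lists:
# each training example x^(i) becomes column i of X, so X has n_x rows and m columns.

def stack_columns(examples):
    """Stack m feature vectors (each of length n_x) as the columns of an n_x x m matrix."""
    n_x = len(examples[0])
    return [[x[j] for x in examples] for j in range(n_x)]

def shape(matrix):
    """Return (rows, cols) of a list-of-lists matrix, mimicking NumPy's .shape."""
    return (len(matrix), len(matrix[0]))

# Hypothetical toy data: m = 3 examples, each with n_x = 2 features.
xs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ys = [1, 0, 1]

X = stack_columns(xs)  # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
Y = [ys]               # a single row of labels

print(shape(X))  # (2, 3), i.e. (n_x, m)
print(shape(Y))  # (1, 3), i.e. (1, m)
```

With NumPy, `np.array(xs).T` produces the same $(n_x, m)$ layout, since `np.array(xs)` stores one example per row.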

Logistic Regression

Given $x$, compute $\hat{y} = P(y=1 \mid x)$, the probability that $y = 1$.

Parameters: $w \in \mathbb{R}^{n_x}$, $b \in \mathbb{R}$

Output: $\hat{y} = \sigma(w^Tx + b) = \sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w^Tx + b$
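A minimal sketch of the forward computation for a single example, using only Python's `math` module. The parameter values and input are hypothetical, picked so that $z$ is easy to check by hand.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^{-z}).

    Naive form: math.exp(-z) overflows for very negative z, which is fine for a sketch.
    """
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """y_hat = sigma(w^T x + b) for one example x (w and x are lists of length n_x)."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters and input with n_x = 2.
w = [0.5, -0.25]
b = 0.1
x = [2.0, 4.0]

# z = 0.5*2.0 - 0.25*4.0 + 0.1 = 0.1, so y_hat = sigma(0.1), about 0.525
print(predict_proba(w, b, x))
```

Since $\sigma(0) = 0.5$, the output moves above or below one half depending on the sign of $z$.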

Logistic Regression cost function

Given a training set $\{(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})\}$, we want the algorithm's predicted probability that $y = 1$ to match the true label: $\hat{y}^{(i)} \approx y^{(i)}$.

Loss (error) function: measures the error on a single example, and we want it to be as small as possible.

$L(\hat{y}, y) = -(y \log \hat{y} + (1-y)\log(1-\hat{y}))$

Cost function: the average loss over the whole training set. We adjust the parameters $w, b$ to find the values that minimize the cost function $J$.

$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})]$
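The loss and cost formulas translate directly into code. This is a sketch that assumes every prediction lies strictly between 0 and 1 (otherwise `math.log` fails on 0); the sample predictions are hypothetical.

```python
import math

def loss(y_hat, y):
    """Cross-entropy loss L(y_hat, y) = -(y log y_hat + (1 - y) log(1 - y_hat)).

    Assumes 0 < y_hat < 1 and y in {0, 1}.
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(y_hats, ys):
    """J = (1/m) * sum of the per-example losses."""
    m = len(ys)
    return sum(loss(a, y) for a, y in zip(y_hats, ys)) / m

# A confident correct prediction has a small loss; a confident wrong one has a large loss.
print(loss(0.9, 1))  # -log(0.9), about 0.105
print(loss(0.9, 0))  # -log(0.1), about 2.303
print(cost([0.9, 0.2], [1, 0]))
```

When $y = 1$ only the $-\log\hat{y}$ term survives, and when $y = 0$ only $-\log(1-\hat{y})$ does, which is why a wrong confident prediction is penalized heavily.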

Gradient Descent

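Gradient descent is used to find the global optimum of the cost function and the corresponding $w, b$. Because $J(w,b)$ is convex, any local minimum it reaches is also a global minimum.

Repeat until convergence:

$w := w - \alpha \frac{\partial J(w,b)}{\partial w}$

$b := b - \alpha \frac{\partial J(w,b)}{\partial b}$

where $\alpha$ is the learning rate.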
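To see the update rule in isolation, here is a sketch on a hypothetical one-parameter convex cost $J(w) = (w - 3)^2$ rather than the logistic regression cost. It also assumes a fixed number of iterations in place of a real convergence test.

```python
def gradient_descent_1d(dJ_dw, w0, alpha, num_iters):
    """Repeat w := w - alpha * dJ/dw for a fixed number of iterations."""
    w = w0
    for _ in range(num_iters):
        w = w - alpha * dJ_dw(w)
    return w

# Hypothetical convex toy cost J(w) = (w - 3)^2, whose derivative is dJ/dw = 2 (w - 3).
w_final = gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0, alpha=0.1, num_iters=100)
print(w_final)  # very close to the minimizer w = 3
```

Each step here shrinks the distance to the minimum by a factor of $1 - 2\alpha = 0.8$; a learning rate that is too large (here $\alpha > 1$) would make the iterates diverge instead.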

Derivatives

More derivatives examples

Computation Graph

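A computation graph shows how variables relate to each other and how values flow between them.

Going left to right (the forward pass) shows which variable feeds into which computation; going right to left (the backward pass) is useful for computing derivatives.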

Derivatives with a Computation Graph

With $u = bc, \ v = a + u, \ J = 3v$ (so $J(a,b,c) = 3(a + bc)$):

Since $J = 3v$, $\frac{dJ}{dv} = 3$: if $v$ changes from $11 \to 11.001$, then $J$ changes from $33 \to 33.003$.

Since $v = a + u$, $\frac{dJ}{du} = \frac{dJ}{dv} \cdot \frac{dv}{du} = 3 \times 1 = 3$. Then, by the chain rule:

$\frac{dJ}{db} = \frac{dJ}{du} \cdot \frac{du}{db} = 3 \times c$

$\frac{dJ}{dc} = \frac{dJ}{du} \cdot \frac{du}{dc} = 3 \times b$

$\frac{dJ}{da} = \frac{dJ}{dv} \cdot \frac{dv}{da} = 3 \times 1 = 3$: if $a = 5 \to 5.001$ (with $u = bc = 6$), then $v = 11 \to 11.001$ and $J = 33 \to 33.003$.

In this course, the derivative of the final output variable with respect to some variable var, $\frac{d\,\text{FinalOutputVar}}{d\,\text{var}}$, is written as `dvar` in code (for example, `da` stands for $\frac{dJ}{da}$).
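The forward and backward passes over this small graph can be sketched as two functions, using the `dvar` naming convention for the backward pass. The values $b = 3, c = 2$ are an assumption for illustration: any pair with $bc = 6$ reproduces $v = 11$ when $a = 5$.

```python
def forward(a, b, c):
    """Forward pass (left to right): u = bc, v = a + u, J = 3v."""
    u = b * c
    v = a + u
    J = 3 * v
    return u, v, J

def backward(a, b, c):
    """Backward pass (right to left), naming each dJ/dvar as dvar."""
    dv = 3          # J = 3v, so dJ/dv = 3
    du = dv * 1     # v = a + u, so dv/du = 1
    da = dv * 1     # v = a + u, so dv/da = 1
    db = du * c     # u = bc, so du/db = c
    dc = du * b     # u = bc, so du/dc = b
    return da, db, dc

# Assumed example values: a = 5, b = 3, c = 2 (so u = 6, v = 11, J = 33).
a, b, c = 5.0, 3.0, 2.0
print(forward(a, b, c))   # (6.0, 11.0, 33.0)
print(backward(a, b, c))  # (3, 6.0, 9.0)

# Finite-difference check: nudging a by 0.001 moves J by about 3 * 0.001.
eps = 0.001
print(forward(a + eps, b, c)[2] - forward(a, b, c)[2])  # about 0.003
```

The backward pass reuses `du` for both `db` and `dc`, which is the main reason to walk the graph right to left instead of differentiating each input from scratch.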

Logistic Regression - Gradient Descent

Logistic Regression recap

Logistic Regression derivatives

$da = \frac{dL(a,y)}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$

$dz = \frac{dL(a,y)}{dz} = \frac{dL}{da} \cdot \frac{da}{dz} = a - y$, using $\frac{da}{dz} = a(1-a)$ for $a = \sigma(z)$
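The result $dz = a - y$ can be sanity-checked numerically. This sketch compares the analytic formula against a central finite difference of the loss written as a function of $z$; the test points and the step size `eps` are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_of_z(z, y):
    """L(a, y) with a = sigmoid(z), viewed as a function of z."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def dz_analytic(z, y):
    """dL/dz = a - y."""
    return sigmoid(z) - y

def dz_numeric(z, y, eps=1e-6):
    """Central finite-difference approximation of dL/dz."""
    return (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)

test_points = [(0.3, 1), (-1.2, 0), (2.0, 0)]
for z, y in test_points:
    print(z, y, dz_analytic(z, y), dz_numeric(z, y))
```

The two columns should agree to several decimal places, which confirms that the $a(1-a)$ factor from the sigmoid cancels the denominators in $da$.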

Gradient descent on m examples

$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(a^{(i)}, y^{(i)})$, where $a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^Tx^{(i)} + b)$

$\frac{\partial}{\partial w_1}J(w,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial w_1}L(a^{(i)}, y^{(i)})$, so the gradient of the cost is the average of the per-example gradients.

Initialize: $J = 0, \ dw_1 = 0, \ dw_2 = 0, \ db = 0$

For $i = 1$ to $m$ (here with $n_x = 2$ features):

$z^{(i)} = w^Tx^{(i)} + b$

$a^{(i)} = \sigma(z^{(i)})$

$J \leftarrow J - [y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)})]$

$dz^{(i)} = a^{(i)} - y^{(i)}$

$dw_1 \leftarrow dw_1 + x_1^{(i)}dz^{(i)}$

$dw_2 \leftarrow dw_2 + x_2^{(i)}dz^{(i)}$

$db \leftarrow db + dz^{(i)}$

Compute the averages:

$J \leftarrow J/m, \ dw_1 \leftarrow dw_1/m, \ dw_2 \leftarrow dw_2/m, \ db \leftarrow db/m$

After averaging, $dw_1 = \frac{\partial J}{\partial w_1}$ (and likewise for $dw_2$ and $db$).

Update the parameters:

$w_1 := w_1 - \alpha\, dw_1$

$w_2 := w_2 - \alpha\, dw_2$

$b := b - \alpha\, db$

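The drawback is that this needs two nested for loops (one over the $m$ examples and one over the $n_x$ features, written out here as $dw_1, dw_2$), which is inefficient.

Vectorization, covered next, removes these explicit loops.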
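The loop-based algorithm above can be sketched in plain Python for $n_x = 2$. The function name `gradient_step`, the toy data set, the learning rate, and the iteration count are all hypothetical choices for illustration, not values from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w1, w2, b, xs, ys, alpha):
    """One iteration of the loop-based algorithm for n_x = 2 features.

    xs is a list of (x1, x2) pairs and ys a list of 0/1 labels (m examples).
    Returns the updated (w1, w2, b) and the cost J computed before the update.
    """
    m = len(ys)
    J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0
    for (x1, x2), y in zip(xs, ys):      # loop over the m examples
        z = w1 * x1 + w2 * x2 + b
        a = sigmoid(z)
        J += -(y * math.log(a) + (1 - y) * math.log(1 - a))
        dz = a - y
        dw1 += x1 * dz                   # one line per feature: the "inner loop"
        dw2 += x2 * dz
        db += dz
    # Average over the m examples
    J /= m
    dw1 /= m
    dw2 /= m
    db /= m
    # Gradient descent update
    w1 = w1 - alpha * dw1
    w2 = w2 - alpha * dw2
    b = b - alpha * db
    return w1, w2, b, J

# Hypothetical toy data: the label is 1 when x1 + x2 is large.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0), (1.0, 3.0)]
ys = [0, 0, 0, 1, 1, 1]

w1 = w2 = b = 0.0
costs = []
for _ in range(1000):                    # outer loop over gradient descent iterations
    w1, w2, b, J = gradient_step(w1, w2, b, xs, ys, alpha=0.1)
    costs.append(J)

print(costs[0], costs[-1])  # starts at log(2), about 0.693, then decreases
```

With all parameters initialized to 0, every prediction is $\sigma(0) = 0.5$, so the first cost is exactly $\log 2$. The explicit example loop and the per-feature `dw1`/`dw2` lines are the two loops that vectorization is meant to replace.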

Source

Neural Networks and Deep Learning

Offered by deeplearning.ai.

www.coursera.org

License: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)


Wendi's Learning: learning anything and everything.
