
[Andrew Ng] Neural Network and Deep Learning : 4. Deep Neural Networks

폴밴 2021. 10. 19. 10:56

Deep L-layer Neural network

What is a deep neural network?

A neural network with multiple hidden layers is called a deep neural network.

Notation

  • Number of layers: $L = 4$
  • Number of nodes (units) in layer $l$: $n^{[l]}$
  • Activations in layer $l$: $a^{[l]}$
  • $a^{[l]}=g^{[l]}(z^{[l]})$
  • Weights used to compute $z^{[l]}$: $w^{[l]}$

Forward Propagation in a Deep Network

Forward Propagation

$$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]} \\ A^{[l]}=g^{[l]}(Z^{[l]})$$

  • Not vectorized:
  • $z^{[1]}=w^{[1]}x+b^{[1]}=w^{[1]}a^{[0]}+b^{[1]}$
  • $a^{[1]}=g^{[1]}(z^{[1]})$
  • $z^{[2]}=w^{[2]}a^{[1]}+b^{[2]}$
  • $a^{[2]}=g^{[2]}(z^{[2]})$
  • ...
  • $z^{[4]}=w^{[4]}a^{[3]}+b^{[4]}$
  • $\hat y = a^{[4]}=g^{[4]}(z^{[4]})$
  • Vectorized (note the dimensions), computed as a loop $for \ l=1 \ to \ 4$ (sketched in code below):
  • $Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}$
  • $A^{[l]}=g^{[l]}(Z^{[l]})$
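
A minimal NumPy sketch of the vectorized loop above, assuming ReLU hidden layers and a sigmoid output; the function and variable names (`forward_propagation`, `parameters`) are my own, not from the lecture.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def forward_propagation(X, parameters, L):
    """Loop 'for l = 1 to L': Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]).

    parameters holds W1..WL and b1..bL with the shapes listed in these notes;
    X has shape (n[0], m). Returns A[L] = y_hat with shape (n[L], m).
    """
    A = X  # A[0] = X
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = W @ A + b                         # broadcasting copies b across the m columns
        A = relu(Z) if l < L else sigmoid(Z)  # assumed: ReLU for hidden layers, sigmoid for output
    return A
```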

Getting your matrix dimensions right

Parameters W and b

$$z^{[1]}=w^{[1]}x+b^{[1]} \\ (3,1)=(3,2) \cdot (2,1)+(3,1) \\ (n^{[1]},1)=(n^{[1]},n^{[0]}) \cdot (n^{[0]},1)+(n^{[1]},1)$$

$w^{[l]}:(n^{[l]},n^{[l-1]})$

$b^{[l]}:(n^{[l]},1)$

$dw^{[l]}:(n^{[l]},n^{[l-1]})$

$db^{[l]}:(n^{[l]},1)$

Vectorized implementation

$$Z^{[1]}=W^{[1]}X+b^{[1]} \\ (n^{[1]},m)=(n^{[1]},n^{[0]}) \cdot (n^{[0]},m)+(n^{[1]},m)$$

The training examples are stacked as columns of the input X, which vectorizes the computation over all m examples.

b is automatically replicated across the columns by broadcasting.

$Z^{[l]},A^{[l]} :(n^{[l]},m)$

$dZ^{[l]},dA^{[l]} :(n^{[l]},m)$
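
A small NumPy sketch of an initializer that produces exactly these W and b shapes and asserts them; `initialize_parameters` and `layer_dims` are my own names for illustration.

```python
import numpy as np

def initialize_parameters(layer_dims):
    """layer_dims = [n[0], n[1], ..., n[L]]; returns W[l]: (n[l], n[l-1]) and b[l]: (n[l], 1)."""
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert parameters["b" + str(l)].shape == (layer_dims[l], 1)
    return parameters

# Example: n[0]=2, n[1]=3 gives W1: (3, 2) and b1: (3, 1),
# matching the (3,1)=(3,2)(2,1)+(3,1) check above.
params = initialize_parameters([2, 3, 1])
```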

Why deep representations?

Circuit theory and deep learning

There are functions you can compute with a “small” L-layer deep neural network that shallower networks require exponentially more hidden units to compute.

If a shallower network is used instead, the number of hidden units required grows exponentially.

Building blocks of Deep Neural networks

  • Forward propagation: output $a^{[l]}=g^{[l]}(z^{[l]})$ with $z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]}$; cache $z^{[l]}$
  • Backward propagation: output $da^{[l-1]}, dw^{[l]}, db^{[l]}$, given input $da^{[l]}$ and the cached $z^{[l]}$

Forward and backward propagation

Forward propagation

Input : $a^{[l-1]}$

Output : $a^{[l]}, cache \ z^{[l]}$
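
A per-layer NumPy sketch of this forward block, where the cache holds what the backward step below will need; the function name and the choice to cache $(a^{[l-1]}, w^{[l]}, z^{[l]})$ are my own.

```python
def linear_activation_forward(A_prev, W, b, g):
    """One forward block for layer l: input a[l-1], output a[l] = g(z[l]) plus a cache."""
    Z = W @ A_prev + b       # z[l] = w[l] a[l-1] + b[l]
    A = g(Z)                 # a[l] = g[l](z[l])
    cache = (A_prev, W, Z)   # saved for the backward pass
    return A, cache
```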

Backward propagation

Input : $da^{[l]}$

Output : $da^{[l-1]},dW^{[l]},db^{[l]}$

$$dz^{[l]}=da^{[l]} * g^{[l]'}(z^{[l]}) \\ dw^{[l]}=dz^{[l]} \cdot a^{[l-1]T} \\ db^{[l]}=dz^{[l]} \\ da^{[l-1]}=w^{[l]T} \cdot dz^{[l]}$$
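
A per-layer NumPy sketch of these formulas, vectorized over the m training examples (hence the 1/m averaging for dW and db); `g_prime` stands for the derivative of the layer's activation, and the function name is my own.

```python
import numpy as np

def linear_activation_backward(dA, cache, g_prime):
    """One backward block for layer l: input da[l] and the cached (a[l-1], w[l], z[l]);
    output da[l-1], dw[l], db[l]."""
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                         # dz[l] = da[l] * g[l]'(z[l])
    dW = (dZ @ A_prev.T) / m                     # dw[l] = dz[l] a[l-1]^T, averaged over m examples
    db = np.sum(dZ, axis=1, keepdims=True) / m   # db[l] = dz[l], averaged over m examples
    dA_prev = W.T @ dZ                           # da[l-1] = w[l]^T dz[l]
    return dA_prev, dW, db
```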

Parameters vs Hyperparameters

  • Hyperparameters are the values that determine w and b:
  • Learning rate $\alpha$
  • Number of iterations
  • Number of hidden layers $L$
  • Number of hidden units $n^{[l]}$
  • Choice of activation function
  • Try out various hyperparameter values and keep the ones that give the lowest cost (see the sketch after this list).
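
A rough sketch of that search loop for the learning rate; `train_model` and `compute_cost` are hypothetical placeholders for a full training run and a cost evaluation on held-out data.

```python
# Hypothetical sketch: try several learning rates and keep the one with the lowest cost.
candidate_alphas = [0.001, 0.003, 0.01, 0.03, 0.1]
best_alpha, best_cost = None, float("inf")
for alpha in candidate_alphas:
    parameters = train_model(X_train, Y_train, learning_rate=alpha)  # hypothetical helper
    cost = compute_cost(parameters, X_dev, Y_dev)                    # hypothetical helper
    if cost < best_cost:
        best_alpha, best_cost = alpha, cost
```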

What does this have to do with the brain?

Source

Neural Networks and Deep Learning

 

Offered by deeplearning.ai.

www.coursera.org