Deep learning:
Technical introduction
Thomas Epelbaum
[Cover figures: the ReLU function and its derivative; a feedforward neural network with an input layer, hidden layers 1 through h, and an output layer; a convolution layer mapping F padded input feature maps to Fp output feature maps of size Np x Tp; a convolutional architecture with successive feature maps; an LSTM cell with its forget, input, and output gates; and a ResNet module combining Conv, BatchNorm, and ReLU blocks with a skip connection.]
September 12, 2017
Contents

1 Preface 5
2 Acknowledgements 7
3 Introduction 9
4 Feedforward Neural Networks 11
4.1 Introduction 12
4.2 FNN architecture 13
4.3 Some notations 14
4.4 Weight averaging 14
4.5 Activation function 15
4.6 FNN layers 20
4.7 Loss function 21
4.8 Regularization techniques 22
4.9 Backpropagation 27
4.10 Which data sample to use for gradient descent? 29
4.11 Gradient optimization techniques 30
4.12 Weight initialization 32
Appendices 33
4.A Backprop through the output layer 33
4.B Backprop through hidden layers 34
4.C Backprop through BatchNorm 34
4.D FNN ResNet (non standard presentation) 35
4.E FNN ResNet (more standard presentation) 38
4.F Matrix formulation 38
5 Convolutional Neural Networks 41
5.1 Introduction 42
5.2 CNN architecture 43
5.3 CNN specificities 43
5.4 Modification to Batch Normalization 49
5.5 Network architectures 50
5.6 Backpropagation 56
Appendices 64
5.A Backprop through BatchNorm 64
5.B Error rate updates: details 65
5.C Weight update: details 67
5.D Coefficient update: details 68
5.E Practical Simplification 68
5.F Backpropagation through a ResNet module 71
5.G Convolution as a matrix multiplication 72
5.H Pooling as a row matrix maximum 75
6 Recurrent Neural Networks 77
6.1 Introduction 78
6.2 RNN-LSTM architecture 78
6.3 Extreme Layers and loss function 80
6.4 RNN specificities 81
6.5 LSTM specificities 85
Appendices 90
6.A Backpropagation through Batch Normalization 90
6.B RNN Backpropagation 91
6.C LSTM Backpropagation 95
6.D Peephole connections 101
7 Conclusion 103
Chapter 1
Preface
I started learning about deep learning fundamentals in February 2017. At that time, I knew nothing about backpropagation, and was completely ignorant of the differences between a Feedforward, a Convolutional, and a Recurrent Neural Network.
As I navigated through the humongous amount of material available on deep learning online, I found myself quite frustrated when it came to really understanding what deep learning is, rather than just applying it with some available library.
In particular, the backpropagation update rules are seldom derived, and never in index form. Unfortunately for me, I have an "index" mind: seeing a 4-dimensional convolution formula in matrix form does not do it for me. Since I am also stupid enough to like recoding the wheel in low-level programming languages, the matrix form cannot be directly converted into working code either.
I therefore started some notes for my personal use, where I tried to rederive
everything from scratch in index form.
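To give a concrete flavour of what "index form" means here, below is a minimal sketch of a stride-1, unpadded 2D convolution written with explicit loops over every index. It uses NumPy, with notation chosen just for this sketch (F input feature maps of size N by T, Fp output maps, an R by R receptive field); it is only an illustration, not the exact conventions or derivations used in the later chapters.

```python
import numpy as np

def conv2d_index_form(x, w, b):
    """Naive valid (no padding, stride 1) convolution, written index by index.

    x : input feature maps, shape (F, N, T)
    w : weights, shape (Fp, F, R, R)
    b : biases, shape (Fp,)
    Returns output feature maps of shape (Fp, N - R + 1, T - R + 1).
    """
    F, N, T = x.shape
    Fp, _, R, _ = w.shape
    Np, Tp = N - R + 1, T - R + 1
    y = np.zeros((Fp, Np, Tp))
    # Every output entry is an explicit sum over input feature maps and the
    # receptive field:
    # y[fp, n, t] = b[fp] + sum_{f, i, j} w[fp, f, i, j] * x[f, n + i, t + j]
    for fp in range(Fp):
        for n in range(Np):
            for t in range(Tp):
                acc = b[fp]
                for f in range(F):
                    for i in range(R):
                        for j in range(R):
                            acc += w[fp, f, i, j] * x[f, n + i, t + j]
                y[fp, n, t] = acc
    return y

# Tiny usage example with random data
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # 3 input feature maps of size 8x8
w = rng.standard_normal((5, 3, 3, 3))   # 5 output feature maps, 3x3 receptive field
b = rng.standard_normal(5)
print(conv2d_index_form(x, w, b).shape)  # (5, 6, 6)
```

Each output entry is nothing more than an explicit sum over the input feature maps and the receptive field, and that is the level of detail these notes aim for throughout.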
I did so for the vanilla Feedforward network, then learned about L1 and L2 regularization, dropout [1], batch normalization [2], and several gradient descent optimization techniques. I then turned to Convolutional networks, from conventional conv-pool architectures [3] with a single-digit number of layers to the recent VGG [4] and ResNet [5] ones, and from local contrast normalization and rectification to batchnorm. Finally, I studied Recurrent Neural Network structures [6], from the standard formulation to the most recent LSTM one [7].
As my work progressed, my notes got bigger and bigger, until I reached the point where I realized I might have enough material to help others starting their own deep learning journey.