Understanding Batch Normalization

Batch Normalization has had a profound impact on neural networks. Here, I attempt to explain how batch normalization works. Feel free to point out any mistakes in my interpretation.

The motivation for batch normalization is to normalize the values inside the network, since it is known that a model tends to train better when its inputs are normalized. I am not able to prove mathematically why this happens, but I can offer an intuition for why normalizing the inputs helps. Let us assume that the activation function is just the identity function. Then the whole network, however many layers it has, reduces to a simple linear regression. Classical linear regression comes with distributional assumptions, normality among them (strictly speaking, normality of the error terms), and when those assumptions are badly violated the estimates and the inference drawn from them become less reliable. Normalizing the inputs, that is, shifting them to zero mean and scaling them to unit variance, does not by itself make them normally distributed, but it puts all features on a comparable scale, which I think of as bringing them closer to a standard normal model.
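To make the "identity activation reduces to regression" step concrete, here is a minimal sketch in plain Python. The weights `W1` and `W2`, the inputs `X`, and the helper `matmul` are made-up values and names for illustration only, and biases are left out to keep it short. It shows that passing the inputs through two linear layers gives the same result as a single linear map whose weights are the product of the two.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical weights of a two-layer network with identity activations.
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # 3 inputs -> 2 hidden units
W2 = [[2.0], [-1.0]]                         # 2 hidden units -> 1 output

# Two made-up input rows with three features each.
X = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]

# Forward pass layer by layer, with no non-linearity in between.
layer_by_layer = matmul(matmul(X, W1), W2)

# The same prediction from one linear model with weights W1 @ W2.
collapsed = matmul(X, matmul(W1, W2))

print(layer_by_layer)
print(collapsed)
```

Both printed results agree, so without a non-linearity the extra layer adds nothing beyond a single set of regression coefficients.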

Coming back to batch normalization: when we normalize the output of a neuron, we are in a sense bringing it closer to a standard normal variable, with zero mean and unit variance. This normalization also ensures that all the neurons in the next layer receive inputs with similar variance, so that no single neuron dominates the layer. Since we do not have the true mean and variance of the data while training, we estimate them from the current mini-batch. The resulting normalized values are therefore not exact, which means the process adds some noise to training; this has a regularizing effect and makes the model more robust. The mini-batch mean is an estimator of the mean over the full training set, so it makes statistical sense to use the mini-batch to calculate the mean and variance.
Since normalizing makes sure that no single neuron can dominate the result, it also makes sure that all neurons contribute, and so during back-propagation each neuron has a useful gradient with which to update its weights. Thus, it speeds up the learning process too.
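The normalization step described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function name `batch_norm`, the small constant `eps` that guards against division by zero, the batch size of 32, and the random activations are all made up, and the learnable scale and shift used in the full method are left out. The sketch normalizes each neuron's outputs with the statistics of the current mini-batch and then shows how the mini-batch mean fluctuates around the full-data mean, which is the source of the noise mentioned above.

```python
import random
import statistics

def batch_norm(batch, eps=1e-5):
    """Normalize each column (one neuron's outputs) using mini-batch statistics."""
    columns = list(zip(*batch))
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.pvariance(col, mu) for col, mu in zip(columns, means)]
    normalized = [
        [(x - mu) / (var + eps) ** 0.5 for x, mu, var in zip(row, means, variances)]
        for row in batch
    ]
    return normalized, means, variances

random.seed(0)  # fixed seed so the run is repeatable

# Made-up activations for two neurons on very different scales.
data = [[random.gauss(0.0, 1.0), random.gauss(50.0, 20.0)] for _ in range(1024)]

# Split the data into mini-batches of 32 rows and normalize each one separately.
batch_size = 32
batch_means = []
for start in range(0, len(data), batch_size):
    normalized, means, _ = batch_norm(data[start:start + batch_size])
    batch_means.append(means[1])  # mini-batch mean of the second neuron

# The mini-batch means scatter around the mean of the full data set.
full_mean = statistics.fmean([row[1] for row in data])
print("full-data mean:", round(full_mean, 2))
print("smallest and largest mini-batch mean:",
      round(min(batch_means), 2), round(max(batch_means), 2))

# Within the last mini-batch, both neurons now have mean ~0 and variance ~1.
for col in zip(*normalized):
    print(round(statistics.fmean(col), 6), round(statistics.pvariance(col), 4))
```

After normalization the second neuron, which started out with far larger values, sits on the same scale as the first, so neither dominates what the next layer sees.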
