Posts

Understanding GloVe Vectors

There are many articles out there that explain word vectors and their uses. Here I will focus on how GloVe vectors are calculated and on the equations behind them. The authors' motivation for GloVe was to create a model that uses word-word co-occurrence counts and thus makes efficient use of corpus statistics. The resulting model is a weighted least squares model, with weights that depend on the word-word co-occurrence counts. The regression equation they use is

$$w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$

and the cost function is the weighted least squares objective

$$J = \sum_{i,j} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

Here $X_{ij}$ is the word-word co-occurrence count for words $i$ and $j$. In the paper, this objective was minimized with the AdaGrad optimizer. Training yields two sets of vectors, $w_i$ and $\tilde{w}_i$. The res...
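To make the weighted least squares cost above concrete, here is a minimal NumPy sketch of it. The function names (`glove_weight`, `glove_loss`) are my own, and the form of the weighting function f, with `x_max = 100` and `alpha = 0.75`, is taken from the GloVe paper's defaults rather than from anything stated in this post, so treat those as assumptions. It only evaluates the cost; it does not include the AdaGrad training loop.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(X_ij): (x / x_max)^alpha below x_max, capped at 1.
    x_max and alpha are the GloVe paper's defaults (an assumption here)."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_tilde, b, b_tilde, X):
    """Weighted least squares cost J summed over all word pairs (i, j).

    W, W_tilde : (vocab, dim) word and context vectors
    b, b_tilde : (vocab,) word and context biases
    X          : (vocab, vocab) co-occurrence counts
    """
    mask = X > 0
    # log(0) is undefined; those pairs get weight f(0) = 0 anyway,
    # so substitute 1 before taking the log to avoid -inf.
    log_X = np.log(np.where(mask, X, 1.0))
    # Prediction for every pair: w_i . w~_j + b_i + b~_j
    pred = W @ W_tilde.T + b[:, None] + b_tilde[None, :]
    err = pred - log_X
    return np.sum(glove_weight(X) * mask * err ** 2)

# Toy usage with random vectors and a tiny made-up co-occurrence matrix.
rng = np.random.default_rng(0)
vocab, dim = 4, 3
X = np.array([[0, 2, 1, 0],
              [2, 0, 5, 1],
              [1, 5, 0, 3],
              [0, 1, 3, 0]], dtype=float)
W = rng.normal(scale=0.1, size=(vocab, dim))
W_tilde = rng.normal(scale=0.1, size=(vocab, dim))
b = np.zeros(vocab)
b_tilde = np.zeros(vocab)
print(glove_loss(W, W_tilde, b, b_tilde, X))
```

Pairs that never co-occur contribute nothing to the sum, which is why the cost can be computed over the non-zero entries of X only.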

Understanding Batch Normalization

Batch normalization has had a profound impact on neural networks. Here, I attempt to explain how batch normalization works. Feel free to point out any mistakes in my interpretation. The motivation for batch normalization is to normalize the values in the inner layers of the network, since it is known that a model performs better when its inputs are normalized. Although I cannot prove mathematically why this happens, I can offer an intuition for why normalizing the input layers helps performance. Let us assume that our activation function is just the identity function. Then an ordinary neural network reduces to a simple regression. Normality is one of the assumptions for the regression to be valid. If we violate normality, our estimates are not accurate. Normalizing the inputs is a way of approximating the inputs as normally distributed. Coming back to Batch normalization, when we are normalizing t...
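As a rough illustration of the normalization step discussed above, here is a minimal NumPy sketch of a batch normalization forward pass at training time. The excerpt does not spell out the formula, so the details are assumptions taken from the standard formulation: each feature is normalized with the mini-batch mean and variance, a small `eps` keeps the division stable, and the scale `gamma` and shift `beta` stand in for learnable parameters. The function name `batch_norm` is hypothetical.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass for a mini-batch (training mode).

    x     : (batch, features) activations of one layer
    gamma : (features,) scale, assumed learnable
    beta  : (features,) shift, assumed learnable
    eps   : small constant added to the variance for numerical stability
    """
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # rescale and shift

# Toy usage: two features on very different scales.
x = np.array([[1.0, 200.0],
              [3.0, 600.0],
              [5.0, 1000.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0), out.std(axis=0))
```

With `gamma` set to ones and `beta` to zeros, each output feature has mean close to 0 and standard deviation close to 1, regardless of the scale of the input feature. At inference time, implementations typically use running averages of the mean and variance instead of batch statistics, which this sketch leaves out.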