Deep Learning is all about the Gradient Descent Algorithm
In this article, we will talk about Backpropagation, Gradient Descent, and Stochastic Gradient Descent.
Hello Legends, I couldn’t keep my promise of sending out an issue of my newsletter each week, and I apologize for that. However, today I’m thrilled to share with you one of the core and most important topics in the world of Deep Learning. I decided to write on it because I felt someone needed to explain it in simple language so that others could understand. I have spent a fair amount of time learning Linear Algebra over the past couple of weeks. I watched the 3Blue1Brown videos over and over again until I was sure I understood the concepts. However, when I tried to recall them the next day, everything went blurry.
In this article, I’m not going to throw complicated formulas, equations, or big jargon words at you that make these concepts feel too tough to understand. Instead, I’ll try to explain Backpropagation, Gradient Descent, and Stochastic Gradient Descent. (Some might object to calling it Stochastic Gradient Descent instead of Stochastic Gradient Optimization, because optimization is what it is really for.)
Have you ever played with Lego blocks? I believe you answered yes. When we take each building block and fit them together according to the structure we are building, only then does the beautiful structure come to life. Neural networks are the same: a neural network can consist of millions of neurons grouped together to share information, just like our human brain. Our brain learns everything by trial and error; the more mistakes you make, the better you become.
Have you ever wondered how neural networks learn? The answer lies in a process called Backpropagation.
Neurons that fire together wire together - 3Blue1Brown video
So, What is Backpropagation?
Backpropagation is the heart of neural networks. Neural networks use backpropagation to adjust their internal parameters (weights and biases) in order to make accurate predictions. Think of it as tuning a musical instrument until you get the perfect melody out of it.
How Does Backpropagation Work?
Forward Pass: When you feed your neural network an input (for example, an image), it performs a series of calculations to make a prediction. These calculations happen in layers, and information flows from one layer to the next.
Error Calculation: After making the prediction, the neural network wants to know whether the prediction was right or wrong. Backpropagation calculates the prediction’s error, which is the difference between the predicted output and the actual value.
The cost function is used to measure the performance of a machine learning model: it quantifies the error between predicted and expected values and presents it as a single real number. (The code sketch after these steps uses mean squared error as an example of a cost function.)
Backward Pass: The error we calculated is propagated backward through the network to work out how much each layer contributed to the mistake, and then each layer adjusts its internal parameters to reduce the error.
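To make these three steps concrete, here is a minimal sketch of one training step for a single neuron (one weight, one bias) with mean squared error as the cost function. This is a toy example I wrote for illustration; the data, variable names, and numbers are my own, and real networks have many layers and compute these gradients automatically, but the idea is the same.

```python
import numpy as np

# Toy data: one input feature and one target value per example
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])        # the true relationship is y = 2x

w, b = 0.0, 0.0                      # internal parameters (weight and bias)
learning_rate = 0.1                  # how big a step we take when adjusting

# 1. Forward pass: compute predictions from the current parameters
y_pred = w * x + b

# 2. Error calculation: the cost function (mean squared error here)
#    turns all the individual errors into a single real number
cost = np.mean((y_pred - y) ** 2)

# 3. Backward pass: the slope (gradient) of the cost with respect to
#    each parameter, derived by hand for this tiny model
dw = np.mean(2 * (y_pred - y) * x)
db = np.mean(2 * (y_pred - y))

# Adjust the parameters slightly in the direction that reduces the cost
w -= learning_rate * dw
b -= learning_rate * db

print(cost, w, b)                    # the cost before the update, and the nudged parameters
```

Running this once already nudges the weight and bias toward values that make the cost smaller; repeating it over and over is where gradient descent comes in.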
Gradient Descent
Gradient Descent plays a vital role in the backpropagation process. But before we discuss that, let’s understand how gradient descent works.
Imagine you are on top of a mountain, you want to go downhill, and there’s fog all around you. In that situation, the first thing you do is look for the steepest direction downward and take a step that way. Gradient descent does the same thing: it calculates the average slope over all data points before each step. Because it looks at all the data every time, gradient descent is slow but highly accurate.
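Here is what that looks like as a sketch, continuing the toy single-neuron example from above (again, the data and numbers are made up for illustration). Notice that every step averages the slope over all of the data points:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # the true relationship is y = 2x + 1

w, b, learning_rate = 0.0, 0.0, 0.05

for step in range(1000):
    y_pred = w * x + b               # forward pass on ALL data points
    # Average the slope (gradient) over all data points before each step
    dw = np.mean(2 * (y_pred - y) * x)
    db = np.mean(2 * (y_pred - y))
    w -= learning_rate * dw
    b -= learning_rate * db

print(w, b)                          # ends up very close to 2 and 1
```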
Backpropagation Connection
Our goal in backpropagation is to minimize the error, and gradient descent is what tells us how to adjust the internal parameters (weights and biases) slightly to reduce that error.
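In symbols, each parameter is nudged a small step against its slope: new_weight = old_weight - learning_rate * (slope of the error with respect to that weight). The learning rate is just a small number (such as 0.01) that controls how big each step is.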
Stochastic Gradient Descent
Stochastic Gradient Descent is similar to gradient descent, except that it is faster and less precise.
Backpropagation Connection
Stochastic Gradient Descent is similar to gradient descent, but instead of looking at all the training data points to adjust the parameters, it looks at just one randomly chosen data point at each step. This is what makes Stochastic Gradient Descent faster (and its steps noisier).
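As a sketch, the only change from the earlier gradient descent loop is that each step uses one randomly picked example instead of all of them (same toy setup and made-up numbers as before):

```python
import numpy as np

rng = np.random.default_rng(0)       # random number generator with a fixed seed
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # the true relationship is y = 2x + 1

w, b, learning_rate = 0.0, 0.0, 0.01

for step in range(5000):
    i = rng.integers(len(x))         # pick ONE data point at random
    y_pred = w * x[i] + b            # forward pass on that single point
    dw = 2 * (y_pred - y[i]) * x[i]  # gradient estimated from one point only
    db = 2 * (y_pred - y[i])
    w -= learning_rate * dw
    b -= learning_rate * db

print(w, b)                          # noisy steps, but still ends up near 2 and 1
```

The individual steps jump around because a single point gives only a rough estimate of the slope, but on average they still head downhill, which is why SGD trades some precision for a lot of speed.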
Useful Resources
Research Paper: A Closer Look at Memorization in Deep Networks
My sincere thanks to 3Blue1Brown. Take care, folks. I’ll write again when I learn something new.