What is Gradient Descent? (Simplified)
Gradient descent is one of the most popular optimization methods in both Machine Learning and Deep Learning. But what is an optimization method? An optimization method searches for the best solution to a problem. In our case, gradient descent searches for a minimum of a function.
Despite this being a simplified version of what gradient descent is, I found myself unable to leave out the mathematical function describing how it works.
The terms Error Function, Cost Function and Loss Function all describe the same function depicted above. What gradient descent is trying to do is find the m and b that minimize Error(m, b), which we’ll call J(m, b) for the sake of simplicity.
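For reference, here is one common form such an error function can take. It assumes the model is a line y = mx + b fit to N data points (x_i, y_i) and uses the mean squared error. This is an illustrative assumption, not necessarily the exact function the text refers to.

```latex
J(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (m x_i + b) \bigr)^2
```

Under that assumption, J is small when the line passes close to the data points and large when it misses them.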
The “Gradient” of a function gives you the direction of steepest ascent. So taking the negative of that “Gradient” gives you the direction of steepest descent.
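A quick way to see this is to try it on a small function. The sketch below uses a hypothetical bowl-shaped function f(x, y) = x^2 + y^2, whose gradient is (2x, 2y); the function, the starting point, and the step size are all chosen only for illustration.

```python
def f(x, y):
    # Hypothetical example function: a simple bowl with its minimum at (0, 0)
    return x ** 2 + y ** 2


def grad_f(x, y):
    # Gradient of f, worked out by hand: (df/dx, df/dy) = (2x, 2y)
    return 2 * x, 2 * y


x, y = 1.0, 2.0
gx, gy = grad_f(x, y)
step = 0.1  # illustrative step size

here = f(x, y)
uphill = f(x + step * gx, y + step * gy)    # step along the gradient
downhill = f(x - step * gx, y - step * gy)  # step along the negative gradient

print(downhill < here < uphill)  # stepping against the gradient lowers f
```

Stepping along the gradient raises the value of f, and stepping along its negative lowers it, which is the property gradient descent relies on.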
Steps in Gradient Descent
Step 1: Compute ∇J (this gives the direction of steepest increase)
Step 2: Take a small step in the -∇J direction (the direction of steepest decrease)
Step 3: Repeat.
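The three steps above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes J(m, b) is the mean squared error of a line y = mx + b, and the data, the learning rate, the number of steps, and the function names are all hypothetical.

```python
def gradient(m, b, xs, ys):
    """Return (dJ/dm, dJ/db) for J(m, b) = mean((y - (m*x + b))**2)."""
    n = len(xs)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    return dm, db


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    m, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):               # Step 3: repeat
        dm, db = gradient(m, b, xs, ys)  # Step 1: compute the gradient
        m -= learning_rate * dm          # Step 2: small step in the -gradient direction
        b -= learning_rate * db
    return m, b


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m, b = gradient_descent(xs, ys)
print(round(m, 3), round(b, 3))
```

The learning rate controls how small the "small step" is. If it is too large, the steps can overshoot the minimum and the values may grow instead of settling; if it is too small, many more repetitions are needed.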
Generally speaking, think about how a neural network learns. How does it match a given input to the correct output? As the network learns, it’s trying to reduce some error function.
Let’s try what Albert Einstein called a thought experiment. Imagine you’re standing somewhere on a mountain and you want to get as low as possible, as fast as possible, so you decide to follow these steps. You check your current altitude and your altitude a step north, a step south, a step east, and a step west. Using this, you figure out which direction to step in to reduce your altitude as much as possible in this step. You repeat until stepping in any direction would take you up again. At that point you are standing at a minimum, although it may only be the lowest point nearby rather than the lowest point on the whole mountain.
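The mountain walk can be written down almost literally. Note that the sketch below is a simple probe-the-neighbors search that mirrors the analogy, not gradient descent itself: gradient descent reads the downhill direction from ∇J instead of testing each compass direction. The terrain function, step length, and names are hypothetical.

```python
def altitude(x, y):
    # Hypothetical bowl-shaped terrain with its lowest point at (3, -1)
    return (x - 3) ** 2 + (y + 1) ** 2


def walk_downhill(x, y, step=0.5, max_steps=1000):
    for _ in range(max_steps):
        # Look one step north, south, east, and west
        candidates = [(x, y + step), (x, y - step), (x + step, y), (x - step, y)]
        best = min(candidates, key=lambda p: altitude(*p))
        # Stop when every direction leads up (or stays level)
        if altitude(*best) >= altitude(x, y):
            break
        x, y = best
    return x, y


position = walk_downhill(0.0, 0.0)
print(position)
```

On this terrain the walk ends at the bottom of the bowl. On a bumpier terrain the same procedure could stop in a nearby dip that is not the lowest point overall.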