Optimization Algorithms
Gradient descent is a way of minimizing an objective function J(θ) by updating the model's parameters θ in the direction opposite to the gradient of the objective function, ∇J(θ), with the learning rate η controlling the size of each step.

Batch Gradient Descent : Batch gradient descent computes the gradient of the cost function for the entire training set, so each update requires a full pass over the data. This makes it very slow and intractable for very large datasets. The parameters are updated as follows:

θ = θ − η ∇J(θ)

Batch gradient descent is guaranteed to converge to the global minimum for convex problems.

Stochastic Gradient Descent : SGD performs a parameter update for each training example. It is much faster and can be used to learn online, but because each update is based on a single point it is very noisy and causes the objective function to oscillate. With a fixed learning rate it can keep oscillating and is not guaranteed to converge; however, with a learning rate that decreases over time it is known to converge almost surely.

Mini-batch Gradient Descent : This is in between the two extremes: it performs an update for every mini-batch of n training examples, which reduces the variance of the parameter updates and allows efficient, vectorized implementations. The three variants are sketched below.
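As a rough sketch of how the three variants differ in code, the snippet below applies them to a simple least-squares objective. The synthetic data, the gradient helper, the decreasing learning-rate schedule, and all hyperparameter values are illustrative assumptions, not part of the original text.

import numpy as np

# Sketch only: assumes a least-squares objective J(θ) = ||Xθ - y||² / (2m),
# so the gradient is easy to write down. All names and values are arbitrary.

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # toy design matrix
true_theta = rng.normal(size=5)
y = X @ true_theta + 0.1 * rng.normal(size=1000)

def gradient(theta, X_part, y_part):
    """Gradient of the mean-squared-error objective on the given examples."""
    m = len(y_part)
    return X_part.T @ (X_part @ theta - y_part) / m

eta = 0.1  # learning rate

# Batch gradient descent: one update per full pass over the data.
theta = np.zeros(5)
for epoch in range(100):
    theta -= eta * gradient(theta, X, y)

# Stochastic gradient descent: one update per example, with a decaying step size.
theta = np.zeros(5)
t = 0
for epoch in range(100):
    for i in rng.permutation(len(y)):
        t += 1
        step = eta / (1 + 0.001 * t)      # decreasing schedule aids convergence
        theta -= step * gradient(theta, X[i:i+1], y[i:i+1])

# Mini-batch gradient descent: one update per batch of n examples.
theta = np.zeros(5)
n = 32
for epoch in range(100):
    idx = rng.permutation(len(y))
    for start in range(0, len(y), n):
        batch = idx[start:start + n]
        theta -= eta * gradient(theta, X[batch], y[batch])

In this sketch the only difference between the three loops is how many examples feed each gradient evaluation: all of them, one, or a mini-batch of n.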