Showing posts from November, 2019

Deep Reinforcement Learning

Let's talk about one of the more difficult areas in ML: Deep Reinforcement Learning. Two of the most widely used approaches in this space are policy gradients and deep Q-networks. An agent interacts with an environment by taking actions and receives rewards in return. Policy search means finding a good set of parameters in the policy space. One way to explore the policy space is the policy gradient approach, which estimates the gradient of the expected reward w.r.t. the policy parameters and then moves the parameters in the direction that increases reward. The policy itself can be defined by, say, a neural network. In supervised ML, we already know the best action from the set of actions, so the NN can be trained by minimizing the cross-entropy loss between the estimated and target distributions. In RL, however, we care about long-term reward, and the reward itself may be delayed or sparse, so it is unclear which earlier actions deserve credit for it. This is known as the classic credit assignment problem. This problem is generally solved by summing up all
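To make the policy gradient idea described above concrete, here is a minimal sketch in plain Python. It is not taken from the post: the two-armed bandit environment, the function names `softmax` and `train_bandit_policy`, and the values for `lr`, `steps` and `reward_probs` are all assumptions chosen for illustration. The policy is a softmax over one parameter per action rather than a neural network, and each update follows a REINFORCE-style rule, nudging the parameters along the gradient of the log-probability of the chosen action, scaled by the reward received.

```python
import math
import random


def softmax(logits):
    """Turn a list of real-valued parameters into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(reward_probs, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style updates on a hypothetical multi-armed bandit.

    reward_probs[i] is the (assumed) chance that action i pays reward 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # The environment returns a stochastic 0/1 reward.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # For a softmax policy, d log pi(action) / d theta_i
        # equals 1[i == action] - probs[i].
        for i in range(len(theta)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            # Gradient ascent: move parameters toward higher reward.
            theta[i] += lr * reward * grad_log_pi
    return softmax(theta)


final_probs = train_bandit_policy([0.2, 0.8])
print(final_probs)
```

In this toy setting the reward arrives immediately after each action, so there is no credit assignment problem to deal with; the sketch only illustrates the "follow the gradient of reward w.r.t. the parameters" step.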