Posts
Learning to Trust Your Critic - Grokking GAE
I’ve spent longer than I care to admit using Generalised Advantage Estimation without really understanding it. I knew how to write the code and tune the hyperparameters, but I never understood why I used it beyond “it makes the agent learn better most of the time.” It turns out there’s a very intuitive explanation of exactly what it does, one that doesn’t require any complex maths or deep analysis.
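As a taste of where the post is headed, here is a minimal sketch of the estimator in plain NumPy. The function name, argument layout, and default discount values are my own illustration, not code from the post:

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation over one rollout.

    rewards, dones: length-T arrays for the trajectory.
    values: length-(T+1) critic estimates, including a bootstrap
    value for the state reached after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # Mask out the bootstrap term where the episode ended.
        nonterminal = 1.0 - dones[t]
        # TD error: how much better this step went than the critic expected.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of future TD errors, weighted by (gamma * lam).
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages
```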
Easy A2C
There is no shortage of simple, easy-to-understand tutorials on building a DQN agent. However, the modern baseline for reinforcement learning is Advantage Actor-Critic, or A2C (sometimes Asynchronous Advantage Actor-Critic, A3C, which I’ll come back to in a bit). It typically beats plain DQN for a few reasons: it converges to better optima, it runs faster, and it is simpler to implement. Unfortunately, A2C doesn’t have many tutorials, and those that exist are hard to follow. This tutorial aims to give the next wave of reinforcement learning scholars a simple, intuitive explanation of the details of the algorithm.
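For orientation, here is a rough sketch of the objective an A2C agent optimises, written in PyTorch. This is the standard textbook formulation rather than the post’s own code; the tensor names and coefficient values are illustrative assumptions:

```python
import torch

def a2c_loss(log_probs, values, returns, entropies,
             value_coef=0.5, entropy_coef=0.01):
    """Combined actor-critic loss for one batch of transitions.

    log_probs, values, entropies: per-step tensors from the network.
    returns: precomputed return targets (no gradient expected).
    """
    # Advantage: how much better the sampled actions did than the critic predicted.
    advantages = returns - values
    # Policy-gradient term; advantages are detached so the critic
    # is not updated through the actor's objective.
    policy_loss = -(log_probs * advantages.detach()).mean()
    # Critic regression toward the observed returns.
    value_loss = advantages.pow(2).mean()
    # Entropy bonus encourages exploration.
    return policy_loss + value_coef * value_loss - entropy_coef * entropies.mean()
```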
Why We Need to Stop Redefining ‘AI’
John McCarthy coined the term ‘Artificial Intelligence’ in the proposal for the Dartmouth Summer Research Project on Artificial Intelligence, held back in 1956. Ever since, it has been just out of reach of the technology of the time. Indeed, the original proposal for the Dartmouth Workshop said that: