It’s so simple that it doesn’t even use crossover, a technique so common in GAs that at first it felt strange to even call this algorithm a GA.
After initialising the population, the top T individuals (in this case neural network parameter vectors) are selected to be potential parents.
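The selection-and-mutation loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the fitness function, population size, truncation size `T`, and mutation strength `sigma` are all placeholder assumptions, and a real run would evaluate each parameter vector by rolling out its policy in the environment.

```python
import numpy as np

def fitness(theta):
    # Placeholder objective; a real setup would run the policy
    # parameterised by theta in the environment and return total reward.
    return -np.sum(theta ** 2)

def ga_step(population, T=10, sigma=0.02, rng=None):
    """One generation of a truncation-selection GA with no crossover:
    keep the top T individuals as parents, then fill the next
    generation with Gaussian-mutated copies of random parents."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([fitness(ind) for ind in population])
    parents = [population[i] for i in np.argsort(scores)[::-1][:T]]
    next_pop = [parents[0]]  # elitism: the best individual survives unmutated
    while len(next_pop) < len(population):
        parent = parents[rng.integers(T)]
        next_pop.append(parent + sigma * rng.normal(size=parent.shape))
    return next_pop

rng = np.random.default_rng(0)
pop = [rng.normal(size=5) for _ in range(50)]  # 50 random parameter vectors
for _ in range(100):
    pop = ga_step(pop, rng=rng)
best = max(pop, key=fitness)
```

Because the elite individual is carried over unmutated, the best fitness in the population can never decrease from one generation to the next.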
Popular algorithms in RL such as Q-learning and policy gradients use gradient descent.
ES also follows the gradient, estimating it via an operation similar to finite differences.
These results indicate that GAs (and RS) are not outright better or worse than other methods of optimising DNNs, but that they are a ‘competitive alternative’ to add to one’s RL tool belt.
Like OpenAI, they state that although DNNs don’t struggle with local optima in supervised learning, they can still get into trouble in RL tasks due to a deceptive or sparse reward signal.
In April 2018, the code was optimised to run on a single personal computer.
The work to achieve this is described in an Uber AI Labs blog post, and the specific code can be found here.
A shorter summary (written by me) can be found here.
The code used for the experiments in this paper can be found here.