Artificial Intelligence beating people in the physical world -- sort of. A labyrinth game is hooked up to two motors that act as its "hands", a camera that acts as its "eyes", and a computer running a "model-based reinforcement learning" algorithm that acts as its "brain".
The key thing here is that the reinforcement learning algorithm practices in the physical world, not in simulation, just like humans do. After 6 hours of practice, it outperforms humans. It also found ways to 'cheat' by skipping certain parts of the maze and had to be explicitly instructed not to take those shortcuts.
The reinforcement learning algorithm at its core is called DreamerV3. It is an actor-critic system: it collects experience from the physical world, "replays" that experience out of a replay buffer, and "augments" it with generated "dreams" -- rollouts imagined inside a learned world model. This reduces the amount of real-world experience the system needs in order to learn. (In reinforcement learning parlance, it increases the "sample efficiency".)
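To make that replay-and-dream loop concrete, here's a minimal Python sketch of the idea -- not the actual DreamerV3 code. Every class and number below (ReplayBuffer, TinyWorldModel, TinyActor, TinyCritic, the batch sizes) is a placeholder assumption; the point is just the data flow: real experience goes into a buffer, the world model is fitted to replayed batches, and the actor and critic are then trained on imagined rollouts instead of fresh real-world samples.

```python
# Minimal sketch of the replay-plus-imagination loop (placeholder classes,
# not DreamerV3's real architecture or losses).
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):                  # (obs, action, reward)
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.data), min(batch_size, len(self.data)))

class TinyWorldModel:
    """Stand-in for the learned dynamics model."""
    def update(self, batch):
        pass                                    # fit model to replayed real data

    def imagine(self, actor, batch, horizon):
        # Roll the current policy forward inside the model only ("dreaming").
        trajectories = []
        for obs, _, _ in batch:
            traj, state = [], obs
            for _ in range(horizon):
                action = actor.act(state)
                state = state + random.uniform(-0.1, 0.1)  # dummy predicted next state
                traj.append((state, action, 0.0))          # dummy predicted reward
            trajectories.append(traj)
        return trajectories

class TinyActor:
    def act(self, state):
        return random.choice([-1.0, 0.0, 1.0])  # dummy motor command

    def update(self, imagined, critic):
        pass                                    # policy update from imagined rollouts

class TinyCritic:
    def update(self, imagined):
        pass                                    # value update from imagined rollouts

buffer, world_model = ReplayBuffer(), TinyWorldModel()
actor, critic = TinyActor(), TinyCritic()
for _ in range(100):
    obs = random.random()                       # stand-in camera observation
    buffer.add((obs, actor.act(obs), 0.0))      # one real-world interaction
    batch = buffer.sample(16)
    world_model.update(batch)                   # learn from replayed experience
    dreams = world_model.imagine(actor, batch, horizon=15)
    critic.update(dreams)                       # actor and critic train on dreams,
    actor.update(dreams, critic)                # not on fresh real-world samples
```

The payoff of this structure is that the motors and camera only have to generate a small amount of real experience; most of the "practice" happens inside the learned model.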
DreamerV3 actually consists of 3 neural networks: the world model, the critic, and the actor. All three are trained concurrently but separately, without sharing parameters or gradients. The system also includes normalization and scaling mechanisms that automatically balance these objectives, so a human doesn't have to re-tune "hyperparameters" for each new task. DreamerV3 was originally demonstrated on a wide range of benchmarks, most famously Minecraft, where it was the first algorithm to collect diamonds without human data. The labyrinth-playing system built on top of it is called CyberRunner.
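Here's a rough PyTorch sketch of that three-network layout, again as an illustration rather than DreamerV3's real architecture (its world model is actually a recurrent state-space model, and the losses are far more elaborate). The sizes, learning rates, and dummy targets are all assumptions; the sketch only shows that each network has its own optimizer and that latents are detached so gradients don't leak between the three training objectives.

```python
# Toy "three networks, separate training" sketch (not DreamerV3's real model).
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 8, 2, 16

world_model = nn.Sequential(nn.Linear(obs_dim + act_dim, latent_dim), nn.Tanh())
actor       = nn.Sequential(nn.Linear(latent_dim, act_dim), nn.Tanh())
critic      = nn.Linear(latent_dim, 1)

# Each network gets its own optimizer; no parameters are shared between them.
wm_opt     = torch.optim.Adam(world_model.parameters(), lr=1e-3)
actor_opt  = torch.optim.Adam(actor.parameters(), lr=3e-5)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-5)

obs    = torch.randn(32, obs_dim)        # stand-in for a replayed batch
action = torch.randn(32, act_dim)
target = torch.randn(32, latent_dim)     # stand-in prediction target

# 1. World-model update: fit the latent prediction to replayed experience.
latent = world_model(torch.cat([obs, action], dim=-1))
wm_loss = ((latent - target) ** 2).mean()
wm_opt.zero_grad(); wm_loss.backward(); wm_opt.step()

# 2. Critic update on detached latents, so no gradient reaches the world model.
latent = latent.detach()
returns = torch.randn(32, 1)             # stand-in for imagined return targets
critic_loss = ((critic(latent) - returns) ** 2).mean()
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# 3. Actor update: imagine one step ahead with the actor's action and push up
#    the critic's value of the result. Gradients pass *through* the world-model
#    and critic computations, but only the actor's parameters are stepped.
imagined_action = actor(latent)
imagined_next = world_model(torch.cat([obs, imagined_action], dim=-1))
actor_loss = -critic(imagined_next).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

Note that only the actor's optimizer is stepped in the last block, so even though gradients flow through the world model's and critic's computations there, their weights are untouched by the actor's objective.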
#solidstatelife #ai #robotics #reinforcementlearning
https://www.youtube.com/watch?v=zQMKfuWZRdA