"Rapid motor adaptation for legged robots" (RMA). This work was inspired by the observation that animals can rapidly adapt to different soils, moving uphill or downhill, carrying loads, moving with rested or tired muscles, responding to injury, and so on, while robotics systems require immense training to adapt to any of these, which just can't be done on time scales of fractions of a second.
The way their solution works is that, instead of just a "policy", the term that in the reinforcement learning field refers to the mapping that decides which action the agent takes in any given state, they have a "policy" plus an "adaptation module". The "policy" is trained in simulation, except it is allowed to magically know information about its simulated environment, such as surface friction, the weight of its payload, and so on. The job of the "adaptation module" is to estimate this environmental information, which they call the "extrinsics". These "extrinsics" are estimated based on the difference between what the robot's joints are commanded to do vs what they actually do.
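To make that concrete, here's a minimal sketch (in Python, not the authors' code) of how the two pieces might fit together at run time; the function and variable names here are made up for illustration.

```python
# Run-time loop sketch: the adaptation module watches the recent history of
# sensor readings and commands and guesses the "extrinsics"; the policy then
# uses the current state plus that guess to pick the next joint commands.
from collections import deque

def control_step(policy, adaptation_module, state, history):
    extrinsics_estimate = adaptation_module(history)  # inferred, never measured directly
    action = policy(state, extrinsics_estimate)        # 12 target joint positions
    history.append((state, action))                    # feeds the next estimate
    return action

# Stand-in modules just so the sketch runs:
dummy_policy = lambda state, z: [0.0] * 12
dummy_adapter = lambda history: [0.0] * 8
history = deque(maxlen=50)                             # roughly the last 50 time steps
print(control_step(dummy_policy, dummy_adapter, [0.0] * 30, history))
```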
On top of that, this "adaptation module" can also be trained in simulation. This is because the simulation can be constructed in such a way that the "policy" training procedure is allowed to know the "privileged" information about the "extrinsics", while the "adaptation module" is denied this knowledge and has to learn to infer it from experience.
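A toy illustration of that "privileged information" split, with invented names and dimensions: during simulated training the policy's observation includes the true environment parameters, while the adaptation module only ever sees things a real robot could log.

```python
# What each module is allowed to observe during simulated training.
def policy_observation(sim):
    # Phase 1 (reinforcement learning): the simulator hands over the
    # ground-truth extrinsics alongside the robot state.
    return sim["state"] + sim["extrinsics"]

def adaptation_observation(sim):
    # Phase 2 (supervised): no extrinsics, only the recent state-action history.
    return [value for step in sim["history"] for value in step]

sim = {
    "state": [0.0] * 30,                       # joint positions, velocities, roll/pitch, contacts
    "extrinsics": [0.8, 1.5, 0.0, 0.0, 1.0],   # e.g. friction, payload, CoM offsets, motor strength
    "history": [[0.0] * 42] * 50,              # last 50 steps of state + action
}
print(len(policy_observation(sim)), len(adaptation_observation(sim)))  # 35 2100
```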
Of course, all this depends on having a simulation environment rich enough to encompass everything the robot will encounter in the real world. The researchers created a fractal terrain generator that produces a wide variety of physical contexts for the robot to experience, with wide variation in parameters like mass and friction.
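Something like the following, with made-up parameter ranges (the paper's actual ranges and terrain generator are more involved), is what that randomization might look like.

```python
# Sample a new randomized environment configuration for each training episode.
import random

def sample_environment():
    return {
        "friction": random.uniform(0.05, 4.5),         # ground friction coefficient
        "payload_kg": random.uniform(0.0, 6.0),        # extra mass on the torso
        "com_offset_m": [random.uniform(-0.15, 0.15) for _ in range(2)],
        "motor_strength": random.uniform(0.9, 1.1),    # scaling on motor torques
        "terrain_roughness": random.uniform(0.0, 0.1), # amplitude of the fractal terrain
    }

for _ in range(3):
    print(sample_environment())
```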
Since this is a reinforcement learning system, you might be wondering what the all-important reward function is for the policy portion. The reward function rewards the robot for moving forward and penalizes it for jerky or inefficient motion. More specifically, it is rewarded for moving forward at up to 0.35 m/s, which was chosen as the maximum speed, and penalized for lateral movement, rotation, high joint speeds, tipping sideways, vertical acceleration, foot slippage, and overall expenditure of energy.
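A rough sketch of that reward shaping, with invented weights and simplified terms rather than the paper's exact formulation:

```python
import numpy as np

MAX_SPEED = 0.35  # m/s, the chosen maximum forward speed

def reward(forward_vel, lateral_vel, yaw_rate, joint_vel, roll, pitch,
           vertical_accel, foot_slip, torques):
    r = min(forward_vel, MAX_SPEED)             # reward forward progress, capped
    r -= 0.1 * abs(lateral_vel)                 # penalize sideways drift
    r -= 0.1 * abs(yaw_rate)                    # penalize rotation
    r -= 0.001 * np.sum(np.square(joint_vel))   # penalize fast joint motion
    r -= 0.1 * (abs(roll) + abs(pitch))         # penalize tipping sideways
    r -= 0.01 * abs(vertical_accel)             # penalize bouncing
    r -= 0.1 * foot_slip                        # penalize foot slippage
    r -= 0.001 * np.sum(np.square(torques))     # penalize energy expenditure
    return r
```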
The training of the adaptation module works by giving it access to the robot's internal state and its recent actions, but not the actual environmental "extrinsics". Since in simulation the "ground truth" is known, the adaptation module can be trained using supervised learning instead of reinforcement learning. The "extrinsics" it has to estimate are the environment parameters: friction, payload mass, center of mass, and motor strength. What it gets to estimate them from are the positions and velocities from the motor encoders, the roll and pitch from the IMU sensor, the foot contact indicators from the foot sensors, and its recent joint commands.
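Here's a hedged sketch of what that supervised training step could look like, assuming PyTorch and invented dimensions, with the simulator's ground-truth extrinsics encoding as the regression target.

```python
import torch
import torch.nn as nn

HISTORY_LEN, STATE_ACTION_DIM, EXTRINSICS_DIM = 50, 42, 8

# Placeholder regressor; the paper's module is convolutional (sketched below).
adaptation_module = nn.Sequential(
    nn.Flatten(),
    nn.Linear(HISTORY_LEN * STATE_ACTION_DIM, EXTRINSICS_DIM),
)
optimizer = torch.optim.Adam(adaptation_module.parameters(), lr=1e-3)

def adaptation_training_step(history, true_extrinsics):
    # history: (batch, 50, 42) recent states + actions
    # true_extrinsics: (batch, 8), known only inside the simulator
    predicted = adaptation_module(history)
    loss = nn.functional.mse_loss(predicted, true_extrinsics)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = adaptation_training_step(torch.zeros(16, HISTORY_LEN, STATE_ACTION_DIM),
                                torch.zeros(16, EXTRINSICS_DIM))
print(loss)
```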
In the real world, the robot used is the A1 robot from Unitree, which is simulated with a simulator called RaiSim. The robot's internal state consists of the joint positions (12 values), joint velocities (12 values), roll and pitch of the torso (2 values), and binary foot contact indicators (4 values), and the actions it has available are position targets for the 12 robot joints.
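As a sanity check on the dimensions, here's how that 30-value state might be assembled; the function and field names are illustrative, not the Unitree or RaiSim APIs.

```python
import numpy as np

def build_state(joint_pos, joint_vel, roll, pitch, foot_contacts):
    # 12 joint positions + 12 joint velocities + roll + pitch + 4 foot contacts = 30
    assert len(joint_pos) == 12 and len(joint_vel) == 12 and len(foot_contacts) == 4
    return np.concatenate([joint_pos, joint_vel, [roll, pitch], foot_contacts])

state = build_state(np.zeros(12), np.zeros(12), 0.0, 0.0, np.ones(4))
print(state.shape)  # (30,)
```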
The neural network for the "policy" is a 3-layer fully connected network, while the neural network for the "adaptation module" is a 3-layer convolutional neural network.
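A hedged PyTorch sketch of those two networks, with guessed layer sizes (and assuming the adaptation module pairs a small embedding layer with the three 1-D convolutions over the time axis):

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, EXTRINSICS_DIM, HISTORY_LEN = 30, 12, 8, 50

# Policy: 3 fully connected layers mapping (state + extrinsics) to 12 joint targets.
policy = nn.Sequential(
    nn.Linear(STATE_DIM + EXTRINSICS_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)

# Adaptation module: embed each (state, action) step, run 3 1-D convolutions
# over time, then project to the extrinsics estimate.
class AdaptationModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(STATE_DIM + ACTION_DIM, 32)
        self.convs = nn.Sequential(
            nn.Conv1d(32, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, stride=1), nn.ReLU(),
        )
        self.head = nn.Linear(32, EXTRINSICS_DIM)

    def forward(self, history):                   # history: (batch, 50, 42)
        x = torch.relu(self.embed(history))       # (batch, 50, 32)
        x = self.convs(x.transpose(1, 2))         # (batch, 32, reduced time)
        return self.head(x.mean(dim=-1))          # average over time, then project

z_hat = AdaptationModule()(torch.zeros(1, HISTORY_LEN, STATE_DIM + ACTION_DIM))
print(z_hat.shape)  # torch.Size([1, 8])
```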
For the results, well, just watch the videos.
RMA: rapid motor adaptation for legged robots
#solidstatelife #ai #robotics #quadrupeds #reinforcementlearning #simulation