"Factory: Fast contact for robotic assembly." Assembly, as they refer to it here, is things like peg insertion, electrical connector insertion, tightening of nuts and bolts ("threaded fastener mating"), wire processing, cable routing, soldering, etc.

An "essential, but highly challenging area of manufacturing." Highly challenging for robots, that is. It's physically complex and demands strict reliability requirements.

"The power of physics simulation has not substantially impacted robotic assembly. For assembly, a simulator must accurately and efficiently simulate contact-rich interactions, a longstanding challenge in robotics, particularly for geometrically-complex, tight-clearance bodies." By "tight-clearance", they mean, for example, the small amount of space between a nut and a bolt combined with the small amount of space between the threads of the nut and bolt.

"To simulate real-world motion phases (e.g., initial mating, rundown) and associated pathologies (e.g., cross-threading, jamming), collisions between the threads must be simulated. However, high-quality surface meshes for a nut-and-bolt may consist of 10k-50k triangles; a naive collision scheme may easily exceed memory and compute limits. Moreover, for reinforcement learning training, a numerical solver may need to satisfy non-penetration constraints for 1,000 environments in real-time (i.e., at the same rate as the underlying physical dynamics). Despite the omnipresence of threaded fasteners in the world, no existing simulator achieves this performance."

That is the goal here. Factory is a new set of physics simulation methods to achieve this.

Factory consists of 3 primary components: a physics simulation module, a robot learning suite, and proof-of-concept reinforcement learning policies.

They say their physics simulation achieves "fast, accurate simulations of contact-rich interactions through a novel synthesis of signed distance function (SDF)-based collisions, contact reduction, and a Gauss-Seidel solver."

The signed distance function is a mathematical function that tells you how far a point is from a surface, and which side of the surface the point is on. A Gauss-Seidel solver is an iterative method for solving a system of linear equations, named after, yes, that Gauss, Carl Friedrich Gauss; the Seidel is Philipp Ludwig von Seidel. Remember "SDF" because you're going to be seeing it a lot.
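To make the SDF idea concrete, here's a tiny sketch (mine, not the paper's) of the SDF of a sphere: the sign tells you which side of the surface you're on, and the magnitude tells you how far away you are.

```python
import numpy as np

def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface,
    positive outside."""
    return np.linalg.norm(point - center) - radius

center = np.zeros(3)
print(sphere_sdf(np.array([2.0, 0.0, 0.0]), center, 1.0))  # 1.0  -> outside
print(sphere_sdf(np.array([0.5, 0.0, 0.0]), center, 1.0))  # -0.5 -> inside (penetrating)
```

For an arbitrary shape like a bolt there's no neat formula like this, which is why the SDF has to be precomputed and stored, as discussed further below.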

"The module is accessible within the PhysX physics engine and Isaac Gym. We demonstrate simulator performance on a wide range of challenging scenes. As an example, we simulate 1,000 simultaneous nut-and-bolt assemblies in real-time on a single GPU, whereas the prior state-of-the-art was a single nut-and-bolt assembly at 1/20 real-time."

The robot learning suite consists of "a Franka robot and all rigid-body assemblies from the NIST Assembly Task Board 1, the established benchmark for robotic assembly. The suite includes 60 carefully-designed assets, 3 robotic assembly environments, and 7 classical robot controllers. The suite is accessible within Isaac Gym. User-defined assets, environments, and controllers can be added and simulated as desired."

Proof-of-concept reinforcement learning policies are for "a simulated Franka robot to solve the most contact-rich task on the NIST board, nut-and-bolt assembly." Also in Isaac Gym. Presumably you could use the physics and robot learning assets to do your own reinforcement learning, but it's nice that they've given you some pre-trained "policies," as they are called in the world of reinforcement learning. (In reinforcement learning, the word "policy" is used rather than "model". More precisely, a neural network learns a "model", but a "policy" is a more general concept and can apply to learning algorithms that are not neural networks. I always tell people a "policy" corresponds to what we in normal life would call a "strategy" -- a method of deciding what action to take to win the game from any given situation. What strategy might you take to win a Go game or poker game? The reinforcement learning framework is general enough that any "reward" signal can be used. Here you get the reward and "win" the game by successfully assembling items for manufacturing.)

They say they compared the contact forces generated by executing their policies with real-world forces and found them consistent.

That's a brief overview. Taking a closer look at the physics contact simulation, they give the following explanation for why they use discrete, voxel-based SDFs rather than the usual approach of testing triangle-mesh vertices against a mesh-derived SDF: "Using SDFs for collisions requires precomputing SDFs offline from a mesh, which can be time- and memory-intensive. Moreover, collision schemes typically test the vertices of a trimesh against the SDF to generate contacts. For sharp objects, simply sampling vertices can cause penetration to occur, motivating iterative per-triangle contact generation. We use discrete, voxel-based SDFs as our geometric representation and demonstrate that they provide efficient, robust collision detection for challenging assets in robotic assembly."
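To illustrate what "discrete, voxel-based" means, here's a toy sketch (my own, assuming a simple dense grid; the actual PhysX implementation surely differs) of how you'd query a precomputed voxel SDF: the signed distances are stored at grid points, and a query does trilinear interpolation between the eight surrounding samples. A negative result means the query point has penetrated the surface.

```python
import numpy as np

def query_voxel_sdf(sdf_grid, origin, voxel_size, point):
    """Look up the signed distance at `point` from a dense voxel SDF grid
    (values sampled at grid corners) using trilinear interpolation."""
    g = (point - origin) / voxel_size                          # continuous grid coordinates
    i0 = np.clip(np.floor(g).astype(int), 0, np.array(sdf_grid.shape) - 2)
    f = g - i0                                                 # fractional position in the cell
    c = sdf_grid[i0[0]:i0[0]+2, i0[1]:i0[1]+2, i0[2]:i0[2]+2]  # the 8 corner samples
    cx = c[0] * (1 - f[0]) + c[1] * f[0]                       # interpolate along x
    cy = cx[0] * (1 - f[1]) + cx[1] * f[1]                     # then y
    return cy[0] * (1 - f[2]) + cy[1] * f[2]                   # then z
```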

The next technique they employ is contact reduction. They use 3 techniques from video games to reduce the number of contact points that get generated and passed on to the solver. Those techniques are called normal similarity, penetration depth, and an area-based metric.

Contact clustering, as the name implies, groups contacts into clusters and then reduces the number of contacts in each cluster to just a few that need to be checked. Normal similarity is a clustering technique that assigns surfaces with the same surface normal to the same bin. (A surface normal is a vector that points "straight up" from a point on the surface.)

Once contacts are binned into clusters, the penetration-depth technique culls bins whose contacts have "negligible penetration." The area-based metric then picks, from each remaining cluster, a small number of contacts that span as much of the contact patch as possible.
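Here's a little sketch (mine, not the paper's code) of what normal-similarity binning plus penetration-depth culling might look like, just to make the idea concrete. The bin directions, thresholds, and "keep the deepest few" rule are illustrative assumptions; the paper's actual scheme also applies the area-based metric.

```python
import numpy as np

def reduce_contacts(normals, depths, depth_eps=1e-4, keep_per_bin=4):
    """Toy contact reduction: bin contacts by which axis direction their
    normal is closest to, drop contacts with negligible penetration, and
    keep only the deepest few contacts in each bin."""
    axes = np.array([[1, 0, 0], [-1, 0, 0],
                     [0, 1, 0], [0, -1, 0],
                     [0, 0, 1], [0, 0, -1]], dtype=float)
    bins = np.argmax(normals @ axes.T, axis=1)                   # normal-similarity clustering
    kept = []
    for b in range(len(axes)):
        idx = np.where((bins == b) & (depths > depth_eps))[0]    # penetration-depth culling
        # Keep only the deepest contacts in this cluster.
        kept.extend(idx[np.argsort(depths[idx])[::-1][:keep_per_bin]].tolist())
    return kept
```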

Ok, at this point we need a solver. The SDF handles detecting the contacts, but a numerical solver is needed to resolve them, in other words, to compute the forces that satisfy the non-penetration constraints mentioned in the quote earlier. They look at two options, the Jacobi solver and the aforementioned Gauss-Seidel solver, which you already know, because it is aforementioned, is the one they selected. The Jacobi solver was the more efficient of the two on a large number of contact points, but they discovered that, using their contact reduction techniques, they could reduce the number of contact points to a sufficiently low number that the Gauss-Seidel solver was actually faster. For example, for nut-and-bolt assembly, they could reduce the number of contact points that needed to be handled from 16,000 to 300.
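If you haven't seen these solvers before, here's a minimal side-by-side sketch (mine, on a generic linear system, not the actual contact solver): a Jacobi sweep updates every unknown from the previous iterate, so all updates can run in parallel, while a Gauss-Seidel sweep immediately reuses the freshly updated values, which is more sequential but typically converges in fewer sweeps once the system is small.

```python
import numpy as np

def jacobi_step(A, b, x):
    """One Jacobi sweep: all unknowns updated from the previous iterate."""
    D = np.diag(A)
    return (b - (A @ x - D * x)) / D

def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep: each unknown immediately uses the values
    updated earlier in the same sweep."""
    x = x.copy()
    for i in range(len(b)):
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

# Tiny diagonally-dominant system so both methods converge.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
x_j = x_gs = np.zeros(2)
for _ in range(20):
    x_j, x_gs = jacobi_step(A, b, x_j), gauss_seidel_step(A, b, x_gs)
print(x_j, x_gs, np.linalg.solve(A, b))  # all three should agree
```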

They describe how they tested the system on 1,024 parallel 4-mm peg-in-hole assemblies, 1,024 parallel M16 nut-and-bolt assemblies, 1,024 parallel VGA-style D-subminiature (D-sub) connectors, 1,024 parallel 2-stage gear assemblies, 1,024 M16 nuts, 1,024 bowls falling into a pile (not something you probably actually want to see in a real manufacturing plant, but it makes a cool demonstration video), 1,024 toruses falling into a pile, and 128 parallel Franka robot + M16 nut-and-bolt assemblies.

Moving on to the robot and environment assets, they lament how the set of computer-aided-design (CAD) models for NIST Task Board 1 is not good enough for high-accuracy physics simulation. "The models for the nuts, bolts, pegs, and gear assembly do not conform to real-world tolerances and clearances; in assembly, mating parts together with tight clearances is precisely the most significant challenge. Furthermore, the models for the electrical connectors were sourced from public repositories rather than manufacturers, were geometrically incompatible, were incomplete, and/or were designed using hand measurements." This motivated them to create their own CAD models for nuts, bolts, pegs, gearshafts, electrical connectors, etc.

In addition, they provide 3 environments, with the Pythonic names "FrankaNutBoltEnv", "FrankaInsertionEnv", and "FrankaGearsEnv". As you might guess, all of these involve the Franka robot. The first is for training a Franka robot to do nut-and-bolt assemblies. The second is for insertion assemblies, which means things like USB plugs and sockets, RJ45 plugs and sockets, BNC plugs and sockets, D-sub plugs and sockets, etc. USB you're probably familiar with, RJ45 is the connector at the end of ethernet cables, BNC is a bayonet-style connector for coaxial cables (think lab instruments and older video gear), and D-sub is the connector used for VGA cables, if you remember those, though there are variations on the plug used for other things. And the third is for training a Franka robot to assemble gear assemblies. It comes with a 4-part gear assembly.

Before we get to the reinforcement learning, we have to talk about controllers for a moment, because the actions available to the controller will determine the actions available to the reinforcement learning algorithm. The researchers looked around at what controllers were being used in the real world. They came up with the following list: joint-space inverse differential kinematics (IK) motion controller, joint-space inverse dynamics (ID) controller, task-space impedance controller, operational-space (OSC) motion controller, open-loop force controller, closed-loop proportional (P) force controller, and hybrid force-motion controller.

I didn't actually look at the mathematical formulations of these controllers. From the descriptions in the paper, it sounds like they vary in the way they incorporate gravity, inertia, and errors into their calculations for how much torque to apply to a robot joint.
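To give a flavor of what one of these looks like, here's a hedged sketch of a task-space impedance controller, one of the simpler entries on the list: treat the end-effector as if it were attached to its target pose by a virtual spring and damper, map the resulting wrench to joint torques through the Jacobian transpose, and add gravity compensation. The gain values and variable names are my own illustrative assumptions; the inputs would come from the simulator or the robot model.

```python
import numpy as np

def task_space_impedance_torque(jacobian, pose_err, ee_vel, gravity_torque,
                                kp=100.0, kd=20.0):
    """Joint torques for a simple task-space impedance controller.
    jacobian: 6x7 end-effector Jacobian, pose_err: 6-D pose error,
    ee_vel: 6-D end-effector velocity, gravity_torque: 7-D gravity
    compensation torques."""
    wrench = kp * pose_err - kd * ee_vel          # virtual spring-damper at the hand
    return jacobian.T @ wrench + gravity_torque   # 7-D torques for the Franka's arm joints
```

An operational-space (OSC) controller goes a step further and also folds the arm's inertia into the computation, which is the kind of variation the list above is describing.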

Ok, now we get to the reinforcement learning. The approach they took was to train the reinforcement learning system on 3 subtasks, and then combine the 3 subtasks into a sequence. The 3 subtasks are "pick", "place", and "screw". To train these, they used the nut-and-bolt environment. For "pick", the robot has to grasp a nut placed at a random location on a work surface. For "place", the robot has to place the nut on top of a bolt at a fixed location. For "screw", the robot has to screw down the nut, engaging the mating threads and tightening the appropriate amount until the nut is firmly in place at the base. These are all done with the Franka robot's hand, whose pose in task space has 6 degrees of freedom.

For "pick", a reward function was fashioned that is based on the distance between the robotic fingertips and the nut. Further reward was granted if the nut remained in the robot hand's grasp after lifting.

For "place", a reward function was fashioned that was based not just on the distance to the bolt, but a number of distances to a number of "keypoints", which also reward the robot for getting the nut in the right orientation.

For "screw", a reward function was fashioned that was based on keypoint distances, this time between the nut and the base of the bolt, to reward the robot for screwing it down, and also between the rest of the bolt and the nut, to make the tightening process more stable.

As a result, the robot was able to learn how to generate "the precise torques along the 7 arm joints to allow the high-inertia robot links to maintain appropriate posture of the gripper." This is not to say there weren't problems. "As a simplifying assumption, the joint limit of the end-effector was removed, allowing the Franka to avoid regrasping." Not something you could do in the real world. But...

"Nevertheless, training was replete with a diverse range of pathologies, including high-energy collision with the bolt shank, roll-pitch misalignment of the nut when first engaging the bolt threads, jamming of the nut during tightening, and precession of the gripper around the bolt during tightening, which induced slip between the gripper and nut."

To address these issues, the researchers embarked on a "systematic exploration of controllers/gains, observation/action spaces, and baseline rewards." "The highest performing agents consistently used an OSC motion controller with low proportional gains, an observation space consisting of pose and velocity of the gripper and nut, a 2-degrees-of-freedom action space (Z-translation and yaw), and a linear baseline reward."
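As a sketch of how small that action space is, here's roughly what mapping a 2-degree-of-freedom action (Z-translation and yaw) into a target for the motion controller could look like. The scaling constants and names are illustrative assumptions, not values from the paper.

```python
import torch

def actions_to_targets(actions, gripper_pos, gripper_yaw,
                       max_dz=0.01, max_dyaw=0.1):
    """Turn a (num_envs, 2) action tensor into controller targets:
    column 0 moves the gripper along the bolt axis, column 1 rotates it."""
    dz = actions[:, 0].clamp(-1.0, 1.0) * max_dz
    dyaw = actions[:, 1].clamp(-1.0, 1.0) * max_dyaw
    target_pos = gripper_pos.clone()
    target_pos[:, 2] += dz              # translate along Z (down the bolt)
    target_yaw = gripper_yaw + dyaw     # yaw about the bolt axis to tighten the nut
    return target_pos, target_yaw
```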

In an effort to further speed things up, they capped the number of gradient updates to the policy and added an early-termination rule.

Anyway, combining the 3 subtasks into a sequence, the researchers said they were able to achieve an end-to-end pick + place + screw success rate of 74.2%.

Throughout all of this, they recorded the contact forces involved. "Although the reward functions for the reinforcement learning agents never involved contact forces, the robots learned policies that generated forces in the middle of human ranges; the much higher variance of human forces was likely due to more diverse strategies adopted by humans."

All in all, a big step forward for robotics for manufacturing assembly.

Factory: Fast Contact for Robotic Assembly

#solidstatelife #ai #robotics #manufacturing #reinforcementlearning
