Abstract:
This paper presents a Soft Actor-Critic (SAC) algorithm enhanced with an Attraction-Repulsion Model (ARM) guidance mechanism and a Long Short-Term Memory (LSTM) network to address the challenges of low-quality initial samples and unstable training in learning-based industrial assembly, particularly for irregular peg-in-hole tasks. First, to improve early-stage exploration efficiency, an ARM-based guidance strategy is introduced that uses target pose information to steer the robotic arm and accelerate convergence. Second, the policy and value networks of SAC are augmented with LSTM layers, enabling effective use of sequential interaction history to enhance policy learning and stabilize training. Simulations show that the proposed method achieves a 99.4% success rate in assembling a planetary reducer center shaft, reducing the average maximum contact force and torque by 68.8% and 79.2%, respectively, compared with standard SAC. Physical experiments further yield a success rate exceeding 95%, with maximum contact force and torque remaining below 10 N and 1.5 N·m, demonstrating the algorithm's effectiveness and robustness.
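To illustrate the flavor of the ARM guidance described above, the following is a minimal sketch of a classic attraction-repulsion potential-field force computation. All names, gains (`k_att`, `k_rep`), and the influence radius `rho0` are hypothetical placeholders, not the paper's actual formulation: the attractive term pulls the end-effector toward the target pose while the repulsive term pushes it away from nearby obstacles.

```python
import numpy as np

def arm_guidance_force(pos, target, obstacles, k_att=1.0, k_rep=0.5, rho0=0.2):
    """Attraction-Repulsion Model sketch (hypothetical gains):
    attractive pull toward the target plus repulsive push away
    from obstacles inside the influence radius rho0."""
    # Attractive term: linear spring toward the target position
    f = k_att * (np.asarray(target) - np.asarray(pos))
    for obs in obstacles:
        d = np.asarray(pos) - np.asarray(obs)
        rho = np.linalg.norm(d)
        if 0.0 < rho < rho0:
            # Repulsive term is active only within the influence radius
            f += k_rep * (1.0 / rho - 1.0 / rho0) * (1.0 / rho**2) * (d / rho)
    return f
```

In an ARM-guided exploration scheme, such a force vector could bias early actions toward the target pose, which is the role the abstract attributes to the guidance strategy.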