XPG-RL consists of two main components: (1) a perception pipeline, which processes fused RGB-D images to extract semantic and geometric context and build a compact scene representation; and (2) an RL-based decision-making module, which takes this representation as input and learns a policy to predict adaptive thresholds. These thresholds guide the selection among the priority-structured action candidates (target grasping, occlusion removal, and viewpoint adjustment), enabling context-aware and efficient action execution.
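A minimal sketch of how these two components could connect, assuming a PyTorch-style implementation; the class names (`PerceptionPipeline`, `ThresholdPolicy`), network shapes, and the sigmoid threshold range are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PerceptionPipeline(nn.Module):
    """Encodes a fused RGB-D image (4 channels) into a compact scene vector."""
    def __init__(self, state_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, state_dim),
        )

    def forward(self, rgbd: torch.Tensor) -> torch.Tensor:
        return self.encoder(rgbd)  # (B, state_dim)

class ThresholdPolicy(nn.Module):
    """RL policy head: maps the scene representation to one adaptive
    threshold per priority-ordered action candidate."""
    def __init__(self, state_dim: int = 128, n_actions: int = 3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions), nn.Sigmoid(),  # thresholds in (0, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.head(state)  # (B, n_actions)

# Forward pass: RGB-D observation -> scene representation -> thresholds.
perception, policy = PerceptionPipeline(), ThresholdPolicy()
thresholds = policy(perception(torch.randn(1, 4, 96, 96)))
```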
The action space consists of three discrete primitives (target grasping, occlusion removal, and viewpoint adjustment), listed from highest to lowest priority. Learned thresholds govern switching among these primitives: by evaluating candidates sequentially in priority order, the agent makes decisions that are both efficient and interpretable.
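The switching rule itself reduces to a short loop. In the sketch below, `scores` are per-action feasibility estimates (e.g., grasp confidence from the perception pipeline); both the scoring signal and the treat-lowest-priority-as-fallback behavior are assumptions made for illustration, not the authors' exact formulation.

```python
def select_action(scores: list[float], thresholds: list[float]) -> int:
    """Pick an action by priority: 0 = target grasping, 1 = occlusion
    removal, 2 = viewpoint adjustment. The first candidate whose
    feasibility score clears its learned threshold is executed."""
    for i in range(len(scores) - 1):
        if scores[i] >= thresholds[i]:
            return i
    # No higher-priority action cleared its threshold: fall back to the
    # lowest-priority primitive (viewpoint adjustment).
    return len(scores) - 1

# Grasping (0.42) misses its threshold (0.60), so the agent falls through
# to occlusion removal (0.77 >= 0.50).
assert select_action([0.42, 0.77, 0.0], [0.60, 0.50, 0.0]) == 1
```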
XPG-RL outperforms baselines across all clutter levels, with relative efficiency gains widening as object count increases.
@article{zhang2025xpg,
  title={{XPG-RL}: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search},
  author={Zhang, Yiting and Li, Shichen and Shrestha, Elena},
  journal={arXiv preprint arXiv:2504.20969},
  year={2025}
}