Learning 6-DoF Task-oriented Grasp Detection via Implicit Estimation and Visual Affordance


Current task-oriented grasp detection approaches are mostly based on pixel-level affordance detection and semantic segmentation. These pixel-level approaches rely heavily on the accuracy of a 2D affordance mask, and the generated grasp candidates are restricted to a small workspace. To mitigate these limitations, we first construct a novel affordance-based grasp dataset and then propose a 6-DoF task-oriented grasp detection framework, which takes the observed object point cloud as input and predicts diverse 6-DoF grasp poses for different tasks. Specifically, the implicit estimation network and the visual affordance network in this framework directly predict coarse grasp candidates and a corresponding 3D affordance heatmap for each potential task, respectively. The grasping scores of the coarse grasps are then combined with the heatmap values to generate more accurate, finer candidates. Our framework shows significant improvements over baselines for both existing and novel objects on our simulation dataset. Although the framework is trained on simulated objects and a simulated environment, the generated grasp candidates can be executed accurately and stably in real robot experiments when the object is randomly placed on a support surface.
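The refinement step, in which coarse grasp scores are combined with affordance heatmap values, can be illustrated with a minimal sketch. The abstract does not state the combination rule, so the version below makes two assumptions: each grasp is assigned the heatmap value of the point nearest to its center, and the two quantities are multiplied. The function name `rerank_grasps` and all of its parameters are hypothetical and are not taken from the paper.

```python
import math


def rerank_grasps(grasp_centers, grasp_scores, points, heatmap, top_k=5):
    """Re-rank coarse grasps for one task.

    Assumption (not from the paper): the combined score is the product of
    the coarse grasp score and the affordance heatmap value at the point
    cloud point nearest to the grasp center.
    """
    combined = []
    for center, score in zip(grasp_centers, grasp_scores):
        # Index of the point closest to this grasp center.
        nearest = min(range(len(points)),
                      key=lambda i: math.dist(center, points[i]))
        combined.append(score * heatmap[nearest])

    # Sort grasp indices by combined score, highest first.
    order = sorted(range(len(combined)),
                   key=lambda i: combined[i], reverse=True)
    return [(i, combined[i]) for i in order[:top_k]]


# Toy data: three points on a line with per-point affordance values
# for a single task, and three coarse grasps with their scores.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
heatmap = [0.1, 0.9, 0.5]
grasp_centers = [(0.1, 0.0, 0.0), (0.9, 0.0, 0.0), (2.1, 0.0, 0.0)]
grasp_scores = [0.9, 0.6, 0.8]

ranked = rerank_grasps(grasp_centers, grasp_scores, points, heatmap)
```

In this toy example the grasp with the highest coarse score lies in a low-affordance region and drops to last place, which is the intended effect of weighting by the heatmap. A weighted sum or a learned fusion would be equally plausible choices.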

International Conference on Intelligent Robots and Systems (IROS)
Wenkai Chen
PhD student in Robotics & AI.