
Synthetic datasets for 6D Pose Estimation of Industrial Objects: Framework, Benchmark and Guidelines

Conference: Paper with proceedings at an international conference

This paper falls within the scope of Industry 4.0 and tackles the challenging problem of keeping the Digital Twin of a manufacturing warehouse up to date by detecting industrial objects and estimating their 3D pose, relying on the perception capabilities of the robots that move through the physical environment. Deep learning approaches are attractive candidates and deliver good performance in object detection and pose estimation. However, they require large-scale annotated datasets to train the models. In the industrial and manufacturing sectors, such datasets either do not exist or are too specific to particular use cases. An alternative is to use 3D rendering software to build large-scale annotated synthetic datasets. In this paper, we propose a framework and guidelines for creating synthetic datasets with Unity, which enables automatic 3D-to-2D object labeling. We then benchmark several datasets, ranging from a planar uniform background to a 3D contextualized Digital Twin environment, with or without occlusions, for the detection and 6D pose estimation of industrial cardboard boxes with the YOLO-6D architecture. Two major results arise from this benchmark: the first underlines the importance of training the deep neural network on a dataset contextualized to the targeted use cases in order to achieve relevant performance; the second shows that including cardboard box occlusions in the dataset tends to degrade the performance of the deep neural network.
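
To illustrate what automatic 3D-to-2D labeling can amount to, the sketch below projects the centroid and the eight corners of a box into the image with a pinhole camera model. Networks in the YOLO-6D family are usually trained on such 2D keypoints. This is a minimal sketch and not the paper's Unity framework: the function names, the box dimensions, the pose and the camera intrinsics are all hypothetical values chosen for the example.

```python
from itertools import product


def box_keypoints(width, height, depth):
    """Return the centroid followed by the 8 corners of a box centred on the object origin."""
    hx, hy, hz = width / 2, height / 2, depth / 2
    corners = [(sx * hx, sy * hy, sz * hz) for sx, sy, sz in product((-1, 1), repeat=3)]
    return [(0.0, 0.0, 0.0)] + corners


def project(points, rotation, translation, fx, fy, cx, cy):
    """Project object-frame 3D points to pixel coordinates (pinhole model, no distortion)."""
    pixels = []
    for p in points:
        # Object frame -> camera frame: X_cam = R * X_obj + t
        xc, yc, zc = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if zc <= 0:
            raise ValueError("point is behind the camera")
        # Perspective division followed by the intrinsic parameters
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels


# Hypothetical example: a 0.4 x 0.3 x 0.2 m box, 2 m in front of the camera, not rotated.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = project(
    box_keypoints(0.4, 0.3, 0.2),
    rotation=identity,
    translation=(0.0, 0.0, 2.0),
    fx=600.0, fy=600.0, cx=320.0, cy=240.0,
)
```

Here `labels` holds nine (u, v) pixel pairs, centroid first. In a rendering engine the rotation, translation and intrinsics would come from the scene and the virtual camera, so no manual annotation is needed.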