View Selection for Industrial Object Recognition

October 2022
Engineering & Digital Tools
Paper published in the proceedings of an international conference
Authors: Kewei Xu (LINEACT), Nicolas Ragot, Yohan Dupuis (LINEACT)
Conference: 48th Annual Conference of the Industrial Electronics Society, 16 October 2022

Recent industrial revolutions and the digital transformation have led to the rise of robotics and the emergence of the digital twin concept. A major challenge lies in keeping this virtual representation up to date, so that the supervising operator and the system itself can make appropriate decisions. One way to achieve this is to exploit multi-robot perception by merging the robots' individual observations to collectively improve object recognition and the robots' understanding of their environment. Since object recognition depends strongly on the viewing angle, one challenge is to identify the camera poses that capture the most relevant information about the nature of the object. In this paper, we propose a smart view selection approach that determines the camera poses and the number of most informative views while maximising object recognition performance. Based on a synthetic view dataset of traditional industrial objects, we adopt a clustering-based approach that maximises the inter-class distance and minimises the intra-class distance. To do so, we compute a score for each view based on the Fowlkes-Mallows Index. This allows us to rank the views in the dataset and select the subset that maximises the score. This subset is then used as the training dataset for a k-NN classifier. The results, reported in terms of F1-score, are promising and highlight the relevance of our work: i) our smart selection yields a limited number of the most informative camera poses for object recognition; ii) feature extraction from a pretrained CNN combined with a clustering algorithm makes industrial object categories separable; iii) our approach is robust, since it maintains good performance when the camera poses lie in the neighbourhood of the exact camera positions provided by our processing pipeline.
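The view-scoring step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that, for each candidate camera pose, the objects have already been clustered (for instance from CNN features), and it scores each pose by the Fowlkes-Mallows Index between the true object categories and those cluster assignments. The pose names, object categories, and the helper names `fowlkes_mallows` and `rank_views` are hypothetical. The index is computed from pair counts as TP / sqrt((TP + FP)(TP + FN)); scikit-learn offers the same metric as `sklearn.metrics.fowlkes_mallows_score`.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(true_labels, cluster_labels):
    """Fowlkes-Mallows Index computed by counting sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            tp += 1          # pair correctly grouped together
        elif same_cluster:
            fp += 1          # different classes merged into one cluster
        elif same_class:
            fn += 1          # same class split across clusters
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


def rank_views(true_labels, clusters_per_view):
    """Return (view_id, score) pairs sorted from highest to lowest score."""
    scores = {
        view: fowlkes_mallows(true_labels, clusters)
        for view, clusters in clusters_per_view.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 6 objects from 3 categories, clustered per camera pose.
true_labels = ["gear", "gear", "valve", "valve", "pipe", "pipe"]
clusters_per_view = {
    "pose_front": [0, 0, 1, 1, 2, 2],  # clusters match the categories
    "pose_top": [0, 0, 0, 1, 2, 2],    # one valve merged with the gears
    "pose_side": [0, 1, 0, 1, 0, 1],   # clusters unrelated to the categories
}

ranking = rank_views(true_labels, clusters_per_view)
top_views = [view for view, _ in ranking[:2]]  # keep the two best-scoring poses
```

In this toy case the ranking places `pose_front` first, then `pose_top`, then `pose_side`. Under the same assumptions, the views kept in `top_views` would be the ones used to build the training set for the k-NN classifier.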