
2024 | Book

Robotics, Computer Vision and Intelligent Systems

4th International Conference, ROBOVIS 2024, Rome, Italy, February 25–27, 2024, Proceedings


About this book

This volume constitutes the proceedings of the 4th International Conference on Robotics, Computer Vision and Intelligent Systems, ROBOVIS 2024, which was held in Rome, Italy, during February 25-27, 2024.

The 8 full papers and 21 short papers presented in this book were carefully reviewed and selected from 33 submissions. They focus on research and development in robotics, computer vision, and intelligent systems.

Table of Contents

Frontmatter
Compute Optimal Waiting Times for Collaborative Route Planning
Abstract
Collaborative routing tries to discover paths of multiple robots that avoid mutual collisions while optimising a common cost function. A collision can be avoided in two ways: a robot modifies its route to pass another robot, or one robot waits for the other to move first. Recent work assigns priorities to robots or models waiting times as an ‘action’ similar to driving. However, these methods have certain disadvantages. This paper introduces a new approach that computes theoretically optimal waiting times for given multi-routes. If all collisions can be avoided through waiting, the algorithm computes optimal places and durations to wait. We use this approach as a component of a collaborative routing system capable of solving complex routing problems involving mutual blocking.
Jörg Roth
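As a hedged illustration of the idea (not the paper's algorithm), the sketch below computes the smallest start delay that lets one robot wait out a node conflict with another, assuming unit-time moves on a shared graph; the routes and the conflict model are simplified placeholders.

```python
# Illustrative sketch (not the paper's algorithm): given two timed routes on a
# shared graph, find the smallest delay for robot B at its start node so that
# the two robots never occupy the same node at the same time step.

def occupied_cells(route, delay=0):
    """Map time step -> node, assuming one node per unit time step and that
    a robot waits at its start for `delay` steps, then stays at its goal."""
    timeline = {}
    for t in range(delay):
        timeline[t] = route[0]
    for i, node in enumerate(route):
        timeline[delay + i] = node
    return timeline

def min_waiting_time(route_a, route_b, horizon=50):
    """Smallest start delay for route_b that avoids node conflicts with route_a."""
    cells_a = occupied_cells(route_a)
    for delay in range(horizon):
        cells_b = occupied_cells(route_b, delay)
        last_t = max(max(cells_a), max(cells_b))
        conflict = any(
            cells_a.get(t, route_a[-1]) == cells_b.get(t, route_b[-1])
            for t in range(last_t + 1)
        )
        if not conflict:
            return delay
    return None  # waiting alone cannot resolve the conflict

# Example: both robots must cross node 'C' on a corridor.
print(min_waiting_time(['A', 'C', 'D'], ['B', 'C', 'E']))  # -> 1
```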
Robot Vision and Deep Learning for Automated Planogram Compliance in Retail
Abstract
In this paper, an automated planogram compliance technique is proposed for retail applications. A mobile robot with camera vision capabilities provides images of the products on shelves, which are processed to reconstruct an overall image of the shelves to be compared to the planogram. The image reconstruction includes frame extraction from a live video stream, image stitching, and concatenation. Object detection for the products is achieved using a deep learning tool based on the YOLOv5 model. A dataset for algorithm training and testing is built to identify the products based on their image identification, number, and location on the shelf. A small-scale shelf setup with products is built, and different arrangements of products on shelves are tested in a laboratory environment. It was found that the YOLOv5 algorithm detects various products on shelves with a precision of 0.98, recall of 0.99, F-measure of 0.98, and classification loss of 0.006.
Adel Merabet, Abhishek V. Latha, Francis A. Kuzhippallil, Mohammad Rahimipour, Jason Rhinelander, Ramesh Venkat
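For readers who want to experiment with the detection step, the hedged sketch below uses the public pretrained YOLOv5 weights from torch.hub (not the authors' custom-trained model) to read one shelf row left to right and score it against a hypothetical planogram layout; the class names and row coordinates are placeholders.

```python
# Illustrative sketch, not the authors' pipeline: run a YOLOv5 detector on a
# stitched shelf image and check the left-to-right product order in one shelf
# row against the expected planogram order.
import torch

# Public pretrained weights; the paper trains custom weights on its own dataset.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detected_order(image_path, row_y_range):
    """Return detected class names in a given shelf row, sorted left to right."""
    results = model(image_path)
    df = results.pandas().xyxy[0]            # columns: xmin..ymax, confidence, name
    ymin, ymax = row_y_range
    row = df[(df['ymin'] >= ymin) & (df['ymax'] <= ymax)]
    return list(row.sort_values('xmin')['name'])

def compliance(expected, observed):
    """Fraction of planogram positions whose product matches the detection."""
    matches = sum(e == o for e, o in zip(expected, observed))
    return matches / max(len(expected), 1)

planogram_row = ['cola', 'cola', 'juice', 'water']        # hypothetical layout
observed_row = detected_order('shelf_stitched.jpg', (0, 400))
print(f'row compliance: {compliance(planogram_row, observed_row):.2f}')
```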
Park Marking Detection and Tracking Based on a Vehicle On-Board System of Fisheye Cameras
Abstract
Automatic parking assistance systems based on vehicle perception are becoming increasingly helpful both for the driver's experience and road safety. In this paper, we propose a complete, embedded-compatible parking assistance system able to detect, classify, and track parking spaces around the vehicle based on a 360° surround view camera system. Unlike the majority of state-of-the-art studies, the approach outlined in this work is able to detect most types of parking slots without any prior parking slot information. Additionally, the method does not rely on bird's-eye view images, since it works directly on fisheye images, increasing the coverage area around the vehicle while reducing computational complexity. The authors propose a system to detect and classify, in real time, the parking slots on the fisheye images based on deep learning models. Moreover, the 2D camera detections are projected into a 3D space in which Kalman filter-based tracking is used to provide a unique identifier for each parking slot. Experiments done with a configuration of four cameras around the vehicle show that the presented method obtains qualitatively and quantitatively satisfactory results in different real-life parking scenarios while maintaining real-time performance.
Ruben Naranjo, Joan Sintes, Cristina Pérez-Benito, Pablo Alonso, Guillem Delgado, Nerea Aranjuelo, Aleksandar Jevtić
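A minimal sketch of the tracking component follows, assuming a constant-position Kalman filter over each slot's 3D centre; the noise parameters and state layout are illustrative, not the paper's configuration.

```python
# Minimal sketch of the tracking idea: a constant-position Kalman filter over a
# parking-slot centre projected into 3D. Noise values are illustrative only.
import numpy as np

class SlotTrack:
    def __init__(self, center_3d, slot_id, q=1e-3, r=5e-2):
        self.id = slot_id                      # unique identifier for the slot
        self.x = np.asarray(center_3d, float)  # state: 3D slot centre
        self.P = np.eye(3) * 1.0               # state covariance
        self.Q = np.eye(3) * q                 # process noise (slots are static)
        self.R = np.eye(3) * r                 # measurement noise

    def predict(self):
        self.P = self.P + self.Q               # identity motion model
        return self.x

    def update(self, z):
        z = np.asarray(z, float)
        S = self.P + self.R
        K = self.P @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(3) - K) @ self.P
        return self.x

track = SlotTrack([2.1, -0.3, 0.0], slot_id=7)
track.predict()
print(track.update([2.05, -0.28, 0.0]))        # fused estimate after one detection
```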
Analysis of Age Invariant Face Recognition Efficiency Using Face Feature Vectors
Abstract
One of the main problems for face recognition when comparing photos of various ages is the impact of age progression on facial features. The face undergoes many changes as a person grows older, including geometrical changes and changes in facial hair, etc. Even though biometric markers such as computed face feature vectors should preferably be invariant to such factors, face recognition generally becomes less reliable as the age span grows larger. Therefore, this study was conducted with the aim of exploring the efficiency of such feature vectors in recognising individuals despite variations in age, and of measuring face recognition performance and behaviour in the data. It is shown that they are indeed discriminative enough to achieve age-invariant face recognition without synthesising age images through generative processes or training on specialised age-related features.
Anders Hast, Yijie Zhou, Congting Lai, Ivar Blohm
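The verification mechanism under study can be illustrated with a short sketch: cosine similarity between two face feature vectors, thresholded for a match. The embeddings and threshold below are placeholders, not the paper's models or operating point.

```python
# Sketch of the basic mechanism discussed in the paper: compare face feature
# vectors from photos taken years apart with cosine similarity and accept a
# match above a threshold. Embeddings and threshold here are placeholders.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(feat_young, feat_old, threshold=0.5):
    """Age-invariant verification reduces to thresholding embedding similarity."""
    return cosine_similarity(feat_young, feat_old) >= threshold

rng = np.random.default_rng(0)
f1 = rng.normal(size=512)                   # e.g. a 512-D face embedding
f2 = f1 + rng.normal(scale=0.3, size=512)   # same identity, aged appearance
print(same_person(f1, f2))
```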
Uncertainty Driven Active Learning for Image Segmentation in Underwater Inspection
Abstract
Active learning aims to select the minimum amount of data to train a model that performs similarly to a model trained with the entire dataset. We study the potential of active learning for image segmentation in underwater infrastructure inspection tasks, where large amounts of data are typically collected. The pipeline inspection images are usually semantically repetitive but with great variations in quality. We use mutual information as the acquisition function, calculated using Monte Carlo dropout. HyperSeg is trained using active learning with an underwater pipeline inspection dataset of over 50,000 images. To allow reproducibility and assess the framework's effectiveness, the CamVid dataset was also utilized. For the pipeline dataset, HyperSeg with active learning achieved 67.5% mean IoU using 12.5% of the data, versus 61.4% with the same amount of randomly selected images. This shows that using active learning for segmentation models in underwater inspection tasks can lower the cost significantly.
Luiza Ribeiro Marnet, Yury Brodskiy, Stella Grasshof, Andrzej Wąsowski
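A sketch of the acquisition function named in the abstract is given below: BALD-style mutual information estimated from Monte Carlo dropout samples, used to rank an unlabeled pool. Tensor shapes and the `mc_forward` callable are assumptions, not details from the paper.

```python
# Sketch of the acquisition function described in the abstract: mutual
# information (BALD) between predictions and model parameters, estimated from
# Monte Carlo dropout samples. Shapes and the scoring network are placeholders.
import numpy as np

def mutual_information(mc_probs, eps=1e-12):
    """mc_probs: (T, C, H, W) softmax outputs from T stochastic forward passes.
    Returns a (H, W) map of I[y; w | x] = H[E p] - E[H[p]]."""
    mean_p = mc_probs.mean(axis=0)                                   # (C, H, W)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=0)   # H[E p]
    mean_entropy = -(mc_probs * np.log(mc_probs + eps)).sum(axis=1).mean(axis=0)
    return entropy_of_mean - mean_entropy

def rank_pool(unlabeled_images, mc_forward, budget=100):
    """Pick the `budget` images with the highest mean per-pixel MI.
    `mc_forward(img)` is a placeholder returning (T, C, H, W) MC-dropout probs."""
    scores = [mutual_information(mc_forward(img)).mean() for img in unlabeled_images]
    order = np.argsort(scores)[::-1]
    return [unlabeled_images[i] for i in order[:budget]]
```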
Enhancing Connected Cooperative ADAS: Deep Learning Perception in an Embedded System Utilizing Fisheye Cameras
Abstract
This paper explores the potential of Cooperative Advanced Driver Assistance Systems (C-ADAS) that leverage Vehicle-to-Everything (V2X) communication to enhance road safety. The authors propose a deep learning-based perception system operating on a 360° surround view within the C-ADAS. This system also utilizes an On-Board Unit (OBU) for V2X message sharing to cater to vehicles lacking their own perception sensors. The feasibility of these systems is demonstrated, showcasing their effectiveness in various real-world scenarios, executed in real time. The contributions include the introduction of a design for a perception system employing fisheye cameras in the context of C-ADAS, with the potential for embedded integration, the validation of the feasibility of Day 2 services in C-ITS, and the expansion of ADAS functions through a Local Dynamic Map (LDM) for a collision warning application. The findings highlight the promising potential of C-ADAS in improving road safety and pave the way for future advancements in cooperative perception and driving systems.
Guillem Delgado, Mikel Garcia, Jon Ander Íñiguez de Gordoa, Marcos Nieto, Gorka Velez, Cristina Pérez-Benito, David Pujol, Alejandro Miranda, Iu Aguilar, Aleksandar Jevtić
Weapon Detection Using PTZ Cameras
Abstract
Mass shootings in public places are a scourge in some countries. Computer vision techniques have been actively researched in recent years to process video from surveillance cameras and immediately detect the presence of an armed individual. The research, however, has focused on images taken from cameras that are (as is typically the case) far from the entrance where the individual first appears. However, most modern video surveillance cameras have some pan-tilt-zoom (PTZ) capabilities, fully controllable by the operator or by control software. In this paper, we make the first (as far as the authors know) exploration of the use of PTZ cameras for this particular problem. Our results unequivocally reveal the transformative impact of integrating PTZ functionality, particularly zoom and tracking capabilities, on the overall performance of these weapon detection models. Experiments were carefully executed in controlled environments, including laboratory and classroom settings, allowing for a comprehensive evaluation. In these settings, the utility of PTZ in improving detection outcomes became evident, especially when confronted with challenging conditions such as dim lighting or multiple individuals in the scene. This research underscores the immense potential of modern PTZ cameras for automatic firearm detection. This advancement holds the promise of augmenting public safety and security.
Juan Daniel Muñoz, Jesus Ruiz-Santaquiteria, Oscar Deniz, Gloria Bueno
Improving Semantic Mapping with Prior Object Dimensions Extracted from 3D Models
Abstract
Semantic mapping in mobile robotics has gained significant attention recently for its important role in equipping robots with a comprehensive understanding of their surroundings. This understanding involves enriching metric maps with semantic data, covering object categories, positions, models, relations, and spatial characteristics. This augmentation enables robots to interact with humans, navigate semantically using high-level instructions, and plan tasks efficiently. This study presents a novel real-time RGBD-based semantic mapping method designed for autonomous mobile robots. It focuses specifically on 2D semantic mapping in environments where prior knowledge of object models is available. Leveraging RGBD camera data, our method generates a primitive object representation using convex polygons, which is then refined by integrating prior knowledge. This integration involves utilizing predefined bounding boxes derived from real 3D object dimensions to cover real object surfaces. The evaluation, conducted in two distinct office environments (a simple and a complex setting) utilizing the MIR mobile robot, demonstrates the effectiveness of our approach. Comparative analysis showcases our method outperforming a similar state-of-the-art approach utilizing only RGBD data for mapping. Our approach accurately estimates occupancy zones of partially visible or occluded objects, resulting in a semantic map closely aligned with the ground truth.
Abdessalem Achour, Hiba Al Assaad, Yohan Dupuis, Madeleine El Zaher
Offline Deep Model Predictive Control (MPC) for Visual Navigation
Abstract
In this paper, we propose a new visual navigation method based on a single RGB perspective camera. Using the Visual Teach & Repeat (VT&R) methodology [8], the robot acquires a visual trajectory consisting of multiple subgoal images in the teaching step. In the repeat step, we propose two network architectures, namely ViewNet and VelocityNet. The combination of the two networks allows the robot to follow the visual trajectory. ViewNet is trained to generate a future image based on the current view and the velocity command. The generated future image is combined with the subgoal image for training VelocityNet. We develop an offline Model Predictive Control (MPC) policy within VelocityNet with the dual goals of (1) reducing the difference between current and subgoal images and (2) ensuring smooth trajectories by mitigating velocity discontinuities. Offline training conserves computational resources, making it a more suitable option for scenarios with limited computational capabilities, such as embedded systems. We validate our experiments in a simulation environment, demonstrating that our model can effectively minimize the metric error between real and played trajectories.
Taha Bouzid, Youssef Alj
BiGSiD: Bionic Grasping with Edge-AI Slip Detection
Abstract
Object grasping is a crucial task for robots and is inspired by nature: humans can flexibly grasp any object and detect whether it is slipping from their grasp, relying more on the sense of touch than on vision. In this work we present a bionic gripper with an Edge-AI device that is able to dexterously grasp handled objects and to sense and predict their slippage. In this paper, a bionic gripper with tactile sensors and a time-of-flight sensor is developed. We propose an LSTM model to detect incipient slip, where a 6 degree-of-freedom robot manipulator is used for data collection and testing. The aim of this paper is to develop an efficient slip detection system that we can deploy on the edge device on our gripper, so it can be a stand-alone product that can be attached to almost any robotic manipulator. We have collected a dataset, trained the model, and achieved a slip detection accuracy of 95.34%. Due to the efficiency of our model we were able to implement the slip detection on an edge device. We use the Nvidia Jetson AGX Orin development board to show the inference/prediction in a real-time scenario. We demonstrate in our experiments how the on-gripper slip detection capability allows more robust grasping, as the grip force is adjusted in response to slippage.
Youssef Nassar, Mario Radke, Atmaraaj Gopal, Tobias Knöller, Thomas Weber, ZhaoHua Liu, Matthias Rätsch
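As a hedged sketch of the slip detector's shape (sizes and layers are assumptions, not the paper's architecture), a small PyTorch LSTM classifying windows of tactile readings might look like this:

```python
# Minimal sketch (assumed sizes, not the paper's architecture): an LSTM that
# classifies a window of tactile + distance readings as slip / no-slip.
import torch
import torch.nn as nn

class SlipLSTM(nn.Module):
    def __init__(self, n_features=16, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # logits: [no_slip, slip]

    def forward(self, x):                       # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])

model = SlipLSTM()
window = torch.randn(8, 50, 16)                 # 8 windows of 50 sensor samples
logits = model(window)
slip_prob = torch.softmax(logits, dim=-1)[:, 1] # probability of slip per window
print(slip_prob.shape)                          # torch.Size([8])
```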
GAT-POSE: Graph Autoencoder-Transformer Fusion for Future Pose Prediction
Abstract
Human pose prediction, interchangeably known as human pose forecasting, is a daunting endeavor within computer vision. Owing to its pivotal role in many advanced applications and research avenues like smart surveillance, autonomous vehicles, and healthcare, human pose prediction models must exhibit high precision and efficacy to curb error dissemination, especially in real-world settings. In this paper, we unveil GAT-POSE, an innovative fusion framework marrying the strengths of graph autoencoders and transformers crafted for deterministic future pose prediction. Our methodology encapsulates a singular compression and tokenization of pose sequences through graph autoencoders. By harnessing a transformer architecture for pose prediction and capitalizing on the tokenized pose sequences, we construct a new paradigm for precise pose prediction. The robustness of GAT-POSE is ascertained through its deployment in three diverse training and testing ecosystems, coupled with the utilization of multiple datasets for a thorough appraisal. The stringency of our experimental setup underscores that GAT-POSE outperforms contemporary methodologies in human pose prediction, bearing significant promise to influence a variety of real-world applications favorably and lay a robust foundation for subsequent explorations in computer vision research.
Armin Danesh Pazho, Gabriel Maldonado, Hamed Tabkhi
UCorr: Wire Detection and Depth Estimation for Autonomous Drones
Abstract
In the realm of fully autonomous drones, the accurate detection of obstacles is paramount to ensure safe navigation and prevent collisions. Among these challenges, the detection of wires stands out due to their slender profile, which poses a unique and intricate problem. To address this issue, we present an innovative solution in the form of a monocular end-to-end model for wire segmentation and depth estimation. Our approach leverages a temporal correlation layer trained on synthetic data, providing the model with the ability to effectively tackle the complex joint task of wire detection and depth estimation. We demonstrate the superiority of our proposed method over existing competitive approaches in the joint task of wire detection and depth estimation. Our results underscore the potential of our model to enhance the safety and precision of autonomous drones, shedding light on its promising applications in real-world scenarios.
Benedikt Kolbeinsson, Krystian Mikolajczyk
A Quality-Based Criteria for Efficient View Selection
Abstract
The generation of complete 3D models of real-world objects is a well-known problem. The accuracy of a reconstruction can be defined as the fidelity to the original model, but in the context of the 3D reconstruction, the ground truth model is usually unavailable. In this paper, we propose to evaluate the quality of the model through local intrinsic metrics, that reflect the quality of the current reconstruction based on geometric measures of the reconstructed model. We then show how those metrics can be embedded in a Next Best View (NBV) framework as additional criteria for selecting optimal views that improve the quality of the reconstruction. Tests performed on simulated data and synthetic images show that using quality metrics helps the NBV algorithm to focus the view selection on the poor-quality parts of the reconstructed model, thus improving the overall quality.
Rémy Alcouffe, Sylvie Chambon, Géraldine Morin, Simone Gasparini
Multi-UAV Weed Spraying
Abstract
In agriculture, weeds reduce soil productivity and harvest quality. A common practice for weed control is weed spraying. Ground spraying of weeds is a common approach that may be harmful, destructive, and too slow, while aerial UAV spraying can be safe, non-destructive, and quick. Spraying efficiency and accuracy can be enhanced by adopting multiple UAVs. In this context, we propose a new multiple-UAV spraying system that autonomously and accurately sprays weeds within the field. In our proposed system, a weed pressure map is first clustered. Then, the Voronoi approach generates the appropriate number of waypoints. Finally, a variant of the Traveling Salesman Problem (TSP) is solved to find the best UAV tour for each cluster. The latter task is performed using two nature-inspired techniques, namely NSGA2 and MOEA/D. To assess the performance of each method, we conducted a set of simulation tests. The results reported in this paper demonstrate the superiority of NSGA2 over MOEA/D. In addition, the heterogeneity of UAVs is studied, where we have a mix of fixed-wing and multi-rotor drones for spraying.
Ali Moltajaei Farid, Malek Mouhoub, Tony Arkles, Greg Hutch
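The pipeline's overall shape can be sketched with simple stand-ins: k-means in place of the weed-pressure clustering and Voronoi waypoint step, and a nearest-neighbour tour in place of NSGA2/MOEA/D. This is only an illustration of the cluster-then-route structure, not the paper's method.

```python
# Sketch of the cluster-then-route structure with simple stand-ins: k-means
# instead of the paper's clustering / Voronoi waypoint step, and a greedy
# nearest-neighbour tour instead of NSGA2 / MOEA/D, one tour per UAV.
import numpy as np
from sklearn.cluster import KMeans

def nearest_neighbour_tour(points, start=0):
    """Greedy TSP heuristic: always fly to the closest unvisited waypoint."""
    remaining = list(range(len(points)))
    tour = [remaining.pop(start)]
    while remaining:
        last = points[tour[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(points[i] - last))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour

rng = np.random.default_rng(1)
weed_points = rng.uniform(0, 100, size=(60, 2))      # weed-pressure hotspots (m)
n_uavs = 3
labels = KMeans(n_clusters=n_uavs, n_init=10).fit_predict(weed_points)

for uav in range(n_uavs):
    cluster = weed_points[labels == uav]
    tour = nearest_neighbour_tour(cluster)
    print(f'UAV {uav}: visits {len(tour)} waypoints')
```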
Human Comfort Factors in People Navigation: Literature Review, Taxonomy and Framework
Abstract
Due to demographic shifts and improvements in medical care, person navigation systems (PNS) for people with disabilities are becoming increasingly important. So far, PNS have received less attention than mobile robots. However, the work on mobile robots cannot always be transferred to PNS because there are important differences in navigating people. In this paper, we address these differences by providing a comprehensive literature review on human comfort factors in people navigation, presenting a unified taxonomy for PNS and proposing a framework for integrating these factors into a navigation stack.
Based on the results, we extract the key differences and human comfort factors that have been addressed in current literature. Furthermore, the literature review shows that there is no unified taxonomy in this field. To address this, we introduce the term people navigation and a taxonomy to categorize existing systems. Finally, we summarize the human comfort factors that have been considered so far and provide an outlook on their implementation. Our survey serves as a foundation for comprehensive research in people navigation and identifies open challenges.
Matthias Kalenberg, Christian Hofmann, Sina Martin, Jörg Franke
Region Prediction for Efficient Robot Localization on Large Maps
Abstract
Recognizing already explored places (a.k.a. place recognition) is a fundamental task in Simultaneous Localization and Mapping (SLAM) to enable robot relocalization and loop closure detection. In topological SLAM the recognition takes place by comparing a signature (or feature vector) associated to the current node with the signatures of the nodes in the known map. However, as the number of nodes increases, matching the current node signature against all the existing ones becomes inefficient and thwarts real-time navigation. In this paper we propose a novel approach to pre-select a subset of map nodes for place recognition. The map nodes are clustered during exploration and each cluster is associated with a region. The region labels become the prediction targets of a deep neural network and, during navigation, only the nodes associated with the regions predicted with high probability are considered for matching. While the proposed technique can be integrated in different SLAM approaches, in this work we describe an effective integration with RTAB-Map (a popular framework for real-time topological SLAM) which allowed us to design and run several experiments to demonstrate its effectiveness.
Matteo Scucchia, Davide Maltoni
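The matching-time idea can be sketched as follows: the region classifier's output restricts signature comparison to nodes in high-probability regions. The classifier, the node signatures, and the threshold here are placeholders, not details taken from the paper.

```python
# Sketch of the matching-time idea: only compare the current node signature
# against nodes whose region the classifier predicts with high probability.
import numpy as np

def candidate_nodes(region_probs, node_regions, prob_threshold=0.2):
    """Indices of map nodes lying in any region predicted above the threshold."""
    likely_regions = {r for r, p in enumerate(region_probs) if p >= prob_threshold}
    return [i for i, r in enumerate(node_regions) if r in likely_regions]

def match(query_sig, map_sigs, candidates):
    """Nearest signature among the pre-selected candidates only."""
    dists = [np.linalg.norm(query_sig - map_sigs[i]) for i in candidates]
    return candidates[int(np.argmin(dists))]

map_sigs = np.random.rand(1000, 128)            # signatures of 1000 map nodes
node_regions = np.random.randint(0, 10, 1000)   # region label per node
region_probs = np.array([0.05, 0.6, 0.02, 0.25, 0.02, 0.01, 0.01, 0.02, 0.01, 0.01])

cands = candidate_nodes(region_probs, node_regions)
print(len(cands), 'of 1000 nodes considered')
print('best match:', match(np.random.rand(128), map_sigs, cands))
```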
Utilizing Dataset Affinity Prediction in Object Detection to Assess Training Data
Abstract
Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. Towards this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
Stefan Becker, Jens Bayer, Ronny Hug, Wolfgang Huebner, Michael Arens
Optimizing Mobile Robot Navigation Through Neuro-Symbolic Fusion of Deep Deterministic Policy Gradient (DDPG) and Fuzzy Logic
Abstract
Mobile robot navigation has long been a sector of great importance in autonomous systems research. To ensure successful navigation in complex environments, several rule-based traditional approaches have previously been employed, which possess several drawbacks in terms of navigation and obstacle avoidance efficiency. Compared to them, reinforcement learning is a newer technique being assessed for this purpose. However, the constant reward values in reinforcement learning algorithms limit their performance capabilities. This study enhances the Deep Deterministic Policy Gradient (DDPG) algorithm by integrating fuzzy logic, creating a neuro-symbolic approach that imparts advanced reasoning capabilities to the mobile agents. The outcomes observed in an environment resembling real-world scenarios highlighted remarkable performance improvements of the neuro-symbolic approach, with a success rate of 0.71 compared to 0.39, an average path length of 35 m compared to 25 m, and an average execution time of 120 s compared to 97 s. The results suggest that the employed approach enhances navigation performance in terms of obstacle avoidance success rate and path length, and hence could be reliable for navigation of mobile agents.
Muhammad Faqiihuddin Nasary, Azhar Mohd Ibrahim, Suaib Al Mahmud, Amir Akramin Shafie, Muhammad Imran Mardzuki
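One possible way to picture the fuzzy component is reward shaping: fuzzy memberships over goal and obstacle distances blended into the scalar reward a DDPG agent receives. The membership functions and rule weights below are assumptions for illustration, not the rules used in the paper.

```python
# Illustrative sketch of one way a fuzzy rule base can shape the reward fed to
# a DDPG agent; membership functions and weights are assumptions, not the
# paper's rules.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_reward(goal_dist, obstacle_dist):
    # Fuzzify inputs (distances in metres).
    near_goal = tri(goal_dist, 0.0, 0.0, 2.0)
    far_goal = tri(goal_dist, 1.0, 5.0, 10.0)
    near_obst = tri(obstacle_dist, 0.0, 0.0, 1.0)

    # Rule evaluation with weighted-average defuzzification.
    rules = [
        (near_goal, +1.0),    # close to the goal  -> strong positive reward
        (far_goal, -0.2),     # far from the goal  -> mild penalty
        (near_obst, -1.0),    # close to obstacles -> strong penalty
    ]
    total_w = sum(w for w, _ in rules) or 1.0
    return sum(w * r for w, r in rules) / total_w

print(fuzzy_reward(goal_dist=3.0, obstacle_dist=0.2))  # negative: far from goal, near obstacle
```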
DAFDeTr: Deformable Attention Fusion Based 3D Detection Transformer
Abstract
Existing approaches fuse LiDAR points and image pixels by hard association relying on highly accurate calibration matrices. We propose the Deformable Attention Fusion based 3D Detection Transformer (DAFDeTr) to attentively and adaptively fuse image features into LiDAR features with soft association using a deformable attention mechanism. Specifically, our detection head consists of two decoders for sequential fusion: a LiDAR decoder and an image decoder powered by deformable cross-attention to link the multi-modal features to 3D object predictions, leveraging a sparse set of object queries. The refined object queries from the LiDAR decoder attentively fuse with the corresponding and required image features, establishing a soft association and thereby making our model robust to any camera malfunction. We conduct extensive experiments and analysis on the nuScenes and Waymo datasets. Our DAFDeTr-L achieves 63.4 mAP and outperforms well-established networks on the nuScenes dataset and obtains competitive performance on the Waymo dataset. Our fusion model DAFDeTr achieves 64.6 mAP on the nuScenes dataset. We also extend our model to the 3D tracking task, where it outperforms state-of-the-art methods.
Gopi Krishna Erabati, Helder Araujo
MDC-Net: Multimodal Detection and Captioning Network for Steel Surface Defects
Abstract
In the highly competitive steel sector, product quality, particularly in terms of surface integrity, is critical. Surface defect detection (SDD) is essential in maintaining high production standards, as it directly impacts product quality and manufacturing efficiency. Traditional SDD approaches, which rely primarily on manual inspection or classical computer vision techniques, are plagued with difficulties, including reduced accuracy and potential health concerns for inspectors. This research describes an innovative solution that uses a transformer-based sequence generation model to improve defect detection during the manufacturing of hot-rolled steel sheets, while generating captions about the defect and its spatial location. This method, which views object detection as a sequence generation problem, allows for a more sophisticated understanding of image content and a complete and contextually rich investigation of surface defects whilst providing captions. While this method can potentially improve detection accuracy, its real power lies in its scalability and flexibility across various industrial applications. Furthermore, this technique has the potential to be further enhanced for visual question-answering applications, opening up opportunities for interactive and intelligent image analysis.
Anthony Ashwin Peter Chazhoor, Shanfeng Hu, Bin Gao, Wai Lok Woo
Operational Modeling of Temporal Intervals for Intelligent Systems
Abstract
Time is a crucial notion for intelligent systems, such as robotic systems, cognitive systems, multi-agent systems, cyber-physical systems, or autonomous systems, since it is inherent to any real-world process and/or environment. Hence, in this paper, we present operational temporal logic notations for modeling the time aspect of intelligent systems in terms of temporal interval concepts. Their application to intelligent systems' application scenarios has demonstrated the usefulness and effectiveness of our developed approach.
J. I. Olszewska
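A sketch of temporal-interval reasoning in the style of Allen's interval algebra is shown below; the paper defines its own operational notations, which may differ from these relations.

```python
# Sketch of temporal-interval reasoning in the style of Allen's interval
# algebra; the paper's operational notations may differ from these relations.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float
    end: float

def relation(i, j):
    """Return one of a few qualitative relations between intervals i and j."""
    if i.end < j.start:
        return 'before'
    if i.end == j.start:
        return 'meets'
    if i.start == j.start and i.end == j.end:
        return 'equals'
    if i.start >= j.start and i.end <= j.end:
        return 'during'
    if i.start < j.start < i.end < j.end:
        return 'overlaps'
    return 'other'

grasp = Interval(2.0, 5.0)        # e.g. a robot grasping action
perceive = Interval(0.0, 3.0)     # a perception process running concurrently
print(relation(perceive, grasp))  # 'overlaps'
```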
A Meta-MDP Approach for Information Gathering Heterogeneous Multi-agent Systems
Abstract
In this paper, we address the problem of heterogeneous multi-robot cooperation for information gathering and situation evaluation in a stochastic and partially observable environment. The goal is to optimally gather information about targets in the environment with several robots having different capabilities. The classical Dec-POMDP framework is a good tool to compute an optimal joint policy for such problems; however, its scalability is weak. To overcome this limitation, we developed a Meta-MDP model whose actions are individual information-gathering policies based on POMDPs. We compute an optimal exploration policy for each pair of robot and target, and the Meta-MDP model acts as a long-term optimal task allocation algorithm. We evaluate our model in a simulation environment, compare it to an optimal MPOMDP approach, and show promising results on solution quality and scalability.
Alvin Gandois, Abdel-Illah Mouaddib, Simon Le Gloannec, Ayman Alfalou
Interacting with a Visuotactile Countertop
Abstract
We present the See-Through-your-Skin Display (STS-d), a device that integrates visual and tactile sensing with a surface display to provide an interactive user experience. The STS-d expands the application of visuo-tactile optical sensors to Human-Robot Interaction (HRI) tasks and Human-Computer Interaction (HCI) tasks more generally. A key finding of this paper is that it is possible to display graphics on the reflective membrane of semi-transparent optical tactile sensors without interfering with their sensing capabilities, thus permitting simultaneous sensing and visual display. A proof of concept demonstration of the technology is presented where the STS Visual Display (STS-d) is used to provide an animated countertop that responds to visual and tactile events. We show that the integrated sensor can monitor interactions with the countertop, such as predicting the timing and location of contact with an object, or the amount of liquid in a container being placed on it, while displaying visual cues to the user.
Michael Jenkin, Francois R. Hogan, Kaleem Siddiqi, Jean-François Tremblay, Bobak Baghi, Gregory Dudek
A Color Event-Based Camera Emulator for Robot Vision
Abstract
Event-based cameras are becoming increasingly popular due to their asynchronous spatio-temporal information, high temporal resolution, power efficiency, and high dynamic range. Despite these benefits, the adoption of these sensors has been hindered, mainly due to their high cost. While prices are decreasing and commercial options exist, researchers and developers face barriers to exploring the potential of event-based vision, especially with more specialized models. Although accurate event-based simulators and emulators exist, their primary limitation lies in their inability to operate in real time and in being designed only for grey-scale video streams. This limitation creates a gap between theoretical exploration and practical application, hindering the seamless integration of event-based systems into real-world applications, especially in robotics. Moreover, the importance of color information is well recognized for many tasks, yet most existing event-based cameras do not handle color information, with only a few exceptions. To address this challenge, we propose a ROS-based color event camera emulator to help close this gap and support the real-world applicability of event-based color cameras, and we present its software design and implementation. Finally, we present a preliminary evaluation to demonstrate its performance.
Ignacio Bugueno-Cordova, Miguel Campusano, Robert Guaman-Rivera, Rodrigo Verschae
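The core emulation step can be sketched as follows (the ROS node and the emulator's exact event model are not reproduced): per-channel events are generated wherever the log-intensity change between consecutive RGB frames crosses a contrast threshold.

```python
# Sketch of the core emulation step only: per-channel events are generated
# where the log-intensity change between consecutive RGB frames crosses a
# threshold. Threshold and event format are illustrative assumptions.
import numpy as np

def frame_pair_to_events(prev_rgb, curr_rgb, timestamp, threshold=0.2):
    """Return events as (x, y, channel, polarity, t) for one frame pair."""
    prev_log = np.log(prev_rgb.astype(np.float32) + 1.0)
    curr_log = np.log(curr_rgb.astype(np.float32) + 1.0)
    diff = curr_log - prev_log                        # (H, W, 3)

    events = []
    for polarity, mask in ((+1, diff >= threshold), (-1, diff <= -threshold)):
        ys, xs, cs = np.nonzero(mask)
        events.extend(zip(xs, ys, cs, [polarity] * len(xs), [timestamp] * len(xs)))
    return events

prev = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
curr = prev.copy()
curr[40:60, 50:70, 0] = 255                           # brighten red channel in a patch
print(len(frame_pair_to_events(prev, curr, timestamp=0.033)), 'events')
```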
Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking
Abstract
The world around us is full of soft objects that we perceive and deform with dexterous hand movements. For a robotic hand to control soft objects, it has to acquire online state feedback of the deforming object. While RGB-D cameras can collect occluded point clouds at a rate of 30 Hz, this does not represent a continuously trackable object surface. Hence, in this work, we developed a method that takes as input a template mesh, which is the mesh of an object in its non-deformed state, and a deformed point cloud of the same object, and then shapes the template mesh such that it matches the deformed point cloud. The reconstruction of meshes from point clouds has long been studied in computer graphics under 3D and 4D reconstruction; however, both lack the speed and generalizability needed for robotics applications. Our model is designed using a point cloud auto-encoder and a Real-NVP architecture. Our trained model can perform mesh reconstruction and tracking at a rate of 58 Hz on a template mesh of 3000 vertices and a deformed point cloud of 5000 points, and it generalizes to the deformations of six different object categories which are assumed to be made of soft material in our experiments (scissors, hammer, foam brick, cleanser bottle, orange, and dice). The object meshes are taken from the YCB benchmark dataset. An instance of a downstream application is a control algorithm for a robotic hand that requires online feedback on the state of the manipulated object, which would allow online grasp adaptation in a closed-loop manner. Furthermore, the tracking capacity of our method can help in the system identification of deforming objects in a marker-free approach. In future work, we will extend our trained model to generalize beyond six object categories and to real-world deforming point clouds.
Elham Amin Mansour, Hehui Zheng, Robert K. Katzschmann
Estimation of Optimal Gripper Configuration Through an Embedded Array of Proximity Sensors
Abstract
The task of picking up and handling objects is a great robotic challenge. Estimating the best point where the gripper fingers should come into contact with the object before performing the pick-up task is essential to avoid failures. This study presents a new approach to estimating the grasping pose of objects using a database generated by a gripper through its proximity sensors. The grasping pose estimation simulates the points where the fingers should be positioned to obtain the best grasp of the object. In this study, we used a database generated by a reconfigurable gripper with three fingers that can scan different objects through distance sensors attached to the fingers and palm of the gripper. The grasping pose of 13 objects was estimated, which were classified according to their geometries. The analysis of the grasping pose estimates considered the versatility of the gripper used. These object grasping pose estimates were validated using the CoppeliaSim software, where it was possible to configure the gripper according to the estimates generated and pick up the objects using just two or three fingers of the reconfigurable gripper.
Jonathas Henrique Mariano Pereira, Carlos Fernando Joventino, João Alberto Fabro, André Schneider de Oliveira
The Twinning Technique of the SyncLMKD Method
Abstract
This article introduces a novel technique for establishing a Digital Twin counterpart twinning methodology, aiming to attain elevated fidelity levels for mobile robots. The proposed technique, denominated Synchronization Logarithmic Mean Kinematic Difference (SyncLMKD), is elucidated in detail within this study. Addressing the diverse fidelity requirements intrinsic to Industry 4.0's dynamic landscape necessitates a sophisticated numerical method. The SyncLMKD technique, being numerical, facilitates the dynamic and decoupled adjustment of compensations related to trajectory planning. Consequently, this numerical methodology empowers the definition of various degrees of freedom when configuring environmental layouts. Moreover, this technique incorporates considerations such as the predictability of distances between counterparts and path planning. The article also comprehensively explores tuning control, insights, metrics, and control strategies associated with the SyncLMKD approach. Experimental validations of the proposed methodology were conducted on a virtual platform designed to support the SyncLMKD technique, affirming its efficacy in achieving the desired level of high fidelity for mobile robots across diverse operational scenarios.
Fabiano Stingelin Cardoso, Ronnier Frates Rohrich, André Schneider de Oliveira
Intuitive Multi-modal Human-Robot Interaction via Posture and Voice
Abstract
Collaborative robots promise to greatly improve the quality of life for the aging population and to ease elder care. However, existing systems often rely on hand gestures, which can be restrictive and less accessible for users with cognitive disabilities. This paper introduces a multi-modal command input, which combines voice and deictic postures, to create natural human-robot interaction. In addition, we combine our system with a chatbot to make the interaction responsive. The demonstrated deictic postures, voice, and the perceived table-top scene are processed in real time to extract the human's intention. The system is evaluated on increasingly complex tasks using a real Universal Robots UR3e 6-DoF robot arm. The preliminary results demonstrate a high success rate in task completion and a notable improvement compared to gesture-based systems. Controlling robots through multi-modal commands, as opposed to gesture control, can save up to 48.1% of the time taken to issue commands to the robot. Our system adeptly integrates the advantages of voice commands and deictic postures to facilitate intuitive human-robot interaction. Compared to conventional gesture control methods, our approach requires minimal training, eliminates the need to memorize complex gestures, and results in shorter interaction times.
Yuzhi Lai, Mario Radke, Youssef Nassar, Atmaraaj Gopal, Thomas Weber, ZhaoHua Liu, Yihong Zhang, Matthias Rätsch
Virtual Model of a Robotic Arm Digital Twin with MuJoCo
Abstract
In this paper, a digital twin architecture for a Robotis manipulator arm is constructed on the MuJoCo physics engine SDK. The virtual model in the MuJoCo OpenGL virtual environment runs synchronously with the real robot via a TTL-USB physical communication link and a C++ script running in Linux. The robot servomotor Dynamixel SDK and the MuJoCo SDK are segregated into threads for parallel execution in the C++ script. From the data flow perspective, we propose three real-time scenarios: Digital Shadow, Digital Driven, and the Digital Twin itself. A preliminary test is performed to confirm that the system functions as expected. This test compares the motor's real and virtual torque in a static home position and a digital twin scenario. As this study is to be used as an exemplar for future research on Digital Twin frameworks, we propose future works to continue this research.
Bernardo Perez Inturias, João Pedro Garbelini Marques de Oliveira, Mauricio Becerra Vargas
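A hedged Python sketch of the virtual side is shown below using the MuJoCo Python bindings (the paper itself uses the C++ SDKs); the MJCF model path and the real-torque reader are placeholders.

```python
# Hedged sketch using MuJoCo's Python bindings (the paper uses the C++ SDKs):
# step the virtual arm in its home pose and compare virtual actuator torque
# with a torque value read from the real servos. The model path and the
# read_dynamixel_torques() helper are placeholders.
import mujoco
import numpy as np

model = mujoco.MjModel.from_xml_path('robotis_arm.xml')   # hypothetical MJCF file
data = mujoco.MjData(model)

def virtual_torques(hold_steps=500):
    """Settle the model at its current pose and return actuator forces."""
    for _ in range(hold_steps):
        mujoco.mj_step(model, data)
    return np.copy(data.qfrc_actuator)                    # generalized actuator forces

def read_dynamixel_torques():
    """Placeholder for the real-robot reading over the TTL-USB link."""
    raise NotImplementedError

sim_tau = virtual_torques()
print('virtual torques at home position:', sim_tau)
# real_tau = read_dynamixel_torques()
# print('torque mismatch:', np.abs(sim_tau - real_tau))
```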
Backmatter
Metadata
Title
Robotics, Computer Vision and Intelligent Systems
edited by
Joaquim Filipe
Juha Röning
Copyright Year
2024
Electronic ISBN
978-3-031-59057-3
Print ISBN
978-3-031-59056-6
DOI
https://doi.org/10.1007/978-3-031-59057-3
