Failure prevention in robot manipulation using adversarial reinforcement learning
Files
Date
2023
Authors
Kutay, Mert Can
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
Robotic manipulation is an important research area with applications in fields such as manufacturing, agriculture, and healthcare. However, failures in these tasks can have costly and sometimes dangerous consequences. To address this challenge, there has been growing interest in developing failure prevention policies that ensure the safe and reliable operation of robotic systems. In this thesis, we propose a novel approach that leverages adversarial reinforcement learning to train policies that are robust against various failures in robotic manipulation. We start by defining the base skill and training a base agent that can accomplish this task; for complicated base skills, we propose to employ imitation learning to speed up the learning process. Then, we determine a set of possible failures. For each failure, we define a risk function that indicates how close the environment is to the catastrophic event associated with that failure type. These risk functions are manually engineered and normalized. Similar to the base agent, we train an adversary for each failure type, with the reward function defined as the risk function for that failure. We call this process the isolated training phase. After isolated training, we place the protagonist (or multiple protagonists) and the adversaries in a self-play environment. In this environment, the agents take turns controlling the robot, each trying to maximize its own reward. The adversary's reward is set as the protagonist's penalty, making the training process a zero-sum game. As training progresses, the protagonist becomes increasingly proficient at preventing the failures caused by the adversaries. As the application domain, we chose the task of stirring a bowl with a spoon using a humanoid robot. We implemented and tested the proposed method in a simulation environment containing a bowl with 40 balls inside it.
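The alternating self-play scheme described above could be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the toy environment, the random placeholder policies, and all class and function names are hypothetical, and the risk function here is a trivial stand-in for the manually engineered ones.

```python
import random

class ToyEnv:
    """1-D stand-in for the stirring environment: the state is a scalar."""
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action):
        self.state += action
        return self.state

class RandomAgent:
    """Placeholder policy; a real protagonist/adversary would be an RL learner."""
    def __init__(self, rng):
        self.rng = rng
        self.rewards = []
    def act(self, state):
        return self.rng.uniform(-1.0, 1.0)
    def observe(self, reward):
        self.rewards.append(reward)

def normalized_risk(state, max_risk=5.0):
    """Manually engineered risk, scaled to [0, 1]; here, distance from origin."""
    return min(abs(state) / max_risk, 1.0)

def self_play_episode(env, protagonist, adversary, turn_len=10, horizon=200):
    """Agents take turns controlling the robot; rewards form a zero-sum game."""
    state = env.reset()
    for t in range(horizon):
        # Protagonist controls the robot on even turns, adversary on odd turns.
        agent = protagonist if (t // turn_len) % 2 == 0 else adversary
        state = env.step(agent.act(state))
        risk = normalized_risk(state)
        adversary.observe(reward=risk)       # adversary maximizes risk
        protagonist.observe(reward=-risk)    # adversary's reward is the protagonist's penalty
    return protagonist.rewards, adversary.rewards

rng = random.Random(0)
p_rews, a_rews = self_play_episode(ToyEnv(), RandomAgent(rng), RandomAgent(rng))
```

The key structural point the sketch captures is the zero-sum coupling: at every step the protagonist's reward is exactly the negative of the adversary's, so improving one agent directly pressures the other.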
The agents move the spoon in 3D space to interact with the environment and collect low-dimensional observations and rewards. To train a protagonist, we can either train a single compound protagonist that prevents all failure types, or train a distinct protagonist for each failure type. In this thesis, we take both approaches and compare the results. We evaluate the protagonist's failure prevention performance by measuring the failure prevention success rate within a time window and the total number of steps required to reduce the risk below a given threshold. The set of distinct protagonists slightly outperforms the compound protagonist. Overall, the evaluation results show that adversarial learning is an efficient and effective way to learn failure prevention policies.
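The two evaluation metrics mentioned above could be computed along these lines. This is an illustrative sketch only: the function names, the example risk traces, and the threshold and window values are hypothetical, not taken from the thesis.

```python
def steps_to_safe(risk_trace, threshold):
    """Index of the first step where risk drops below the threshold (None if never)."""
    for t, r in enumerate(risk_trace):
        if r < threshold:
            return t
    return None

def prevention_success_rate(risk_traces, threshold, window):
    """Fraction of episodes in which risk falls below the threshold within the window."""
    successes = sum(
        1 for trace in risk_traces
        if (s := steps_to_safe(trace, threshold)) is not None and s <= window
    )
    return successes / len(risk_traces)

# Example: per-step risk traces from three hypothetical evaluation episodes.
traces = [
    [0.9, 0.6, 0.3, 0.1],   # risk first below 0.4 at step 2
    [0.8, 0.7, 0.6, 0.5],   # never reaches the safe region
    [0.9, 0.2, 0.1, 0.05],  # risk first below 0.4 at step 1
]
rate = prevention_success_rate(traces, threshold=0.4, window=3)  # 2 of 3 episodes succeed
```

A low `steps_to_safe` indicates the protagonist recovers quickly from a risky state, while the success rate summarizes how often recovery happens at all within the allotted window.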
Description
Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023
Keywords
Artificial intelligence
Deep learning