Failure prevention in robot manipulation using adversarial reinforcement learning

dc.contributor.advisor: Sarıel, Sanem
dc.contributor.author: Kutay, Mert Can
dc.contributor.authorID: 815451
dc.contributor.department: Computer Engineering Programme
dc.date.accessioned: 2025-01-02T12:07:25Z
dc.date.available: 2025-01-02T12:07:25Z
dc.date.issued: 2023
dc.description: Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023
dc.description.abstract: Robotic manipulation is an important area of research with applications in fields such as manufacturing, agriculture, and healthcare. However, failure scenarios in these tasks can have costly and sometimes dangerous consequences. To address this challenge, there has been growing interest in developing failure prevention policies that can ensure the safe and reliable operation of robotic systems. In this thesis, we propose a novel approach that leverages adversarial reinforcement learning to train policies that are robust against various failures in robotic manipulation. We start by defining a base skill and training a base agent that can accomplish this task. We propose to employ imitation learning to speed up the learning process for complicated base skills. Then, we determine a set of possible failures. For each failure, we define a risk function that indicates how close the environment is to a catastrophic event associated with that failure type. These risk functions are manually engineered and normalized. Similar to the base agent, we train an adversary for each failure type, with its reward function defined as the risk function for that failure. We call this process the isolated training phase. After isolated training, we place the protagonist (or multiple protagonists) and the adversaries in a self-play environment. In this environment, the agents take turns controlling the robot and try to maximize their respective rewards. The reward of the adversary is set as the penalty for the protagonist, so the training process becomes a zero-sum game. Through this training, the protagonist becomes more proficient at preventing the failures induced by the adversaries. As the application domain, we have chosen the task of stirring a bowl with a spoon using a humanoid robot. We have implemented and tested the proposed method in a simulation environment that contains a bowl with 40 balls inside it. The agents move the spoon in 3D space to interact with the environment and collect low-dimensional observations and rewards. To train a protagonist, we can either train a single compound protagonist that prevents all failure types on its own, or train a distinct protagonist for each failure type. In this thesis, we take both approaches and compare the results. We evaluate the failure prevention performance of the protagonist by measuring the failure prevention success rate within a time window and the total number of steps required to reduce the risk below a given threshold. The set of distinct protagonists slightly outperforms the compound protagonist. Overall, the evaluation results show that adversarial learning is an efficient and successful way to learn failure prevention policies. (An illustrative sketch of this self-play scheme follows the record below.)
dc.description.degree: M.Sc.
dc.identifier.uri: http://hdl.handle.net/11527/26073
dc.language.iso: en
dc.publisher: Graduate School
dc.sdg.type: Goal 9: Industry, Innovation and Infrastructure
dc.subject: Artificial intelligence
dc.subject: Deep learning
dc.title: Failure prevention in robot manipulation using adversarial reinforcement learning
dc.title.alternative: Çekişmeli pekiştirmeli öğrenme ile robot etkileşimlerinde hata önleme
dc.type: Master Thesis
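
To make the training scheme described in the abstract concrete, the following is a minimal, runnable Python sketch of the zero-sum self-play loop. It is a sketch under stated assumptions, not the thesis implementation: the toy environment dynamics, the spill-risk function, the turn length, and the names ToyStirringEnv, RandomAgent, and self_play_episode are all hypothetical. Only the overall structure follows the abstract: a manually engineered, normalized risk function for one failure type, an adversary rewarded by that risk, a protagonist penalized by the same amount, and the two agents taking turns controlling the robot.

# Minimal sketch of the zero-sum self-play scheme summarized above.
# Everything here (ToyStirringEnv, RandomAgent, spill_risk, turn_len)
# is an illustrative assumption, not the thesis implementation.
import numpy as np

RIM_HEIGHT = 1.0  # hypothetical bowl rim height (normalized units)

class ToyStirringEnv:
    """Toy stand-in for the stirring task: the state is the height of
    40 balls in a bowl; a scalar action stirs harder or softer."""

    def __init__(self, n_balls=40, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_balls = n_balls

    def reset(self):
        self.heights = self.rng.uniform(0.0, 0.3, self.n_balls)
        self.t = 0
        return self.heights.copy()

    def spill_risk(self):
        # Normalized risk in [0, 1] for one failure type ("spilling"):
        # how close the highest ball is to the bowl rim.
        return float(np.clip(self.heights.max() / RIM_HEIGHT, 0.0, 1.0))

    def step(self, action):
        # A positive action agitates the balls upward, a negative one
        # calms them; the dynamics are purely illustrative.
        noise = self.rng.normal(0.0, 0.01, self.n_balls)
        self.heights = np.clip(self.heights + 0.05 * action + noise, 0.0, 2.0)
        self.t += 1
        risk = self.spill_risk()
        done = risk >= 1.0 or self.t >= 200  # catastrophe or timeout
        return self.heights.copy(), risk, done

class RandomAgent:
    """Placeholder policy; a learning agent would update itself in
    observe() from the (observation, reward) stream."""

    def __init__(self, bias, seed=0):
        self.bias = bias  # adversary stirs harder (+), protagonist calms (-)
        self.rng = np.random.default_rng(seed)

    def act(self, obs):
        return self.bias + self.rng.normal(0.0, 0.1)

    def observe(self, obs, reward, done):
        pass  # a learning update would go here

def self_play_episode(env, protagonist, adversary, turn_len=10):
    """Agents take turns controlling the robot. The reward is zero-sum:
    the adversary maximizes the risk, the protagonist its negative."""
    obs = env.reset()
    agents = [adversary, protagonist]  # adversary tries to raise the risk
    done, turn = False, 0
    while not done:
        agent = agents[turn % 2]
        for _ in range(turn_len):
            obs, risk, done = env.step(agent.act(obs))
            adversary.observe(obs, +risk, done)    # adversary reward
            protagonist.observe(obs, -risk, done)  # protagonist penalty
            if done:
                break
        turn += 1
    return env.spill_risk()

if __name__ == "__main__":
    env = ToyStirringEnv()
    final = self_play_episode(env,
                              protagonist=RandomAgent(bias=-0.5, seed=1),
                              adversary=RandomAgent(bias=+0.5, seed=2))
    print(f"final spill risk: {final:.2f}")

Replacing RandomAgent with an actual reinforcement learning agent, and instantiating either one compound protagonist or one distinct protagonist per failure type, would recover the two training configurations compared in the thesis.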

Files

Original bundle

Name: 815451.pdf
Size: 8.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed upon to submission