GAN-based intrinsic exploration for sample efficient reinforcement learning

dc.contributor.advisor Ünal, Gözde
dc.contributor.author Kamar, Doğay
dc.contributor.authorID 504181511
dc.contributor.department Computer Engineering
dc.date.accessioned 2024-11-04T12:16:45Z
dc.date.available 2024-11-04T12:16:45Z
dc.date.issued 2022-05-23
dc.description Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2022
dc.description.abstract Reinforcement learning is a sub-area of artificial intelligence in which the learner learns in a trial-and-error manner. The learner does so by executing an action depending on the current state it is in and observing the result. After executing an action, a reward signal is given to the learner, and through these rewards the learner can learn which actions are best in different situations. However, the learner is not given any prior information about the environment it is in or about which action is best in the current state. Therefore, exploring the environment is important for gathering the information necessary to navigate toward high rewards. The most common exploration strategies involve occasional random action selection. However, they only work under certain conditions, such as the rewards being dense and well-defined. These conditions are hard to meet in many real-world problems, and an efficient exploration strategy is needed for such problems. Utilizing Generative Adversarial Networks (GAN), this thesis proposes a novel module for sample-efficient exploration, called the GAN-based Intrinsic Reward Module (GIRM). The GIRM computes an intrinsic reward for states, with the aim of assigning higher rewards to novel, unexplored states. The GIRM uses a GAN to learn the distribution of the states the learner observes and contains an encoder, which maps a query state to the input space of the generator of the GAN. Using the encoder and the generator, the GIRM can detect whether a query state lies within the distribution of the observed states. If it does, the state is regarded as a visited state; otherwise, it is novel to the learner, in which case the intrinsic reward is higher. As the learner receives higher rewards for such states, it is incentivized to explore the unknown, leading to sample-efficient exploration. The GIRM is evaluated in two settings: a sparse-reward environment and a no-reward environment. In both settings, the GIRM is shown to explore more effectively than the base algorithms, which rely on random exploration. Compared to other studies in the field, the GIRM also explores more efficiently in terms of the number of samples required. Finally, we identify a few weaknesses of the GIRM: performance degrades when the distribution of observed states changes abruptly, and the exploitation of very large rewards is not avoided.
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/25543
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.subject Computer vision
dc.subject Bilgisayarla görme
dc.subject Deep learning
dc.subject Derin öğrenme
dc.subject Machine learning
dc.subject Makine öğrenmesi
dc.subject Artificial intelligence
dc.subject Yapay zeka
dc.subject Learning techniques
dc.subject Öğrenme teknikleri
dc.title GAN-based intrinsic exploration for sample efficient reinforcement learning
dc.title.alternative Örnek verimli pekiştirmeli öğrenme için üretken çekişmeli ağlarla içsel keşif
dc.type Master Thesis
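
The abstract describes the GIRM as an encoder that maps a query state into the latent (input) space of a GAN generator trained on visited states, with a larger intrinsic reward for states that fall outside the learned distribution. The sketch below is a minimal illustration of that idea, not the thesis implementation: the network sizes, the use of mean-squared reconstruction error as the novelty score, and all names (Generator, Encoder, intrinsic_reward) are assumptions made for illustration only.

```python
# Illustrative sketch (not the thesis implementation): an intrinsic reward
# computed from a GAN generator and an encoder that maps states to the
# generator's latent space. Dimensions, architectures, and the reconstruction
# error as the novelty measure are assumptions.
import torch
import torch.nn as nn

STATE_DIM = 64   # assumed flattened state size
LATENT_DIM = 16  # assumed size of the generator's input (latent) space

class Generator(nn.Module):
    """Maps a latent code z to a state-like sample; trained on observed states."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, STATE_DIM),
        )

    def forward(self, z):
        return self.net(z)

class Encoder(nn.Module):
    """Maps a query state back to the generator's latent (input) space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )

    def forward(self, s):
        return self.net(s)

@torch.no_grad()
def intrinsic_reward(state, encoder, generator, scale=1.0):
    """Return a larger bonus for states the GAN reconstructs poorly,
    i.e. states outside the learned distribution of visited states."""
    z = encoder(state)
    reconstruction = generator(z)
    error = torch.mean((reconstruction - state) ** 2, dim=-1)
    return scale * error  # added to the extrinsic reward during training

# Usage: reward bonus for a batch of (stand-in) environment states
encoder, generator = Encoder(), Generator()
states = torch.randn(4, STATE_DIM)
print(intrinsic_reward(states, encoder, generator).shape)  # torch.Size([4])
```

In a setup like this, states drawn from the learned distribution reconstruct well and yield a small bonus, while unexplored states reconstruct poorly and yield a large bonus, which is what incentivizes the agent to visit novel states.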
Files
Original bundle
Name: 504181511.pdf
Size: 806.36 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed to upon submission