GAN-based intrinsic exploration for sample efficient reinforcement learning

Date
2022-05-23
Authors
Kamar, Doğay
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
Reinforcement learning is a sub-area of artificial intelligence in which a learner learns by trial and error: it executes an action depending on the current state and observes the result. After each action, the learner receives a reward signal, and through these rewards it can learn which actions are best in different situations. However, the learner is given no prior information about the environment or about which action is best in a given state. Exploring the environment is therefore essential for gathering the information needed to reach high rewards. The most common exploration strategies rely on occasional random action selection, but these only work under certain conditions, namely that rewards are dense and well defined. Such conditions are hard to meet in many real-world problems, which call for a more efficient exploration strategy. Utilizing Generative Adversarial Networks (GANs), this thesis proposes a novel module for sample-efficient exploration, called the GAN-based Intrinsic Reward Module (GIRM). The GIRM computes an intrinsic reward for states, with the aim of assigning higher rewards to novel, unexplored states. It uses a GAN to learn the distribution of the states the learner observes and contains an encoder that maps a query state to the input space of the GAN's generator. Using the encoder and the generator, the GIRM can detect whether a query state lies within the distribution of observed states. If it does, the state is regarded as visited; otherwise, it is novel to the learner, and the intrinsic reward is higher. Because the learner receives higher rewards for such states, it is incentivized to explore the unknown, leading to sample-efficient exploration. The GIRM is evaluated in two settings: a sparse-reward environment and a no-reward environment. In both settings, the GIRM is shown to explore more effectively than the base algorithms, which rely on random exploration. Compared to other studies in the field, the GIRM also explores more efficiently in terms of the number of samples. Finally, we identify two weaknesses of the GIRM: performance degrades when the distribution of observed states changes suddenly, and the module does not avoid exploiting very large rewards.
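
The novelty-detection mechanism described in the abstract can be illustrated with a minimal sketch. The following PyTorch code assumes simple fully connected networks; all class and function names (Encoder, Generator, girm_intrinsic_reward) are hypothetical, and the thesis's actual architecture, GAN training procedure, and reward scaling are not reproduced here. The key idea is that a state lying outside the learned distribution of observed states reconstructs poorly through the encoder-generator pair, yielding a higher intrinsic reward.

```python
# Minimal sketch of the GIRM idea, assuming fully connected networks.
# Hypothetical names; not the thesis's actual implementation.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a query state to the generator's latent (input) space."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, state):
        return self.net(state)

class Generator(nn.Module):
    """Maps a latent code back to state space; trained as the
    generator of a GAN on the states the learner has observed."""
    def __init__(self, latent_dim, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, z):
        return self.net(z)

def girm_intrinsic_reward(state, encoder, generator):
    """Reconstruction error as a novelty signal: states far from the
    learned distribution of observed states reconstruct poorly and
    therefore receive a higher intrinsic reward."""
    with torch.no_grad():
        z = encoder(state)
        reconstruction = generator(z)
        # Per-state mean squared reconstruction error.
        return ((state - reconstruction) ** 2).mean(dim=-1)

# Usage: combine with the environment's extrinsic reward.
state = torch.randn(1, 8)             # example 8-dimensional state
enc, gen = Encoder(8, 4), Generator(4, 8)
r_int = girm_intrinsic_reward(state, enc, gen)
# total_reward = r_ext + beta * r_int, for some scaling coefficient beta
```

In this sketch the intrinsic reward is added to the extrinsic reward with a scaling coefficient (beta above), a common pattern in intrinsic-motivation methods; the specific combination rule used in the thesis is not shown here.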
Description
Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2022
Keywords
Computer vision, Deep learning, Machine learning, Artificial intelligence, Learning techniques
Citation