GAN-based intrinsic exploration for sample efficient reinforcement learning

dc.contributor.advisor Ünal, Gözde
dc.contributor.author Kamar, Doğay
dc.contributor.authorID 504181511
dc.contributor.department Computer Engineering
dc.date.accessioned 2024-11-04T12:16:45Z
dc.date.available 2024-11-04T12:16:45Z
dc.date.issued 2022-05-23
dc.description Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2022
dc.description.abstract Reinforcement learning is a sub-area of artificial intelligence in which the learner learns in a trial-and-error manner. The learner does so by executing an action depending on the current state it is in and observing the result. After executing an action, a reward signal is given to the learner, and through these rewards the learner can learn which actions are best in different situations. However, the learner is not given any prior information about the environment it is in or about which action is best in the current state. Therefore, exploring the environment is important for gathering the information necessary to navigate toward high rewards. The most common exploration strategies involve occasional random action selection. However, they only work under certain conditions, such as the rewards being dense and well-defined. These conditions are hard to meet in many real-world problems, and an efficient exploration strategy is needed for such problems. Utilizing Generative Adversarial Networks (GAN), this thesis proposes a novel module for sample-efficient exploration, called the GAN-based Intrinsic Reward Module (GIRM). The GIRM computes an intrinsic reward for states, with the aim of assigning higher rewards to novel, unexplored states. The GIRM uses a GAN to learn the distribution of the states the learner observes and contains an encoder, which maps a query state to the input space of the generator of the GAN. Using the encoder and the generator, the GIRM can detect whether a query state lies within the distribution of the observed states. If it does, the state is regarded as a visited state; otherwise, it is novel to the learner, in which case the intrinsic reward is higher. As the learner receives higher rewards for such states, it is incentivized to explore the unknown, leading to sample-efficient exploration. The GIRM is evaluated in two settings: a sparse-reward environment and a no-reward environment. In both settings, the GIRM is shown to explore more effectively than the base algorithms, which rely on random exploration. Compared to other studies in the field, the GIRM also explores more efficiently in terms of the number of samples required. Finally, we identify a few weaknesses of the GIRM: performance degrades when the distribution of observed states changes abruptly, and the exploitation of very large rewards is not avoided.
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/25543
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.subject Computer vision
dc.subject Bilgisayarla görme
dc.subject Deep learning
dc.subject Derin öğrenme
dc.subject Machine learning
dc.subject Makine öğrenmesi
dc.subject Artificial intelligence
dc.subject Yapay zeka
dc.subject Learning techniques
dc.subject Öğrenme teknikleri
dc.title GAN-based intrinsic exploration for sample efficient reinforcement learning
dc.title.alternative Örnek verimli pekiştirmeli öğrenme için üretken çekişmeli ağlarla içsel keşif
dc.type Master Thesis
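
The abstract describes the GIRM as an encoder that maps a query state into the latent (input) space of a GAN generator trained on visited states, with a larger intrinsic reward for states that fall outside the learned distribution. The sketch below is a minimal illustration of that idea, not the thesis implementation: the network sizes, the use of mean-squared reconstruction error as the novelty score, and all names (Generator, Encoder, intrinsic_reward) are assumptions made for illustration only.

```python
# Illustrative sketch (not the thesis implementation): an intrinsic reward
# computed from a GAN generator and an encoder that maps states to the
# generator's latent space. Dimensions, architectures, and the reconstruction
# error as the novelty measure are assumptions.
import torch
import torch.nn as nn

STATE_DIM = 64   # assumed flattened state size
LATENT_DIM = 16  # assumed size of the generator's input (latent) space

class Generator(nn.Module):
    """Maps a latent code z to a state-like sample; trained on observed states."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, STATE_DIM),
        )

    def forward(self, z):
        return self.net(z)

class Encoder(nn.Module):
    """Maps a query state back to the generator's latent (input) space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )

    def forward(self, s):
        return self.net(s)

@torch.no_grad()
def intrinsic_reward(state, encoder, generator, scale=1.0):
    """Return a larger bonus for states the GAN reconstructs poorly,
    i.e. states outside the learned distribution of visited states."""
    z = encoder(state)
    reconstruction = generator(z)
    error = torch.mean((reconstruction - state) ** 2, dim=-1)
    return scale * error  # added to the extrinsic reward during training

# Usage: reward bonus for a batch of (stand-in) environment states
encoder, generator = Encoder(), Generator()
states = torch.randn(4, STATE_DIM)
print(intrinsic_reward(states, encoder, generator).shape)  # torch.Size([4])
```

In a setup like this, states drawn from the learned distribution reconstruct well and yield a small bonus, while unexplored states reconstruct poorly and yield a large bonus, which is what incentivizes the agent to visit novel states.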
Files
Original bundle
Name: 504181511.pdf
Size: 806.36 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed to upon submission