Improving sample efficiency in reinforcement learning control using autoencoders

Date
2023-06-19
Authors
Er, Burak
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
Through the use of autoencoders, this study proposes a novel method for enhancing the sample efficiency of reinforcement learning (RL) control of dynamic systems. The primary goal of this study is to determine how well autoencoders can facilitate learning and improve the resulting policies in RL control settings. The literature review provides an overview of existing approaches to improving sample efficiency in RL. Model-based RL and Bayesian RL leverage prior knowledge and uncertainty estimates to make better decisions with fewer samples, while techniques such as prioritized experience replay and hindsight experience replay focus on improving the learning process from past experiences. Despite these advances, achieving high sample efficiency in complex and dynamic environments remains challenging.

Autoencoders, with their ability to learn efficient representations, have recently gained interest as a means of enhancing the sample efficiency of RL. However, their integration into RL methods for dynamic system control remains underexplored. Moreover, most existing applications use only the latent space during learning, which can cause loss of information, make the latent space hard to interpret, hinder adaptation to dynamic environments, and leave the representation outdated. In this study, the proposed approach overcomes these problems by using both the original states and their latent representations during learning.

The methodology consists of two main steps. First, a denoising-contractive autoencoder is developed and implemented for RL control problems, with a specific focus on its applicability to state representation and feature extraction. The autoencoder is pretrained on states sampled uniformly at random from the environment, and the states are then augmented with the latent states generated by the encoder, providing additional information to the RL agent. The second step involves training a deep reinforcement learning algorithm on the augmented states produced by the autoencoder. The algorithm is compared against a baseline DQN algorithm in the LunarLander environment, where observations are subject to zero-mean Gaussian noise with a standard deviation of 0.01. Different encoder architectures are explored and evaluated in terms of learning performance.

The results show that the proposed algorithm consistently outperforms the baseline method in terms of average reward and the speed with which high rewards are reached. The experiments conducted on OpenAI Gym's LunarLander environment provide valuable insights into the advantages of using autoencoders for RL control problems, highlighting their ability to improve the sample efficiency of RL algorithms through enhanced state representations and feature extraction. These findings contribute to the field of reinforcement learning and control by demonstrating the potential of autoencoders to address the challenges of sample efficiency in dynamic systems, and they encourage further exploration of different encoder architectures and their impact on RL performance. Overall, this study provides a comprehensive investigation into the effectiveness of autoencoders for improving sample efficiency in RL control problems, and the proposed approach offers a promising avenue for future algorithms that leverage autoencoders to enhance learning in dynamic systems.
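The first methodological step can be illustrated with a short sketch. The following is a minimal sketch, assuming a single-layer sigmoid encoder in PyTorch, of a denoising-contractive autoencoder pretrained on uniformly random states; the dimensions, noise level, penalty weight, and sampling range are illustrative assumptions, not values taken from the thesis.

```python
# Minimal sketch (not the thesis code): a denoising-contractive autoencoder in PyTorch.
# STATE_DIM, LATENT_DIM, NOISE_STD, CONTRACTIVE_WEIGHT and the single-layer architecture
# are illustrative assumptions; the abstract does not specify them.
import torch
import torch.nn as nn

STATE_DIM = 8              # LunarLander observation size (assumed)
LATENT_DIM = 4             # size of the latent code (assumed)
NOISE_STD = 0.01           # corruption noise for the denoising objective (assumed)
CONTRACTIVE_WEIGHT = 1e-4  # weight of the contractive (Jacobian) penalty (assumed)

class DenoisingContractiveAE(nn.Module):
    def __init__(self, state_dim=STATE_DIM, latent_dim=LATENT_DIM):
        super().__init__()
        # Single sigmoid encoder layer: admits a closed-form Jacobian penalty.
        self.enc = nn.Linear(state_dim, latent_dim)
        self.dec = nn.Linear(latent_dim, state_dim)

    def encode(self, x):
        return torch.sigmoid(self.enc(x))

    def forward(self, x):
        h = self.encode(x)
        return self.dec(h), h

def dcae_loss(model, clean_states):
    """Denoising reconstruction loss plus contractive penalty."""
    noisy = clean_states + NOISE_STD * torch.randn_like(clean_states)
    recon, h = model(noisy)
    recon_loss = nn.functional.mse_loss(recon, clean_states)
    # ||J_f(x)||_F^2 for a sigmoid layer: sum_j (h_j(1-h_j))^2 * sum_i W_ji^2
    w_sq = (model.enc.weight ** 2).sum(dim=1)
    contractive = (((h * (1.0 - h)) ** 2) * w_sq).sum(dim=1).mean()
    return recon_loss + CONTRACTIVE_WEIGHT * contractive

# Pretraining on uniformly random states, as described in the abstract
# (the sampling range here is a placeholder).
model = DenoisingContractiveAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1000):
    batch = torch.rand(64, STATE_DIM) * 2.0 - 1.0
    loss = dcae_loss(model, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```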
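A second sketch, assuming the Gymnasium API and reusing the encoder `model` from the sketch above, shows how noisy LunarLander observations could be augmented with the encoder's latent code before being passed to a DQN agent. The noise level follows the abstract; the wrapper itself is an assumed setup, not the thesis implementation, and any DQN implementation can be trained on the wrapped environment.

```python
# Minimal sketch (not the thesis code): augment noisy observations with the latent code,
# so the agent sees [noisy_state, latent_code]. Assumes `model` from the previous sketch.
import numpy as np
import gymnasium as gym
import torch

class LatentAugmentedObs(gym.ObservationWrapper):
    """Adds zero-mean Gaussian noise (std 0.01) to each observation and appends the
    pretrained encoder's latent representation of that noisy observation."""
    def __init__(self, env, encoder, noise_std=0.01):
        super().__init__(env)
        self.encoder = encoder
        self.noise_std = noise_std
        obs_dim = env.observation_space.shape[0]
        latent_dim = encoder.enc.out_features
        low = np.full(obs_dim + latent_dim, -np.inf, dtype=np.float32)
        high = np.full(obs_dim + latent_dim, np.inf, dtype=np.float32)
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float32)

    def observation(self, obs):
        noisy = obs + np.random.normal(0.0, self.noise_std, size=obs.shape).astype(np.float32)
        with torch.no_grad():
            latent = self.encoder.encode(torch.as_tensor(noisy).float().unsqueeze(0))
        return np.concatenate([noisy, latent.squeeze(0).numpy()]).astype(np.float32)

env = LatentAugmentedObs(gym.make("LunarLander-v2"), model)
obs, info = env.reset()
# ... train any DQN agent on `env`; the baseline uses the same noisy env without augmentation.
```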
Description
Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2022
Keywords
Reinforcement learning, Autoencoders, Learning algorithms
Citation