Quadcopter trajectory tracking control using reinforcement learning

Date
2019-06-11
Authors
Erdem, Mustafa
Publisher
Institute of Science and Technology
Abstract
Unmanned aerial vehicles (UAVs) have gained enormous popularity over the last couple of decades, and quadcopters are their most popular subclass. Their ability to take off, land, and hover vertically makes them ideal platforms for military, agricultural, surveillance, and exploration missions; their mechanical simplicity and agile maneuverability add to that appeal. These qualities also make quadcopters excellent proving grounds for control theory applications. Even though designing a conventional controller for a quadcopter is a relatively easy task, tuning its parameters can quickly become a time-consuming challenge. Moreover, conventional design requires a model of the system, and uncertainties in that model or later modifications to the vehicle can quickly cause instability. Reinforcement learning is a subfield of artificial intelligence in which an agent learns to achieve a specific task by trial and error in an interactive environment. Although the idea was introduced long ago, it has regained popularity with recent advances in technology.

In this thesis, the performance of a conventional PD controller on a quadcopter modeled with the ETH Zurich RotorS framework in the Gazebo simulation environment was first improved by applying the metaheuristic particle swarm optimization (PSO) algorithm (a minimal sketch of such a tuning loop is given below). The quadcopter was then trained to follow different trajectories using deep deterministic policy gradient (DDPG), an actor-critic reinforcement learning algorithm. DDPG is an off-policy, model-free method that has proven itself across a variety of domains and tasks. It uses four neural-network function approximators: the actor, critic, target actor, and target critic networks. The critic network estimates the value of the agent's current state, and the actor network generates actions with respect to that state. During training these value estimates shift constantly, and adjusting network parameters against a constantly shifting target makes value estimation unmanageable. To avoid this, DDPG uses target networks that are not updated at every step but only periodically or slowly, which stabilizes training (see the soft-update sketch below). Weight decay and batch normalization, which are not part of the basic DDPG update, were also implemented to improve the algorithm's performance, and the Adam algorithm was used as the optimizer. During training, the agent received a reward at each step of every episode; the reward function is defined as the negative weighted sum of the quadcopter's position, velocity, and acceleration errors (written out below). Tracking was considered successful if the tracking error was below 10%.

The tracking performance of both controllers was analyzed for different trajectories. The PD controller outperforms the reinforcement learning agent in most cases; however, the performance differences between the two controllers are hard to notice, and generalization, that is, working on different quadcopter models under certain assumptions, is the real advantage of the reinforcement learning agent. The hyperparameters of the DDPG algorithm shape the agent's learning behavior, and it is quite possible for a reinforcement learning agent to perform as well as or better than conventional controllers. Therefore, as future work, given sufficient time, optimizing the learning algorithm's hyperparameters and modifying the network architectures are worth investigating in order to obtain better performance.
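
As a rough illustration of the PSO-based PD tuning described above, the following is a minimal sketch, not the thesis's actual implementation. It assumes a hypothetical `evaluate_gains` function that runs one simulated tracking episode with a candidate gain vector (e.g. `[Kp, Kd]`) and returns a scalar tracking cost; the swarm size, iteration count, and coefficients are common textbook defaults, not the thesis's settings.

```python
import numpy as np

def pso_tune_pd(evaluate_gains, bounds, n_particles=20, n_iters=50,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize evaluate_gains(gains) over the box `bounds` with basic PSO.

    evaluate_gains: callable mapping a gain vector to a scalar tracking
    cost from one simulated episode (assumed to be supplied by the user).
    bounds: array of shape (dim, 2) with per-gain lower/upper limits.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)

    # Random initial positions; personal/global bests track the lowest costs.
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([evaluate_gains(p) for p in pos])
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + cognitive pull to personal best + social pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        cost = np.array([evaluate_gains(p) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest, gbest_cost
```

In practice the coefficients `w`, `c1`, and `c2`, like the swarm size, would themselves need tuning for the specific cost landscape of the simulated quadcopter.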
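The slow target-network update the abstract describes is commonly implemented as a Polyak (soft) update. Below is a minimal PyTorch sketch of that mechanism; the layer sizes (12 state inputs, 4 rotor commands) and the value of `tau` are illustrative assumptions, not values taken from the thesis.

```python
import torch

def soft_update(target_net, online_net, tau=0.001):
    """Polyak-average online weights into the target network:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    with torch.no_grad():
        for t_param, o_param in zip(target_net.parameters(),
                                    online_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * o_param)

# Usage: after each gradient step on the online actor (or critic),
# nudge the corresponding target network toward it.
actor = torch.nn.Linear(12, 4)                    # stand-in for the actor network
target_actor = torch.nn.Linear(12, 4)
target_actor.load_state_dict(actor.state_dict())  # targets start identical
soft_update(target_actor, actor, tau=0.001)
```

Because `tau` is small, the targets lag the online networks, giving the critic a slowly moving regression target instead of one that shifts at every step.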
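Written out, the reward described in the abstract takes the following form, where e_p, e_v, and e_a denote the position, velocity, and acceleration tracking errors at step t; the positive weights w_p, w_v, and w_a are left symbolic here, since their values are not given in the abstract:

```latex
r_t = -\left( w_p \,\lVert \mathbf{e}_p(t) \rVert
            + w_v \,\lVert \mathbf{e}_v(t) \rVert
            + w_a \,\lVert \mathbf{e}_a(t) \rVert \right)
```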
Description
Thesis (M.Sc.) -- Istanbul Technical University, Institute of Science and Technology, 2019
Keywords
reinforcement learning, unmanned aerial vehicle