LEE - Control and Automation Engineering, M.Sc.
Browsing LEE - Control and Automation Engineering, M.Sc. by author "Doğan, Mustafa"
-
Item: Improving sample efficiency in reinforcement learning control using autoencoders (Graduate School, 2023-06-19)
Er, Burak ; Doğan, Mustafa ; 504191138 ; Control and Automation Engineering

Through the use of autoencoders, this study proposes a novel method for enhancing the sample efficiency of reinforcement learning (RL) control of dynamic systems. The primary goal of this study is to determine how well autoencoders can facilitate learning and enhance the resulting policies in RL control settings. The literature review provides an overview of existing approaches to improving sample efficiency in RL. Model-based RL and Bayesian RL leverage prior knowledge and uncertainty estimates to make better decisions with fewer samples. Techniques such as prioritized experience replay and hindsight experience replay focus on improving the learning process from past experiences. Despite these advances, achieving high sample efficiency in complex and dynamic environments remains challenging. Autoencoders, with their ability to learn efficient representations, have recently gained interest as a means of enhancing the sample efficiency of RL. However, their integration into RL methods for dynamic system control remains underexplored. Moreover, most applications use only the latent space during learning, which can cause information loss, make the latent space difficult to interpret, hinder handling of dynamic environments, and leave the representation outdated. The novel approach proposed in this study overcomes these problems by using both the states and their latent representations during learning. The methodology consists of two main steps. First, a denoising-contractive autoencoder is developed and implemented for RL control problems, with a specific focus on its applicability to state representation and feature extraction. The autoencoder is pretrained using uniformly randomly selected states from the environment.
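The objective that such a pretraining step minimizes can be sketched as below; the network sizes, random weights, noise level, and penalty weight are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

# Loss of a denoising-contractive autoencoder for one state sample.
# All dimensions, weights, and hyperparameters are illustrative.
rng = np.random.default_rng(1)
D, H, LAM, SIGMA = 8, 4, 1e-3, 0.01
We, be = rng.normal(scale=0.3, size=(H, D)), np.zeros(H)   # encoder params
Wd, bd = rng.normal(scale=0.3, size=(D, H)), np.zeros(D)   # decoder params

def forward(x):
    h = np.tanh(We @ x + be)         # encoder with tanh units
    return h, Wd @ h + bd            # linear decoder output

def dcae_loss(x):
    """Reconstruct the clean state from a noise-corrupted copy, plus a
    contractive penalty on the encoder Jacobian's Frobenius norm."""
    x_noisy = x + rng.normal(scale=SIGMA, size=D)   # denoising corruption
    h, x_hat = forward(x_noisy)
    recon = np.mean((x_hat - x) ** 2)
    # For tanh units: d h_j / d x_i = (1 - h_j^2) * We[j, i]
    jac = (1.0 - h ** 2)[:, None] * We
    contractive = np.sum(jac ** 2)
    return recon + LAM * contractive

x = rng.normal(size=D)
loss = dcae_loss(x)
```

In pretraining, this loss would be minimized over a batch of uniformly sampled environment states with any gradient-based optimizer.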
The states are augmented with latent states generated by the encoder, providing additional information to the RL agent. The second step involves training a deep reinforcement learning algorithm on the augmented states generated by the autoencoder. The algorithm is compared against a baseline DQN algorithm in the LunarLander environment, where observations are subject to zero-mean Gaussian noise with a standard deviation of 0.01. Different encoder architectures are explored and evaluated in terms of learning performance. The outcomes show that, in terms of average reward and the speed of reaching high rewards, the proposed algorithm consistently outperforms the baseline method. The experiments conducted in OpenAI Gym's LunarLander environment provide valuable insights into the advantages of using autoencoders for RL control problems. The findings highlight the ability of autoencoders to improve the sample efficiency of RL algorithms by providing enhanced state representations and feature extraction capabilities. The results of this research contribute to the field of reinforcement learning and control by demonstrating the potential of autoencoders in addressing the challenges of sample efficiency in dynamic systems. The findings also encourage further exploration of different encoder architectures and their impact on RL performance. Overall, this study provides a comprehensive investigation into the effectiveness of autoencoders in improving sample efficiency in RL control problems. The proposed approach offers a promising avenue for future research and the development of algorithms that leverage autoencoders to enhance the learning process in dynamic systems.
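The state-augmentation step can be sketched as follows; the encoder weights here are random stand-ins for a pretrained model, and the dimensions merely mirror a LunarLander-sized observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pretrained encoder: one tanh layer mapping an 8-dimensional
# LunarLander-style observation to a 4-dimensional latent code.  The
# random weights stand in for a trained autoencoder's encoder.
STATE_DIM, LATENT_DIM = 8, 4
W = rng.normal(scale=0.5, size=(LATENT_DIM, STATE_DIM))
b = np.zeros(LATENT_DIM)

def encode(state: np.ndarray) -> np.ndarray:
    """Map a raw state to its latent representation."""
    return np.tanh(W @ state + b)

def augment(state: np.ndarray) -> np.ndarray:
    """Concatenate the raw state with its latent code, so the RL agent
    sees both representations rather than the latent space alone."""
    return np.concatenate([state, encode(state)])

# An observation corrupted by zero-mean Gaussian noise (sigma = 0.01),
# as in the experimental setup described above.
noisy_state = rng.normal(size=STATE_DIM) + rng.normal(scale=0.01, size=STATE_DIM)
augmented = augment(noisy_state)     # 12-dimensional input to the DQN
```

The DQN's input layer would then be sized for the augmented vector instead of the raw observation.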
-
Item: Nonlinear model predictive control with real time iteration for F-16 attitude control (Graduate School, 2024-07-03)
Kuzucu, Siyami Gürkan ; Doğan, Mustafa ; 504211118 ; Control and Automation Engineering

The enduring presence of the F-16 Fighting Falcon in global military fleets underscores its unparalleled versatility, reliability, and cost-effectiveness, decades after its initial introduction. It was designed in the 1970s in response to the need for a high-performance, multi-role fighter aircraft. Unlike conventional aircraft, the F-16's flight dynamics are significantly influenced by its relaxed static stability and fly-by-wire control system, which together facilitate a level of manoeuvrability that is both a marvel and a challenge to replicate in simulation and control algorithms. This complexity is further accentuated by the aircraft's ability to perform high-g manoeuvres and operate across a wide range of speeds and altitudes, presenting a multifaceted challenge for aerodynamicists and control engineers alike. Understanding and accurately modelling the F-16's flight dynamics are crucial for developing advanced flight control systems, such as nonlinear model predictive control (NMPC), that can leverage the aircraft's full potential while ensuring safety and efficiency. This study explores the deployment of an NMPC framework for the attitude control of the F-16 fighter aircraft, underpinned by a real-time iteration (RTI) optimization strategy. The study emphasizes NMPC's capacity to adeptly manage the aircraft's nonlinear dynamics, offering a superior alternative to traditional control strategies by integrating constraints directly and optimizing control actions based on future trajectory predictions. A pivotal element of this research is the selection of the RTI method for optimization, chosen for its robustness in solving nonlinear problems and its adaptability to real-time requirements.
This method ensures that the control strategy remains computationally viable while effectively managing the intricate dynamics of F-16 flight, thus facilitating real-time operational capabilities. This study consists of six chapters. In the first chapter, the aim of the thesis is discussed along with a literature review. The second chapter, titled "System Modeling and Control," covers the mathematical modeling of the aircraft, including coordinate frames, forces and moments, dynamic modeling, the PID controller, NMPC, and real-time iteration. The third chapter discusses the simulation environment and stability, providing detailed information about the JSBSim and Unreal Engine simulation environments used in the study, along with discussions of the aircraft's static and dynamic stability. The fourth chapter focuses on the implementation of the controllers, with detailed information on the implementation of PID and RTI-based NMPC. The fifth chapter presents the simulation results of the control structures used, analyzed through graphs. The sixth and final chapter discusses the general conclusions of the study, the problems encountered during the implementation phases, and future research on NMPC.
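The essence of a real-time iteration scheme, one linearization and one Gauss-Newton step per control sample with a shifted warm start, can be sketched on a toy scalar system; the dynamics, horizon, and weights below are illustrative assumptions, not the F-16 model used in the thesis.

```python
import numpy as np

# Toy attitude-like dynamics with a nonlinear restoring term, a
# stand-in for the aircraft model.  All constants are illustrative.
DT, N = 0.05, 20          # sample time [s], prediction horizon
Q, R = 10.0, 0.1          # state and input weights

def f(x, u):
    return x + DT * (u - np.sin(x))       # x_{k+1} = f(x_k, u_k)

def rollout(x0, us):
    """Simulate the horizon and record state sensitivities w.r.t. inputs."""
    xs = np.empty(N + 1); xs[0] = x0
    S = np.zeros((N + 1, N))              # S[i, j] = d xs[i] / d us[j]
    for k in range(N):
        xs[k + 1] = f(xs[k], us[k])
        A = 1.0 - DT * np.cos(xs[k])      # df/dx at (xs[k], us[k])
        S[k + 1] = A * S[k]
        S[k + 1, k] += DT                 # df/du
    return xs, S

def rti_step(x0, us, ref):
    """One real-time iteration: linearize once around the current guess
    and take a single Gauss-Newton step on the tracking cost."""
    xs, S = rollout(x0, us)
    # Stack weighted residuals of states and inputs into one least-squares problem.
    J = np.vstack([np.sqrt(Q) * S[1:], np.sqrt(R) * np.eye(N)])
    r = np.concatenate([np.sqrt(Q) * (xs[1:] - ref), np.sqrt(R) * us])
    du = np.linalg.lstsq(J, -r, rcond=None)[0]
    return us + du

# Closed loop: apply the first input, then warm-start the next sample
# by shifting the control guess, as RTI schemes do.
x, us, ref = 0.0, np.zeros(N), 0.5
for _ in range(80):
    us = rti_step(x, us, ref)
    x = f(x, us[0])
    us = np.append(us[1:], us[-1])        # shift for warm start
```

Because only one linearization and one linear solve are performed per sample, the per-step cost is fixed and predictable, which is what makes the scheme viable in real time.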
-
Item: Robotic fish for monitoring water pollution (Graduate School, 2022-02-01)
Ansari, Mohammed Javed ; Doğan, Mustafa ; 504161131 ; Control and Automation Engineering

The vast majority of the earth's surface is covered by water. Some parts of the ocean are so deep that even Mount Everest would be lost in them as if it had never existed. Water bodies, whether fresh or salty, big or small, all host some of the most unique ecosystems. Mankind has set its sails on the oceans since time immemorial, but only in recent years has it been able to dive beneath the surface by means of HOVs, ROVs, and AUVs; still, most of the underwater world remains unexplored. Every living thing, from a unicellular amoeba to the Antarctic blue whale, including every single plant, needs water to survive; otherwise, the earth would be as barren as any other planet known so far. Water is the key to the existence of life on Earth. Unfortunately, the garbage of all kinds being dumped into water sources pollutes them and, in the long run, adversely affects and endangers living things on planet Earth. As our very existence depends on water, it is indispensable to monitor water quality and take the steps essential to preserve it. Not only does water sustain conditions for terrestrial inhabitants, but it is also a habitat for a huge number of species. One of the most well-known among these aquatic animals is the fish. In this work, a brief study of the types of fish, along with their structural definition, is carried out to determine how they propel themselves and swim with their fins; the findings are then used to biomimetically design and implement a robotic fish capable of exploring water and taking readings with built-in sensors. The readings thus obtained can be used to monitor the water. The robotic fish presented here moves through the water by replicating the motion behaviors of a real fish.
This study consists of five parts. Chapter 1 provides a brief introduction to the whole idea and the classification of fish according to their swimming behavior. Fish swim using their fins, which produce a propulsive force that pushes them forward. Depending upon which part of the fish pulsates, and how, fish can be categorized into different classes; these classifications help in studying fish more systematically. A detailed categorization on various grounds is further discussed in this chapter. A common approach is to classify fish by their mode of propulsion, i.e., whether undulatory or oscillatory methods generate the propulsive forces. The two corresponding categories of fish swimming modes are BCF (body and/or caudal fin) locomotion and MPF (median and/or paired fin) locomotion. Common to both modes is that the caudal fin plays the most important role in propulsive force generation. In this study, a "Carangiform & Fusiform" model has been adopted for replication. The first chapter also gives a brief description of biomimetics, along with some of its popular applications in various fields, and outlines the overall implementation of this work. Chapter 2 discusses works of a similar kind and the methods used in them. The caudal fin drive mechanism can be of single, multiple, or compliant type. It is already known that the caudal fin plays the most important role in swimming and maneuvering, and the stiffness of the joint connecting the caudal fin to the body is equally important for efficient swimming. Unlike other similar works, Turfi uses a single-joint mechanism with a soft caudal fin. The outer cover of Turfi was designed in SolidWorks, and the 3D model was later printed on a 3D printer. The outer body of Turfi was divided into two halves during the design.
The first half encloses all the electronics (including the SD card module, battery, sensors, processor, and driver circuits) and the motors. The pectoral fins are controlled using micro servo motors that help Turfi maneuver, and the caudal fin is driven by a DC motor attached to a reduction mechanism. The other half of Turfi is the caudal tail and its mechanism, which creates the oscillatory motion of the caudal fin by converting the rotary motion of the DC motor into oscillation. The front enclosure was 3D printed in polylactic acid (PLA) because of its stiffness, while the posterior, i.e., the caudal fin, was made of thermoplastic polyurethane (TPU). TPU is best known for its flexibility; making the caudal fin from TPU gives it a soft, flexible structure, which makes the propulsion wavy and smooth. The ESP32 used as the processor also has an embedded WiFi module and is programmed to create an asynchronous WiFi server. The asynchronous server allows Turfi to take readings and store them on an SD card even when offline and, when connected, to deliver all the collected data at once. This lets Turfi navigate and collect data regardless of its connection to the base station. While navigating underwater, Turfi takes sensor readings and stores them on an SD card. After completing its navigation, Turfi resurfaces, connects to the base station over WiFi, and sends all the readings made during the navigation. These readings can later be accessed using an IP address provided by the ESP32. These details are discussed in Chapter 3. As this study progressed, it became apparent that Turfi can be programmed in various ways to accomplish different tasks. In Chapter 4, the results of two different tests are included. In the first test, Turfi was programmed to take readings at a certain depth (i.e., 20 cm).
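The store-offline, deliver-on-connection pattern described above can be sketched as follows; the class name and file format are illustrative stand-ins, not the firmware's actual interface, and a local file plays the role of the SD card.

```python
import json, os, tempfile

class OfflineLogger:
    """Sketch of a store-offline / sync-later logger: readings are
    appended to local storage while disconnected and delivered in bulk
    once a connection becomes available.  Illustrative only."""

    def __init__(self, path: str):
        self.path = path

    def record(self, reading: dict):
        # Append one JSON line per sample as readings arrive underwater.
        with open(self.path, "a") as fh:
            fh.write(json.dumps(reading) + "\n")

    def sync(self):
        # On (re)connection, deliver everything collected so far at once.
        with open(self.path) as fh:
            return [json.loads(line) for line in fh]

path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
log = OfflineLogger(path)
log.record({"t": 0, "depth_cm": 18.5})
log.record({"t": 1, "depth_cm": 20.1})
delivered = log.sync()   # both readings arrive together on connection
```

On the robot itself, the same idea is realized with SD-card writes and an asynchronous WiFi server rather than a local file.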
A PID controller, implemented with Brett Beauregard's PID library, was used to track the depth based on readings from the depth sensor. The second test was similar to the first, except that Turfi was instructed to take left and right turns. Chapter 5 concludes the work by describing the complexity of multi-fin locomotion underwater and briefly explains how Turfi can be developed further. Upgrades such as a camera to record underwater footage and sensors to measure pH, oxygen level, salinity, etc., can be attached to Turfi; these sensors would let Turfi monitor the water in greater detail. An exit mechanism is also proposed in this section: it would help Turfi resurface when the battery drops below a certain level or once the navigation is complete. Once at the surface, the whereabouts of Turfi can be determined using GPS. Works of a similar nature have been done previously, but most of them focus on a descriptive analysis of the swimming behavior of a fish and on replicating it. In this work, the scope has been slightly widened by adding sensors to take the required readings. One major hindrance, also experienced in previous works, namely the limited ability to communicate wirelessly underwater, was encountered in this project as well. Thus, a different approach is applied: Turfi is instructed to follow a certain navigation route and, while navigating underwater, stores the sensor readings on an SD card. These data can be retrieved wirelessly from Turfi over WiFi and used for further processing.
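The depth-tracking loop can be sketched as a discrete PID acting on toy vertical dynamics; the gains and the plant model below are illustrative assumptions, not Turfi's tuned values.

```python
# Discrete PID depth tracking, in the spirit of the PID library used on
# Turfi.  The plant (thrust accelerates, drag slows) and all gains are
# illustrative assumptions.
KP, KI, KD = 4.0, 1.5, 0.5
DT, SETPOINT = 0.05, 20.0        # sample time [s], target depth [cm]

depth, vel = 0.0, 0.0
integral, prev_err = 0.0, SETPOINT
for _ in range(600):             # 30 s of simulated time
    err = SETPOINT - depth
    integral += err * DT
    deriv = (err - prev_err) / DT
    prev_err = err
    thrust = KP * err + KI * integral + KD * deriv
    # Toy vertical dynamics integrated with Euler steps.
    vel += DT * (thrust - 2.0 * vel)
    depth += DT * vel
```

The integral term drives the steady-state depth error to zero; on the real robot, the controller output would command the dive mechanism rather than a simulated thrust.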
-
Item: Trajectory tracking control of a quadrotor with reinforcement learning (Graduate School, 2023-01-23)
Çakmak, Eren ; Doğan, Mustafa ; 504181134 ; Control Engineering

Drone control algorithms are usually broken down into several loops. The innermost parts of a drone control algorithm are the angle and angular velocity control loops. Whether the aircraft is fixed-wing or rotary-wing, these loops conventionally consist of PID-based controllers. Although a PID controller can handle these loops successfully, it may not drive the outer loops to the desired positions or velocities. An outer loop designed to manage these quantities can also be built with conventional control loops; however, such controllers are heavily model-dependent and often require tuning. Motivated by this situation, the aim of the presented study is to show that reinforcement learning based algorithms can control a quadrotor drone without prior knowledge of the model. The most preferred model-free reinforcement learning algorithms in the literature are DDPG, TRPO, and PPO. Studies that use state-of-the-art reinforcement learning methods for quadcopter control are compared, and it is concluded that PPO is the best choice to begin with. An actor-critic neural network for PPO-clip, the most successful version of PPO, is built and trained in a custom Gym environment. The environment is a quadrotor model that covers the fundamental dynamics. This study is composed of six chapters. In the first chapter, the motivation of the research and a literature review are given. In the second chapter, the theoretical background for constructing a quadrotor model is given, and a general picture of reinforcement learning and model-free algorithms is drawn. In the third chapter, a custom simulation environment using the features of the Gym library is designed. The neural network based controller is then designed in the fourth chapter, and the agent is trained in the custom environment in the fifth chapter.
The simulation results of the hovering and trajectory tracking tests are given. In the last chapter, it is concluded that a model-free reinforcement learning based neural network, without any additional control loop, can control a quadrotor, and possible future work for this study is discussed.
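The clipped surrogate objective that PPO-clip maximizes can be sketched as follows; the probability ratios and advantages are synthetic stand-ins for values the actor-critic network would produce.

```python
import numpy as np

EPS = 0.2   # clipping parameter, the common default for PPO-clip

def ppo_clip_objective(ratio, advantage):
    """L_CLIP = mean( min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t) ),
    where r_t is the new-to-old policy probability ratio."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - EPS, 1.0 + EPS) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# Synthetic batch: three transitions with their ratios and advantages.
ratio = np.array([0.7, 1.0, 1.5])    # pi_new / pi_old per sample
adv = np.array([1.0, -2.0, 0.5])     # advantage estimates
obj = ppo_clip_objective(ratio, adv)
```

The clipping removes the incentive to move the policy ratio outside [1 - eps, 1 + eps], which is what keeps PPO updates stable without TRPO's explicit trust-region constraint.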