LEE - Mechatronics Engineering PhD
Browsing LEE - Mechatronics Engineering PhD by subject "artificial intelligence"
-
Item: Applications of deep reinforcement learning for advanced driving assistance systems (Graduate School, 2023-07-23) Yavaş, Muharrem Uğur ; Kumbasar, Tufan ; 518162005 ; Mechatronics Engineering

Advanced driving assistance systems are becoming more prevalent every day. Adaptive cruise control, for instance, has been present in some mass-produced vehicles since 1980, yet it is now available in almost every new vehicle model and, with the help of developing technology, has become usable even in congested traffic. Likewise, the autonomous lane-centering function developed for highway environments reduces the load on drivers. One of the main reasons for this progress is the advance in environmental perception sensors: by fusing data from intelligent camera and radar sensors, decision-making algorithms can obtain high-accuracy estimates of lane positions and of the speeds and positions of other vehicles on the road.

Building on advances in artificial intelligence research, the main topic of this thesis is to use deep reinforcement learning to evaluate the state of the surrounding vehicles and decide the cruise following speed, the amount of gas or brake to apply, and, finally, the lane-change decision. Deep reinforcement learning integrates reinforcement learning theory with the new generation of artificial neural networks that emerged with the deep learning revolution. In the proposed methods, both the adaptive cruise control and the autonomous lane-changing functions designed with deep reinforcement learning take more nearly optimal decisions than classical algorithms, and the similarity between their decisions and those of human drivers is demonstrated.

Adaptive cruise control systems typically calculate the acceleration required to maintain a safe following distance using only the distance to the closest vehicle ahead. This is not compatible with human driving behavior, which involves scanning the whole traffic scene and taking into account the dynamic elements surrounding the driven vehicle. In one of our proposed solutions, we designed the adaptive cruise control function with a model-based deep reinforcement learning method. In model-based reinforcement learning, the decision-making policy uses its own internal model during training to minimize interaction with the system; one artificial neural network therefore represents the decision-making policy, while a second network represents the internal model. Using the proposed meta-learning approach to train the two neural networks in a closed loop, we selected data from two leader vehicles, instead of a single one, as inputs to the algorithm. In our simulation environment, the model-based algorithm performed better than the classical intelligent driver model. In addition, we proposed a hybrid method that falls back to the classical driver model whenever the internal model and real-world observations disagree for a sustained period of time (a minimal sketch of this fallback logic follows).
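For context, the sketch below illustrates the classical intelligent driver model (IDM) that this abstract uses as the baseline and fallback controller, together with a simple mismatch-triggered fallback check. The parameter values, the gap-error test, and all names here are illustrative assumptions, not the thesis implementation.

```python
# Hypothetical sketch: classical Intelligent Driver Model (IDM) plus a simple
# mismatch-triggered fallback monitor. Parameter values are assumed defaults.
import math
from collections import deque

def idm_acceleration(v, gap, v_lead,
                     v0=33.0,    # desired speed [m/s] (assumed)
                     T=1.5,      # desired time headway [s]
                     s0=2.0,     # minimum gap [m]
                     a_max=1.5,  # maximum acceleration [m/s^2]
                     b=2.0,      # comfortable deceleration [m/s^2]
                     delta=4.0):
    """Standard IDM acceleration for an ego vehicle following a leader."""
    dv = v - v_lead                                     # approach rate
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / max(gap, 0.1)) ** 2)

class FallbackMonitor:
    """Switch to IDM if the learned internal model keeps disagreeing with reality."""
    def __init__(self, threshold=1.0, window=20):
        self.errors = deque(maxlen=window)  # recent |predicted - observed| gaps [m]
        self.threshold = threshold

    def use_fallback(self, predicted_gap, observed_gap):
        self.errors.append(abs(predicted_gap - observed_gap))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.threshold
```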
In the second proposed study on adaptive cruise control, we suggested a discrete driver model that, inspired by how human drivers use the gas and brake pedals, manipulates the pedals directly. In an analysis of data collected from real-world driving, it was observed that drivers hold the gas and brake pedals at roughly constant levels in steady state and cope with dynamic conditions by applying incremental (delta) gas or brake inputs. The different gas and brake delta levels were determined by statistical inference on this dataset (the discrete action formulation is sketched after this abstract). As inputs to the algorithm, the positions and speeds of all vehicles ahead of the ego vehicle on a multi-lane highway are used; given the advantage observed when two leader vehicles are used instead of one on a single lane, information from vehicles in adjacent lanes helps when the ego vehicle's leader changes. The deep Q-learning algorithm, which gives the best results for discrete outputs, was used as the decision-making algorithm. In evaluations on both simulation and real test data, the proposed algorithm obtained the highest score. In particular, the algorithm frequently chose the tactical option of outputting zero, pressing neither the gas nor the brake pedal and letting the vehicle slow down under its own friction.

The other advanced driver assistance system studied in the thesis is the autonomous lane-changing function. In the first original study, autonomous lane changing was designed with a deep reinforcement learning method, and the normally long training process was accelerated five-fold with the proposed safety reward feedback. In the autonomous lane-changing problem, the critical task is to process the position and speed information of all vehicles ahead and behind in traffic and to make safe maneuvers, at the right time, that increase speed. Especially in the complex traffic scenarios created in simulated environments, classical algorithms are adversely affected by sensor uncertainty and noise and cannot perform optimally when many vehicles are driving dynamically. With the uncertainty calculation built into the designed deep reinforcement learning algorithm, the confidence level of each decision can be observed, contributing to the important research area of explainable artificial intelligence.

Although deep reinforcement learning techniques have achieved significant successes, they still face integration issues in real-world applications. One of the main problems is the lengthy training process, which can take millions of steps, and the fact that policies are optimized through trial and error, which makes training directly on real systems infeasible. A promising research direction is sim2real transfer, in which policies trained in simulation are transferred directly to real-world applications. In the second original study on autonomous lane changing, a new approach was introduced to measure the transferability between two simulators with different resolutions. Transferability was evaluated with a human-likeness score computed from the traffic situations in which lane-change decisions were made. With an adjusted reward function used during training, the proposed method outperformed the reference methods in both efficiency and safety and achieved the highest human-like lane-changing score.
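The discrete pedal-delta formulation from the second cruise-control study can be illustrated with a short sketch. The delta values, the Q-network interface, and the epsilon schedule below are assumptions chosen for illustration only, not the values inferred in the thesis.

```python
# Hypothetical sketch of a discrete pedal-delta action space with epsilon-greedy
# selection over Q-values. Action values are illustrative assumptions.
import random
import numpy as np

# Each action changes the gas/brake pedal positions by a fixed increment;
# action 0 presses neither pedal and lets the vehicle coast on its own friction.
ACTIONS = [
    (0.00, 0.00),   # coast (no gas, no brake)
    (+0.05, 0.00),  # small gas delta
    (+0.10, 0.00),  # large gas delta
    (0.00, +0.05),  # small brake delta
    (0.00, +0.10),  # large brake delta
]

def select_action(q_values, epsilon=0.05):
    """Epsilon-greedy choice over the discrete pedal-delta actions."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q_values))

def apply_action(gas, brake, action_index):
    """Clip pedal positions to [0, 1] after applying the chosen deltas."""
    d_gas, d_brake = ACTIONS[action_index]
    return min(max(gas + d_gas, 0.0), 1.0), min(max(brake + d_brake, 0.0), 1.0)
```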
-
Item: Deep reinforcement learning approach in control of Stewart platform - simulation and control (Graduate School, 2023-06-08) Yadavari, Hadi ; İkizoğlu, Serhat ; Aghaei Tavakol, Vahid ; 518162002 ; Mechatronics Engineering

As the title suggests, this work approaches the control of the Stewart platform with reinforcement learning methods and presents a new simulation environment. The Stewart platform, whose applications range from flight and driving simulators to structural test platforms, is a fully parallel robot. Precise control of the Stewart platform is challenging and essential for delivering the desired performance in its applications.

The fundamental aim of artificial intelligence is to address complex problems using high-dimensional sensory information. Reinforcement learning (RL) is an area of machine learning (ML) in which an agent interacts with its environment according to a policy so as to maximize the sum of future rewards as an objective function. The agent learns through a reward-penalty scheme that reflects the quality of the actions selected from the policy space, and in this manner RL can address many problems and tasks. The primary focus of this work is learning to control a sophisticated model of the Stewart platform using state-of-the-art deep reinforcement learning (DRL) and model-based reinforcement learning algorithms.

Why is a simulation environment needed? To learn an optimal policy, reinforcement learning requires a multitude of interactions with the environment. Experiments with real robots are expensive, time-consuming, hard to replicate, and even dangerous. To safely apply RL algorithms in real-time applications, a reliable simulation environment that captures the nonlinearities and uncertainties of the agent's environment is indispensable. An agent can then be trained in simulation through sufficient trials without concern for hardware issues, and once accurate controller parameters have been learned in simulation, they can be transferred to the physical real-time system.

With the objective of improving the reliability of learning performance and creating a comprehensive test bed that replicates the system's behavior, we introduce a carefully designed simulation environment. We opted for the Gazebo simulator, an open-source platform that uses either the Open Dynamics Engine (ODE) or Bullet physics. Integrating Gazebo with ROS paves the way for efficient, complex robotic applications, since different environments involving multi-agent robots can be simulated. Although some computer-aided design (CAD)-based simulations of the Stewart platform exist, we chose ROS and Gazebo to benefit from the latest high-performance reinforcement learning algorithms and to remain compatible with recently developed RL frameworks. However, despite the many robotic simulations available in ROS, it lacks support for parallel mechanisms and closed kinematic chains such as the Stewart platform. Consequently, our initial step is to create a parametric representation of the Stewart platform's kinematics within the Gazebo and Robot Operating System (ROS) frameworks, integrated with a Python class that facilitates the generation of such structures.
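As a concrete illustration of the parametric kinematics mentioned above, here is a minimal inverse-kinematics sketch for a generic Stewart platform. The anchor geometry, the SciPy rotation utility, and the example pose are assumptions for illustration and are not taken from the thesis code.

```python
# Hypothetical sketch: standard Stewart platform inverse kinematics, mapping a
# desired platform pose to the six leg lengths. Anchor geometry is a placeholder.
import numpy as np
from scipy.spatial.transform import Rotation

# Attachment points of the six legs on the base and on the moving platform,
# expressed in their own frames (placeholder geometry, shape (6, 3)).
BASE_ANCHORS = np.array([[np.cos(a), np.sin(a), 0.0]
                         for a in np.deg2rad([0, 60, 120, 180, 240, 300])]) * 0.5
PLATFORM_ANCHORS = np.array([[np.cos(a), np.sin(a), 0.0]
                             for a in np.deg2rad([30, 90, 150, 210, 270, 330])]) * 0.3

def leg_lengths(translation, rpy):
    """Inverse kinematics: leg lengths for a platform pose (translation + roll/pitch/yaw)."""
    R = Rotation.from_euler("xyz", rpy).as_matrix()
    platform_points = (R @ PLATFORM_ANCHORS.T).T + np.asarray(translation)
    return np.linalg.norm(platform_points - BASE_ANCHORS, axis=1)

# Example: nominal height of 0.4 m with a small roll angle.
print(leg_lengths([0.0, 0.0, 0.4], [0.05, 0.0, 0.0]))
```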
-
Item: Multi-agent planning with automated curriculum learning (Graduate School, 2025-06-11) Akgün, Onur ; Üre, Nazım Kemal ; 518182018 ; Mechatronics Engineering

Reinforcement learning (RL) is a formidable paradigm for training autonomous agents to master sequential decision-making tasks. Its core principle, learning through trial and error guided by a reward signal, has proven successful in a variety of domains. However, the efficacy of standard RL algorithms diminishes drastically in environments characterized by sparse rewards or complex, high-dimensional state spaces. In these challenging settings, an agent receives meaningful feedback only after executing a long and specific sequence of correct actions. This credit assignment problem makes exploration, the process of discovering rewarding behaviors, profoundly inefficient: an agent may wander aimlessly without ever stumbling upon the feedback necessary to learn, preventing standard algorithms from developing effective policies.

To overcome this fundamental limitation, this thesis turns to curriculum learning (CL), a strategy inspired by the principles of human pedagogy. Just as students are taught arithmetic before calculus, CL structures the learning process by initially presenting the agent with simpler tasks and gradually increasing the difficulty as its competence grows. This guided approach helps the agent build foundational skills that can be leveraged to solve more complex problems. The primary bottleneck of traditional CL, however, is its reliance on manual design: creating an effective curriculum requires significant human expertise, intuition, and domain-specific knowledge, making the process laborious and difficult to generalize.

This thesis addresses that gap by proposing a novel framework for the automated and adaptive generation of learning curricula. The central objective is to develop, implement, and rigorously evaluate an algorithmic framework, termed Bayesian Curriculum Generation (BCG), that dynamically constructs and adapts a curriculum based on the underlying structure of the task and the agent's real-time progress. The aim is to significantly enhance the performance, stability, and sample efficiency of RL agents, particularly in complex, sparse-reward scenarios where traditional methods falter.

The proposed BCG algorithm is built on a synergistic integration of several key concepts. At its heart, the framework uses Bayesian networks (BNs), a type of probabilistic graphical model, to represent the structural dependencies among the key parameters that define the tasks within an environment; in a navigation task, for instance, these parameters might include the map size, the number of obstacles, or the presence of adversaries. The BN captures the probabilistic relationships between these parameters and serves as a generative model, allowing the framework to sample a diverse yet coherent set of task configurations and to move beyond simple parameter randomization toward tasks with a principled structure.
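To make the generative role of the Bayesian network concrete, a minimal sketch of ancestral sampling over two hypothetical task parameters is given below. The network structure, probabilities, and parameter names are illustrative assumptions, not the networks used in the thesis.

```python
# Hypothetical sketch: sampling task configurations from a tiny, hand-specified
# Bayesian network over navigation-task parameters (map size -> obstacle count).
import numpy as np

rng = np.random.default_rng(0)

MAP_SIZES = [5, 8, 16]
P_MAP_SIZE = [0.5, 0.3, 0.2]                 # prior P(map_size)

# Conditional P(num_obstacles | map_size): larger maps tend to have more obstacles.
P_OBSTACLES = {
    5:  {0: 0.6, 2: 0.3, 4: 0.1},
    8:  {0: 0.3, 2: 0.4, 4: 0.3},
    16: {0: 0.1, 2: 0.4, 4: 0.5},
}

def sample_task():
    """Ancestral sampling: draw parent variables first, then children given parents."""
    map_size = rng.choice(MAP_SIZES, p=P_MAP_SIZE)
    counts, probs = zip(*P_OBSTACLES[int(map_size)].items())
    num_obstacles = rng.choice(counts, p=probs)
    return {"map_size": int(map_size), "num_obstacles": int(num_obstacles)}

tasks = [sample_task() for _ in range(5)]    # a small batch of candidate tasks
print(tasks)
```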
A critical component of the framework is its ability to handle diverse input modalities through flexible task representations. For visual environments such as MiniGrid, where the state is an image, a convolutional autoencoder (CAE) is trained to compress high-dimensional observations into a low-dimensional latent feature vector. This vector captures the essential semantic content of the state and provides a compact, meaningful representation for analysis. For environments defined by a set of scalar parameters, such as the physics-based AeroRival simulator, normalized parameter vectors are used directly.

Once tasks are represented in a common feature space, their difficulty is quantified, typically by measuring the distance (e.g., the Euclidean distance) between a task's representation and that of the final target task; the intuition is that tasks whose representations lie closer to the target require more similar skills. These raw distances are then normalized and grouped by an unsupervised clustering algorithm, such as K-means, into a discrete number of difficulty levels, or "bins", which form the structured stages of the curriculum (see the sketch below).

A defining feature of BCG is its adaptability: the curriculum is not a static, predefined sequence. Instead, tasks are selected for training probabilistically, guided by the agent's real-time performance metrics, such as its average reward or task success rate. If the agent consistently succeeds at a given difficulty level, the probability of sampling tasks from the next, more challenging level increases; conversely, if the agent struggles, the framework can present easier tasks so that it can consolidate its skills. This closed-loop scheme keeps the agent training at the edge of its capabilities, preventing both stagnation and frustration.

Crucially, the BCG framework implicitly and effectively leverages transfer learning to accelerate skill acquisition. The policy and value-function parameters learned by the base RL agent (Proximal Policy Optimization, PPO, in our evaluations) on tasks from one curriculum stage are used to initialize the learning process for the subsequent, more challenging stage. This prevents the agent from having to learn from scratch at each step, allowing it to build on previously acquired knowledge and dramatically speeding up convergence to an optimal policy for the final task.
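The distance-based binning and performance-driven task selection described above can be illustrated with a short sketch. The feature vectors, the bin ordering, the 0.8 success threshold, and the sampling rule are assumptions made for illustration, not the thesis implementation.

```python
# Hypothetical sketch of difficulty binning by distance to the target task and
# of performance-driven sampling over the resulting bins.
import numpy as np
from sklearn.cluster import KMeans

def difficulty_bins(task_features, target_features, n_bins=4, seed=0):
    """Cluster tasks into difficulty bins by their distance to the target task."""
    dists = np.linalg.norm(task_features - target_features, axis=1, keepdims=True)
    dists = (dists - dists.min()) / (dists.max() - dists.min() + 1e-8)  # normalize
    labels = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit_predict(dists)
    # Relabel so bin 0 is farthest from the target (treated here as the easiest
    # stage) and the last bin is closest to the target (the hardest stage).
    order = np.argsort([dists[labels == k].mean() for k in range(n_bins)])[::-1]
    return np.array([np.where(order == l)[0][0] for l in labels])

def sampling_probs(success_rates, temperature=0.2):
    """Favor the easiest bin the agent has not yet mastered (success < 0.8)."""
    success_rates = np.asarray(success_rates, dtype=float)
    not_mastered = success_rates < 0.8
    frontier = int(np.argmax(not_mastered)) if np.any(not_mastered) else len(success_rates) - 1
    logits = -np.abs(np.arange(len(success_rates)) - frontier) / temperature
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Example: four bins, agent has mastered bins 0 and 1, so bin 2 dominates sampling.
print(sampling_probs([0.95, 0.85, 0.4, 0.0]))
```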
The practical efficacy and robustness of the BCG framework were empirically validated through comprehensive experiments in two distinct and demanding RL environments. The first, MiniGrid (specifically the DoorKey variant), provides a discrete, grid-based navigation challenge characterized by partial observability (the agent can see only a small portion of its surroundings) and a hierarchically sparse reward (the agent must first find a key, then navigate to a door, and only then receives a reward). The second, AeroRival Pursuit, offers a continuous control task involving high-speed adversarial interaction, dynamic hazard avoidance, and sparse rewards, simulating an aerial combat scenario. In both testbeds, BCG was rigorously benchmarked against a baseline PPO agent with no curriculum and a diverse set of relevant contemporary algorithms designed to address similar challenges.

The experimental results consistently demonstrated the superiority of the BCG approach. Across both the discrete MiniGrid and the continuous AeroRival environments, agents trained with BCG achieved significantly higher final performance and converged on successful policies more reliably than all tested baselines. BCG also exhibited greater learning stability, as evidenced by lower variance in performance across multiple independent training runs, indicating that its success is not due to chance. Notably, in MiniGrid, BCG enabled the agent to master tasks of progressively increasing complexity where many baselines failed to scale, and in the highly complex AeroRival environment BCG was the only method that consistently enabled the agent to learn a successful policy, whereas most baselines failed to obtain any positive reward at all. This success across environments with fundamentally different dynamics underscores the versatility and generality of the framework.

In conclusion, this research contributes to the field of reinforcement learning by developing, implementing, and validating the Bayesian Curriculum Generation algorithm. BCG presents a robust, principled, and effective solution for automated and adaptive curriculum learning, particularly in challenging domains hampered by sparse rewards and complex state spaces. By synergistically combining probabilistic modeling of the task space, adaptive task selection driven by agent performance, and efficient knowledge transfer between stages, BCG successfully guides exploration and accelerates the acquisition of complex skills. Certain limitations remain, such as the initial need for domain knowledge to identify the parameters of the BN and the added computational overhead, but the presented results are highly promising. Future work will focus on automating parameter selection, extending the framework to non-stationary environments, and further improving computational efficiency. Ultimately, BCG offers a powerful approach that advances the potential for training more capable, efficient, and autonomous AI agents in the complex scenarios of tomorrow.