LEE - Savunma Teknolojileri - Yüksek Lisans (Graduate School - Defence Technologies, Master's)
Item: Evolutionary reinforcement learning based autonomous maneuver decision in one-to-one short-range air combat (Graduate School, 2023-07-28)
Baykal, Yasin ; Başpınar, Barış ; 514201076 ; Defence Technologies

Air combat, particularly one-to-one short-range air combat, presents a challenging and dynamic environment in which aircraft agility plays a crucial role. Although the use of unmanned aerial vehicles (UAVs) in air combat is increasing rapidly, many challenges still limit their capability against manned aircraft. One of these is the difficulty of making autonomous maneuver decisions, which can currently be only partially overcome through remote control by expert pilots. The ultimate goal, however, is to enable UAVs to make independent maneuvering decisions by evaluating their own situation, analyzing opponent capabilities, and devising combat strategies accordingly. Reinforcement learning based methods have therefore attracted considerable attention in recent studies. This thesis develops an evolutionary reinforcement learning-based autonomous maneuver decision system designed specifically for one-to-one short-range air combat.

The initial phase investigates existing autonomous maneuver decision systems that rely on reinforcement learning. Reinforcement learning is a method in which a machine or artificial intelligence system called an agent interacts with its environment and learns through experience and feedback: the agent observes the environment, takes actions based on its assessment, and adjusts its strategy according to the rewards or penalties it receives. Because of this trial-and-error nature, a simulation environment is necessary, as real aircraft cannot be used for this purpose.

The simulation environment in this research consists of four main modules (a minimal structural sketch is given below). The first is a flight dynamics module containing simplified aircraft kinematics and position equations suitable for the scope of the thesis. The second is the maneuver decision maker, which contains the agent responsible for observing the aircraft's surroundings and generating appropriate actions; rule-based agents also reside in this module. The third is a visualization module that displays the maneuvers performed by the aircraft in air combat scenarios. Finally, the observer module coordinates communication between the other modules, assigning starting positions to the aircraft and relaying information.

At the beginning of the study, no reinforcement learning model capable of producing desirable maneuvers was available, so an enemy agent was needed to train the allied aircraft. A rule-based agent that performs random maneuvers to escape from the allied aircraft is used for this purpose (also sketched below). As training progresses, the allied agent learns to defeat this opponent; test results indicate that it achieves a ninety percent win rate over the enemy agent.

Existing studies in the literature replace the enemy agent with the previously trained allied agent as training progresses, so that the allied agent is trained against an intelligent opponent obtained from earlier stages of training. This method, however, has a notable drawback. For instance, a second agent that beats the first agent with a high success rate fails against the randomly moving agent, even though the first agent defeats that random opponent with a high win rate. The second agent thus loses its superiority against opponents encountered prior to its current enemy.
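Before turning to the proposed method, a minimal Python sketch of the simulation setup described above may help fix ideas. All class names, numeric limits, and the toy reward here are assumptions of this sketch, not the thesis's actual implementation; the observer-style CombatEnv assigns starting positions, relays relative-geometry observations, and advances the simplified flight dynamics of both aircraft.

```python
import math
import random
from dataclasses import dataclass

def wrap(angle: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

@dataclass
class AircraftState:
    x: float = 0.0      # north position [m]
    y: float = 0.0      # east position [m]
    psi: float = 0.0    # heading [rad]
    v: float = 200.0    # speed [m/s]

def step_kinematics(s: AircraftState, turn_rate: float, accel: float,
                    dt: float = 0.1) -> AircraftState:
    """Flight dynamics module: simplified planar kinematics in which
    heading and speed are commanded directly (an assumed toy model)."""
    psi = s.psi + turn_rate * dt
    v = min(300.0, max(50.0, s.v + accel * dt))
    return AircraftState(x=s.x + v * math.cos(psi) * dt,
                         y=s.y + v * math.sin(psi) * dt,
                         psi=psi, v=v)

class CombatEnv:
    """Observer-style coordinator: assigns starting positions, relays
    observations to the maneuver decision makers, and advances both
    aircraft each step."""

    def reset(self):
        self.ally = AircraftState()
        self.enemy = AircraftState(x=random.uniform(500.0, 2000.0),
                                   y=random.uniform(-1000.0, 1000.0),
                                   psi=random.uniform(-math.pi, math.pi))
        return self._observation()

    def _observation(self):
        # What the allied agent sees: range and bearing to the opponent.
        dx, dy = self.enemy.x - self.ally.x, self.enemy.y - self.ally.y
        return math.hypot(dx, dy), wrap(math.atan2(dy, dx) - self.ally.psi)

    def step(self, ally_action, enemy_action):
        self.ally = step_kinematics(self.ally, *ally_action)
        self.enemy = step_kinematics(self.enemy, *enemy_action)
        dist, bearing = self._observation()
        # Toy reward: pointing at a nearby opponent is advantageous.
        reward = 1.0 if dist < 500.0 and abs(bearing) < math.radians(30) else -0.01
        return (dist, bearing), reward
```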
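The rule-based escape opponent could then be sketched as follows; the maneuver-holding interval, command limits, and turn-away bias are likewise illustrative assumptions rather than details from the thesis.

```python
import math
import random

class RandomEscapeAgent:
    """Rule-based opponent: holds a randomly sampled maneuver for a fixed
    number of steps, biased to turn away from the pursuer."""

    def __init__(self, hold_steps: int = 20):
        self.hold_steps = hold_steps   # how long each maneuver is held
        self._t = 0
        self._cmd = (0.0, 0.0)

    def act(self, bearing_to_pursuer: float):
        if self._t % self.hold_steps == 0:
            away = -math.copysign(1.0, bearing_to_pursuer)       # turn away
            turn = away * random.uniform(0.0, math.radians(15))  # rad/s
            accel = random.uniform(-5.0, 10.0)                   # m/s^2
            self._cmd = (turn, accel)
        self._t += 1
        return self._cmd
```

In a rollout, the allied decision maker and this opponent would each receive their own relative-geometry observation from the observer module and return (turn rate, acceleration) commands to the flight dynamics step.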
To overcome this issue, an evolutionary reinforcement learning-based autonomous maneuver decision system is proposed. The proposed approach aims to improve the UAVs' autonomous maneuver decision process and to generate a policy that remains robust against alternative enemy strategies. The training process involves training multiple workers in parallel, evaluating the models at regular intervals, selecting the best model, testing it against the enemy policies, and updating the pool of enemy strategies. The models trained with the proposed method improve steadily, and the approach leads to more robust policies. The results show that the proposed method generates better policies, with higher win rates, than agents trained via standard RL techniques or the k-level learning approach.
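The following sketch, reusing CombatEnv, RandomEscapeAgent, and wrap from above, illustrates one plausible shape for such a training loop: workers are scored against the whole enemy pool, and the best model both seeds the next generation and, when strong enough, is frozen into the pool as a new opponent. The linear policy, random-perturbation update, and thresholds are stand-ins for the thesis's actual RL training and are not taken from it.

```python
import math
import random

def make_policy():
    # Toy linear controller standing in for the thesis's learned policy.
    return [random.gauss(0.0, 0.1) for _ in range(4)]

def act(policy, obs):
    dist, bearing = obs
    d = math.tanh(dist / 1000.0)
    turn = max(-0.5, min(0.5, policy[0] * bearing + policy[1] * d))
    accel = max(-10.0, min(10.0, policy[2] * bearing + policy[3] * d))
    return turn, accel

class FrozenPolicy:
    """Adapter so a frozen allied policy can join the enemy pool."""
    def __init__(self, weights):
        self.weights = list(weights)
    def act(self, bearing_to_pursuer):
        # Range is not part of this simple interface; assume a nominal value.
        return act(self.weights, (1000.0, bearing_to_pursuer))

def evaluate(policy, enemy, episodes=5, horizon=300):
    """Mean episode return of `policy` against a single enemy agent."""
    env, total = CombatEnv(), 0.0
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(horizon):
            e_bearing = wrap(math.atan2(env.ally.y - env.enemy.y,
                                        env.ally.x - env.enemy.x) - env.enemy.psi)
            obs, r = env.step(act(policy, obs), enemy.act(e_bearing))
            total += r
    return total / episodes

def evolutionary_training(n_workers=4, n_generations=10, sigma=0.05):
    enemy_pool = [RandomEscapeAgent()]                 # initial rule-based opponent
    workers = [make_policy() for _ in range(n_workers)]
    best = workers[0]
    for _ in range(n_generations):
        # Score each worker by its worst case against the whole pool.
        scores = [min(evaluate(w, e) for e in enemy_pool) for w in workers]
        best = workers[scores.index(max(scores))]      # select the best model
        if max(scores) > 0.0:                          # assumed robustness threshold
            enemy_pool.append(FrozenPolicy(best))      # update the enemy pool
        # Next generation: random perturbations of the best model.
        workers = [[g + random.gauss(0.0, sigma) for g in best]
                   for _ in range(n_workers)]
    return best
```

Scoring each worker by its worst-case return over the pool is one simple way to encode the robustness objective described above; the thesis's actual evaluation and pool-update criteria are not specified in this abstract.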