Multi-agent planning under uncertainty using deep Q-networks

Date
2024-04-29
Authors
Tarhan, Farabi Ahmed
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
The extensive popularity of commercial unmanned aerial vehicles has drawn great attention from the e-commerce industry due to their suitability for last-mile delivery. However, efficiently organizing multiple aerial vehicles to deliver a given set of goods in the presence of no-fly zones, numerous warehouses, limited fuel, and uncertainty remains a problem for traditional algorithms. The main challenge of planning is scalability, since the planning space grows exponentially with the number of agents, and it is not practical to have human-level supervisors structure the problem for such large-scale settings. With recent advancements in deep reinforcement learning, algorithms such as Deep Q-Networks (DQN) have achieved unprecedented success in solving single-agent decision-making problems. Extending these algorithms to multi-agent problems such as multi-drone delivery remains limited due to scalability issues. This work proposes an approach that improves the performance of DQN on multi-agent drone-delivery problems by utilizing state decomposition to lower the problem complexity, curriculum learning to handle the exploration complexity of delivery environments, and a genetic algorithm (GA) to search the combinatorial solution space for efficient packet-drone matchings. The performance of the proposed method is demonstrated on a multi-agent drone-delivery problem with $10$ agents and $\approx 10^{77}$ state-action pairs.

Comparative simulation results are provided to demonstrate the merit of the proposed method. Compared with conventional DQN schemes and recently developed utility decomposition techniques, the proposed genetic-algorithm-aided multi-agent deep reinforcement learning approach outperforms both in scalability and convergence behavior. The prior techniques quickly become intractable as the number of agents grows in the drone-delivery setting. The basic DQN algorithm fails to find a solution for three agents in a 10x10 drone-delivery scenario within a reasonable number of steps, whereas the deep correction method converges after approximately 1 million Bellman updates. Applying the deep correction method further increases the learning capacity to five agents, converging after around 35 million Bellman updates; however, it does not converge for ten agents within a manageable time. Even with powerful computing resources, single-agent models only set an initial computational baseline: increasing the number of agents introduces complexity, as seen in the immediate convergence difficulties of the three-agent DQN setup. Although the three- and five-agent configurations show promise with deep correction, the ten-agent model fails to converge within 24 hours, underscoring the trade-off between the number of agents and computational feasibility. The drone-delivery simulation presents intricate challenges, including restricted airspace, fuel limitations, and pick-and-place tasks. The study demonstrates that distributing packets with a genetic algorithm effectively reduces the complexity of the 10-agent task, solving the assignment within 5.74 minutes. The reduced problem is then handled by deep Q-network inference models trained with Curriculum Learning and Prioritized Experience Replay, with execution times measured in milliseconds.
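To make the two-stage structure concrete, the following is a minimal sketch, in Python, of a GA-based packet-drone matching stage. It is illustrative only and not the thesis implementation: the chromosome encoding (one drone index per packet), the population parameters, and the load-balancing fitness placeholder are assumptions; in the proposed method the quality of a candidate assignment would instead be evaluated with the learned per-drone models.

```python
import random

# Illustrative sketch: a simple genetic algorithm that assigns packets to drones.
# The fitness below is a placeholder based on per-drone load balance; the thesis
# would score an assignment using the learned DQN value estimates instead.

NUM_PACKETS = 30
NUM_DRONES = 10
POP_SIZE = 50
GENERATIONS = 200
MUTATION_RATE = 0.05

def random_assignment():
    # Chromosome: packet i is delivered by drone assignment[i].
    return [random.randrange(NUM_DRONES) for _ in range(NUM_PACKETS)]

def fitness(assignment):
    # Placeholder: negative of the heaviest per-drone load, so balanced
    # assignments score higher.
    loads = [0] * NUM_DRONES
    for drone in assignment:
        loads[drone] += 1
    return -max(loads)

def crossover(parent_a, parent_b):
    # Single-point crossover on the assignment vector.
    point = random.randrange(1, NUM_PACKETS)
    return parent_a[:point] + parent_b[point:]

def mutate(assignment):
    # Reassign each packet to a random drone with small probability.
    return [random.randrange(NUM_DRONES) if random.random() < MUTATION_RATE else d
            for d in assignment]

def evolve():
    population = [random_assignment() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        elite = population[: POP_SIZE // 5]          # keep the best fifth
        children = []
        while len(elite) + len(children) < POP_SIZE:
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        population = elite + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("best packet-to-drone assignment:", best)
```

In the described approach, once the GA fixes the packet-to-drone assignment, each drone's reduced delivery problem is handled by its DQN inference model, which is what brings the per-drone execution time down to milliseconds.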
This two-fold approach learns the dynamics of the delivery problem without requiring prior domain knowledge, even under uncertain environmental conditions that can alter the outcomes of actions. Furthermore, visualizations at various time steps during execution illustrate how integrating GA-based packet distribution enables the proposed base DQN model with Curriculum Learning and PER to tackle scenarios involving 10 agents, an outcome the other explored solutions could not reach within reasonable time and computational resources. In conclusion, the combination of deep reinforcement learning and genetic algorithms provides a promising approach for efficient and effective delivery with multi-agent drones under uncertainty.
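As a complement, the sketch below illustrates the prioritized experience replay mechanism referenced above, using proportional prioritization with importance-sampling weights. It is a minimal illustration under stated assumptions, not the thesis code: the buffer capacity, the alpha and beta values, the four-action space, and the random q_value placeholder standing in for the drone's Q-network are all assumptions.

```python
import random
import numpy as np

# Illustrative sketch of proportional Prioritized Experience Replay (PER):
# transitions are stored with priorities, sampled proportionally to
# priority**alpha, and reweighted with importance-sampling weights.

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.storage = []      # (state, action, reward, next_state, done) tuples
        self.priorities = []   # one priority per stored transition

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed soon.
        max_prio = max(self.priorities, default=1.0)
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
            self.priorities.pop(0)
        self.storage.append(transition)
        self.priorities.append(max_prio)

    def sample(self, batch_size, beta=0.4):
        prios = np.array(self.priorities) ** self.alpha
        probs = prios / prios.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.storage) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + eps

def q_value(state, action):
    # Placeholder standing in for a drone's Q-network output.
    return random.random()

if __name__ == "__main__":
    buffer = PrioritizedReplayBuffer(capacity=1000)
    for _ in range(100):
        buffer.add((random.random(), random.randrange(4), random.random(),
                    random.random(), False))
    batch, idxs, weights = buffer.sample(batch_size=8)
    gamma = 0.99
    td_errors = []
    for (s, a, r, s_next, done), w in zip(batch, weights):
        # One-step Bellman target: r + gamma * max_a' Q(s', a') for non-terminal states.
        target = r + (0.0 if done else gamma * max(q_value(s_next, a2) for a2 in range(4)))
        td_errors.append(target - q_value(s, a))
    buffer.update_priorities(idxs, td_errors)
    print("sampled indices:", idxs.tolist())
```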
Description
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2024
Keywords
Genetic algorithms, Reinforcement learning
Citation