Analysis of meta-gradient incentive algorithm for cooperative behavior in social dilemma problems

Vanlıoğlu, Abdullah

dc.contributor.advisor	Üre, Nazım Kemal
dc.contributor.author	Vanlıoğlu, Abdullah
dc.contributor.authorID	514191030
dc.contributor.department	Defence Technologies Program
dc.date.accessioned	2025-07-14T08:51:26Z
dc.date.available	2025-07-14T08:51:26Z
dc.date.issued	2022
dc.description	Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2022
dc.description.abstract	In shared environments, agents are expected to act cooperatively to maximize rewards and achieve objectives. However, getting self-interested agents to behave cooperatively in Multi-Agent Deep Reinforcement Learning (MARL) environments remains a challenge and an open research problem. Early research in multi-agent reinforcement learning focused on developing cooperative policies. However, this requires agents to share their policies with each other, which is not always feasible. An alternative approach is centralized learning, which relies on detailed knowledge of the agents and their environment. We applied Multi-Agent Proximal Policy Optimization (MAPPO), a centralized learning method, to investigate the behavior of centrally trained agents in a custom environment. The environment's objective is to eliminate hostile forces as quickly as possible, and agents must collaborate to reach this outcome. In such military tasks, agents may make strategic decisions or have conflicting objectives, which can give rise to social dilemmas. Rewards and penalties can be used to incentivize cooperation in sequential social dilemmas (SSDs): such incentives help agents learn to cooperate by rewarding actions that lead to cooperative outcomes. Learning to Incentivize Others (LIO) is a reward-shaping approach that uses incentive rewards to encourage cooperation between agents. We analyze the robustness of LIO in the public goods game Cleanup under different configurations. Our goal is to identify the sensitive points of LIO and provide insights to improve meta-gradient-based incentive learning. Our primary contribution is a comprehensive analysis that pinpoints the areas most in need of improvement.
dc.description.degree	M.Sc.
dc.identifier.uri	http://hdl.handle.net/11527/27571
dc.language.iso	en
dc.publisher	Graduate School
dc.sdg.type	Goal 8: Decent Work and Economic Growth
dc.subject	shared environments
dc.subject	self-interested agents
dc.subject	Multi-Agent Deep Reinforcement Learning (MARL)
dc.title	Analysis of meta-gradient incentive algorithm for cooperative behavior in social dilemma problems
dc.title.alternative	Sosyal ikilem problemlerinde işbirlikçi davranış için meta-gradient teşvik algoritması analizi
dc.type	Master Thesis
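The abstract describes LIO's incentive-based reward shaping only in words. The snippet below is a minimal per-step sketch of that idea under stated assumptions: each agent i hands a nonnegative incentive to every other agent j, recipients add the incentives they receive to their environment reward, and an incentivizer's own objective is penalized by a weighted cost of what it gives away. The function name `lio_shaped_rewards` and the `cost_weight` parameter are hypothetical illustrations, not the thesis code. The meta-gradient step, in which LIO updates the incentive function by differentiating through the recipients' policy updates, is not shown.

```python
from typing import List, Tuple


def lio_shaped_rewards(
    env_rewards: List[float],
    incentives: List[List[float]],
    cost_weight: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Hypothetical per-step sketch of incentive-based reward shaping.

    incentives[i][j] is the (assumed nonnegative) incentive agent i gives
    to agent j; diagonal entries (self-incentives) are ignored.

    Returns:
      policy_rewards[j]: env reward of j plus all incentives j received,
        i.e. the signal j's policy would be trained on.
      incentive_objectives[i]: env reward of i minus cost_weight times the
        total incentive i gave, i.e. the per-step term i's incentive
        function would try to maximize.
    """
    n = len(env_rewards)
    for i in range(n):
        for j in range(n):
            if i != j and incentives[i][j] < 0:
                raise ValueError("incentives are assumed to be nonnegative")

    # Recipients: add every incentive received from the other agents.
    policy_rewards = [
        env_rewards[j] + sum(incentives[i][j] for i in range(n) if i != j)
        for j in range(n)
    ]

    # Incentivizers: pay a weighted cost for every incentive given away.
    incentive_objectives = [
        env_rewards[i] - cost_weight * sum(incentives[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]
    return policy_rewards, incentive_objectives
```

For example, with two agents where agent 0 earns nothing from the environment but rewards agent 1 with 0.5, `lio_shaped_rewards([0.0, 1.0], [[0.0, 0.5], [0.2, 0.0]])` gives agent 1 a shaped reward of 1.5 while agent 0's incentive objective drops slightly below zero because of the giving cost. The cost term is what keeps an incentivizer from handing out unbounded rewards.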

Files

Original bundle

Name: 773984.pdf
Size: 2 MB
Format: Adobe Portable Document Format
Description:

License bundle

Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed to upon submission
Description:

Collections

LEE - Savunma Teknolojileri - Yüksek Lisans (Graduate School, Defence Technologies, M.Sc.)