Evaluating performance of large language models in bluff-based card games: A comparative study
Files
Date
2025-06-27
Authors
Şalk, İrem
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
The aim of this study is to investigate and compare the decision-making performance of multiple agent strategies in a modified bluff-based card game under imperfect information. The study focuses on whether a large language model (LLM) prompted through in-context learning (ICL) can generate effective action recommendations when compared with agents that use traditional rule-based strategies and reinforcement learning in a game where deception plays a critical role. The central hypothesis is that a reasoning-driven LLM agent can adaptively recommend successful bluff or challenge actions by interpreting structured game information, despite having no training on game-specific reward signals. Our research question is: "Can an LLM-driven strategy suggest optimal actions by making predictions and inferences about opponents in a bluff-based card game?" By analyzing agent performance in terms of action success rates and wins, this study is expected to contribute to the growing body of research on applying LLMs to real-time decision-making domains. To that end, the evaluation focuses on the model's ability to reason over game states, player histories, and probabilistic cues when selecting actions in a bluff-based setting.

The game environment is a modified version of the traditional Bluff (also known as Cheat or BS) card game. The modified version uses a special 24-card deck consisting of 6 suits of 4 cards each and is played by three players (agents). At the start of each game, the cards are evenly distributed among the players, the first player is chosen at random, and turns proceed in a clockwise direction. Players keep their cards hidden from one another, and a fixed table rank is used throughout the game. During gameplay, a player can perform two actions: a move or a challenge. When making a move, a player places cards face down on the central pile and claims to have played a certain number of cards of the table rank; this allows players to bluff their opponents. A bluff move is defined as one in which at least one card does not match the declared rank, while a truthful move is one in which all cards match the declared rank. To simplify action selection, the challenge phase is limited to a single player: only the next player to move may challenge. That player may challenge the previous move if they believe it does not match the required rank. If the challenged move was a bluff, the previous player takes all cards in the pile; otherwise, the challenger takes the pile. The first player to discard all cards in their hand is the winner, and the game proceeds in turns until one player wins.

To create the game simulation framework, the gameplay and rules are first modeled mathematically using set notation based on our game design. The frequency of action selections and their corresponding outcomes is then derived, and a dynamic reward-penalty scheme is defined for each action and for the game's conclusion. Finally, the game is modeled as being played by five different agents: each agent selects actions according to its internal strategy logic, and rewards are distributed based on the success of bluffs and challenges.
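To make the setup concrete, the following is a minimal Python sketch of the turn-and-challenge resolution described above. It is illustrative only: the names (GameState, Move, resolve_challenge) and the rank encoding are assumptions for clarity, not the implementation used in the thesis.

```python
# Minimal, illustrative sketch of the modified Bluff environment described above.
# All names and the rank encoding are assumptions, not the thesis implementation.
import random
from dataclasses import dataclass, field

NUM_PLAYERS = 3
DECK = [(suit, rank) for suit in range(6) for rank in range(4)]  # 24 cards: 6 suits x 4 cards
TABLE_RANK = 0  # fixed table rank used throughout the game (assumed encoding)

@dataclass
class Move:
    player: int
    cards: list            # cards actually placed face down on the pile
    claimed_count: int      # number of cards claimed to be of TABLE_RANK

    def is_bluff(self) -> bool:
        # A move is a bluff if at least one played card does not match the declared rank.
        return any(rank != TABLE_RANK for _, rank in self.cards)

@dataclass
class GameState:
    hands: list = field(default_factory=list)   # one hidden hand per player
    pile: list = field(default_factory=list)    # face-down central pile
    turn: int = 0

def deal(state: GameState) -> None:
    # Cards are evenly distributed; the first player is chosen at random.
    deck = DECK.copy()
    random.shuffle(deck)
    share = len(deck) // NUM_PLAYERS
    state.hands = [deck[i * share:(i + 1) * share] for i in range(NUM_PLAYERS)]
    state.turn = random.randrange(NUM_PLAYERS)

def resolve_challenge(state: GameState, last_move: Move, challenger: int) -> None:
    # If the challenged move was a bluff, the previous player takes the pile;
    # otherwise the challenger takes it.
    loser = last_move.player if last_move.is_bluff() else challenger
    state.hands[loser].extend(state.pile)
    state.pile.clear()

def winner(state: GameState):
    # The first player to discard all cards in their hand wins.
    for i, hand in enumerate(state.hands):
        if not hand:
            return i
    return None
```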
In this methodology, five distinct agents are implemented:
- Random Agent: selects actions uniformly at random without considering game context and serves as a performance baseline.
- State Dependent Agent: uses handcrafted rules based on the current state, such as pile size and card distribution. It is modeled as an agent with minimal logic and contextual awareness, estimating the probability that cards matching the table rank occur in the opponents' hands.
- Bayesian Agent: employs hierarchical priors built from historical game data together with the current game state to evaluate the best bluffing or challenging action, adapting its action preferences based on prior success rates.
- DQN Agent: a deep reinforcement learning model trained to maximize long-term reward. It maps observed states to actions using Q-learning, updating its policy across turns within episodes. To support reinforcement learning, we define both episode-level rewards (e.g., winning or losing the game) and turn-based rewards for each action type (move and challenge) to guide the agent's learning effectively. We design two learning configurations: baseline-oriented training, in which the agent competes against baseline agents to learn stable behavior in known scenarios, and self-play training, in which the agent learns by playing against versions of itself to promote generalization and adaptability.
- LLM Agent: a GPT-based language model (GPT-4o) that receives a structured prompt describing the rules and instructions, the current state, past behaviors, and probabilistic game summaries such as win rates. In addition, a simple chain-of-thought procedure with instructions and strategy guidelines is designed and implemented to analyze opponent behaviors, predict the outcomes of possible action scenarios, and evaluate risk and reward when selecting an action. The agent reasons step by step internally but outputs only the final action (a minimal sketch of this prompt-and-decision loop is given below).

To ensure a fair evaluation of the LLM agent, an opponent filtering mechanism is implemented, in which the LLM is tested only against a selected pool of agents with varied but stable behaviors. This prevents high variance caused by opponent unpredictability and allows consistent measurement of reasoning-based performance. Simulation results are analyzed by strategy type, focusing on bluff/challenge success, consistency, and win rate; action success rates and cumulative performance are included in the metrics. The analysis compares static (Random, State Dependent, Bayesian), learning-based (DQN), and reasoning-based (LLM) agents to determine how adaptive decision-making influences performance in uncertain, deceptive environments. By comparing the action decisions of the LLM agent with those of the other strategies, the study evaluates the effectiveness of LLM-driven recommendations. The resulting simulation framework contributes to understanding how LLMs perform in strategic decision-making tasks where uncertainty and deception are key components.
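As an illustration of the LLM agent's prompt-and-decision loop, the following sketch shows how a structured prompt could be assembled and a single action parsed back, assuming the OpenAI Python client. The prompt fields, the build_prompt helper, and the JSON action format are hypothetical; they stand in for the structured prompt and chain-of-thought guidelines described above.

```python
# Hypothetical sketch of the LLM agent's prompt-and-decision loop.
# The prompt structure, helper names, and action format are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_INSTRUCTIONS = (
    "You are playing a modified Bluff card game. Analyze opponent behavior, "
    "predict the outcomes of possible actions, weigh risk and reward, and reason "
    "step by step internally. Output ONLY the final action as JSON, e.g. "
    '{"action": "move", "cards": 2, "bluff": true} or {"action": "challenge"}.'
)

def build_prompt(game_state: dict, opponent_history: dict, summaries: dict) -> str:
    """Assemble the structured prompt: current state, past opponent behavior,
    and probabilistic summaries such as win rates."""
    return json.dumps({
        "current_state": game_state,
        "opponent_history": opponent_history,
        "probabilistic_summaries": summaries,
    }, indent=2)

def llm_select_action(game_state: dict, opponent_history: dict, summaries: dict) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_INSTRUCTIONS},
            {"role": "user", "content": build_prompt(game_state, opponent_history, summaries)},
        ],
        temperature=0.2,
    )
    # Only the final action is returned; the step-by-step reasoning stays internal.
    return json.loads(response.choices[0].message.content)
```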
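Likewise, the reported metrics (bluff/challenge success rates and win rates per strategy) could be aggregated from simulation logs roughly as follows; the log record fields used here are assumptions for illustration, not the thesis logging format.

```python
# Illustrative aggregation of per-strategy metrics from simulation logs.
# The record fields ("agent", "action", "success", "winner", ...) are assumptions.
from collections import defaultdict

def summarize(game_logs: list[dict]) -> dict:
    """Compute bluff/challenge success rates and win rates per agent strategy."""
    stats = defaultdict(lambda: {"bluff": [0, 0], "challenge": [0, 0], "wins": 0, "games": 0})
    for game in game_logs:
        for record in game["turns"]:
            kind = record["action"]  # e.g. "bluff", "truthful_move", or "challenge"
            if kind in ("bluff", "challenge"):
                stats[record["agent"]][kind][0] += int(record["success"])
                stats[record["agent"]][kind][1] += 1
        for agent in game["players"]:
            stats[agent]["games"] += 1
            stats[agent]["wins"] += int(agent == game["winner"])
    return {
        agent: {
            "bluff_success_rate": s["bluff"][0] / s["bluff"][1] if s["bluff"][1] else None,
            "challenge_success_rate": s["challenge"][0] / s["challenge"][1] if s["challenge"][1] else None,
            "win_rate": s["wins"] / s["games"] if s["games"] else None,
        }
        for agent, s in stats.items()
    }
```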
Description
Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025
Keywords
language models,
digital games,
artificial intelligence