Generative models for game character generation

Emekligil Aydın, Ferda Gül
Graduate School
Creating visual content and characters for games is generally a time-consuming process carried out by designers, and it can be costly for small businesses and independent developers. The work requires a detailed understanding of visual aesthetics, creativity, and technical skill. Characters and visual content must be compatible with the game's story, atmosphere, and gameplay, so designers and artists work to create original content that aligns with the game's objectives and target audience. For these reasons, content creation for games is a challenging process, and automating it saves time and budget.
Many game companies and developers use procedural methods to automate design. Procedural content generation produces game content automatically from algorithms and rules. This approach is well suited to repetitive content and lets developers and designers create content faster, but the visual content it produces can be limited in diversity. With the progress of deep learning, approaches based on deep generative models, such as Generative Adversarial Networks (GANs) and Latent Diffusion models, have begun to be used in place of procedural methods. In the studies presented in this thesis, transfer learning has also been used together with generative models, and its success has been compared with training these models from scratch.
Machine learning typically requires a large amount of labeled data. However, a large labeled dataset is not always available, and obtaining and labeling data can be costly and time-consuming.
Transfer learning has been proposed to reduce or eliminate this requirement. A pre-trained model is selected: a deep neural network that has already learned general features from a large amount of labeled data, for example a classifier trained on a popular dataset such as ImageNet. The initial layers of such a model encode general features that remain useful, while the top layers are specific to the original task and do not carry over to the target task. Therefore, some or all of the pre-trained layers can be frozen, and only specific layers (usually the classification layers) are retrained on the target dataset. This yields a more successful model that is tailored to the target task's dataset. Transfer learning is suited to small datasets, as in this work. Because training starts from a pre-trained model, it is much faster than training from scratch. Pre-trained models have learned generalized features from datasets with many and varied examples, which makes them applicable to a wider range of domains. In summary, transfer learning carries knowledge gained on a previous task over to a new one, and it offers faster training, a reduced need for labeled data, and improved model performance. Models pre-trained on large and diverse datasets are used to apply the method effectively.
Generative Adversarial Networks (GANs) produce highly successful results in image generation and are also used for game character generation. A GAN consists of two distinct deep learning models: the Generator and the Discriminator.
The Generator produces synthetic images, while the Discriminator decides whether an image it is shown is real or generated. The two networks compete during training: the Generator tries to deceive the Discriminator by producing images that look real, while the Discriminator tries to identify the Generator's images accurately. The feedback obtained at each iteration is used to train both networks.
Latent Diffusion is a deep learning approach that generates synthetic data through noise estimation and denoising. It models how data points evolve as noise is gradually added over successive time steps. The model learns the distribution of the training data in a latent space and, by iteratively estimating and removing noise in that space, generates synthetic high-resolution images. A U-Net architecture serves as the denoising model. To condition generation on text, word embeddings are used and are fed as input to all layers of the U-Net. Because the complexity of the U-Net grows with the size of the input image, dimensionality reduction is necessary, and Variational Autoencoders (VAEs) are used to reduce the dimensionality of the input image. By iteratively generating the latent vector, high-resolution images can be obtained. Latent Diffusion models can capture more complex data distributions and achieve more realistic results. However, compared with other generative models, training Latent Diffusion is much more time-consuming and challenging and has a higher computational cost, so its implementation is more demanding and resource-intensive.
In this thesis, visual content generation for games is addressed in two studies.
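The noising side of the diffusion process described above can be illustrated with a short sketch. It is not taken from the thesis: it assumes a DDPM-style linear noise schedule (the values 1e-4 to 0.02 over 1000 steps are common defaults chosen here for illustration), and the function names and the three-element `latent` are hypothetical stand-ins for a VAE latent vector. The denoising U-Net, which would be trained to predict the returned noise from the noisy latent, is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(latent, t, alpha_bars, noise=None, rng=random):
    """Return z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps, and eps."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in latent]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noisy = [signal_scale * z + noise_scale * e for z, e in zip(latent, noise)]
    return noisy, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

latent = [0.5, -1.0, 2.0]  # hypothetical stand-in for a VAE latent vector
noisy_latent, eps = add_noise(latent, t=999, alpha_bars=alpha_bars)
```

At the last step almost none of the original latent survives, which is why generation can start from pure noise and remove the estimated noise step by step.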
In the first study, six different GAN models were trained on two image datasets of RPG and DND characters. In 3 of the 18 experiments, transfer learning was used because of the small size of the datasets. The Fréchet Inception Distance (FID) metric was used to compare the models. The results showed that SNGAN was the most successful on both datasets. It was also concluded that the models trained with transfer learning (WGAN-GP, BigGAN) outperformed their counterparts trained from scratch.
In the second study, two further datasets were used, one of animal images and one of fruit images, and StyleGAN and Latent Diffusion were employed. StyleGAN was trained conditionally, with eight types of fruit images and three types of animal images as conditioning inputs. For Latent Diffusion, the images were labeled with descriptive sentences, which were fed into the model. FID scores were calculated for the generated outputs, and the outputs were also turned into a web game that was played by 164 players. According to the FID scores, the Latent Diffusion model performed well on the animal dataset, while StyleGAN performed well on the fruit dataset; in the overall evaluation, Latent Diffusion yielded better results. According to the scores obtained from the players, Latent Diffusion also achieved better overall rankings, which indicates consistency between the FID scores and the player evaluation.
Both studies demonstrate the feasibility of generating game characters or synthetic artistic visuals with deep neural networks, and both produced consistent results.
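To make the FID comparison above more concrete, the following sketch computes a simplified Fréchet distance between two sets of feature vectors. It is an illustration under stated assumptions, not the evaluation code of the thesis: it treats the covariance matrices as diagonal, whereas the standard FID uses full covariance matrices (with a matrix square root) of features extracted by a pre-trained Inception network. The function names and the toy feature vectors are hypothetical.

```python
import math
from statistics import fmean, pvariance

def feature_stats(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances

def frechet_distance_diagonal(real_features, generated_features):
    """Frechet distance between two Gaussians, assuming diagonal covariances.

    d^2 = ||mu_r - mu_g||^2 + sum_i (var_r[i] + var_g[i] - 2 * sqrt(var_r[i] * var_g[i]))
    """
    mu_r, var_r = feature_stats(real_features)
    mu_g, var_g = feature_stats(generated_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term

# Toy feature vectors; real FID would use high-dimensional network activations.
real = [[0.0, 0.0], [2.0, 2.0]]
generated = [[1.0, 0.0], [3.0, 2.0]]
score = frechet_distance_diagonal(real, generated)
```

A lower score means the generated features are statistically closer to the real ones, which is the sense in which the models in both studies are ranked.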
Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023
Keywords
computer games, computer programming, computer vision, game theory