Codebook learning: Challenges and applications in image representation learning

dc.contributor.advisor Ünal, Gözde
dc.contributor.author Can Baykal, Gülçin
dc.contributor.authorID 504202505
dc.contributor.department Computer Engineering
dc.date.accessioned 2025-03-28T07:19:40Z
dc.date.available 2025-03-28T07:19:40Z
dc.date.issued 2024-12-27
dc.description Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2024
dc.description.abstract The rapid advancement of Machine Learning (ML) and Artificial Intelligence (AI) has paved the way for novel approaches in image representation learning for Computer Vision (CV), particularly through the use of codebook learning techniques. A codebook consists of representative vectors, also known as codewords, embeddings, or prototypes depending on the context, that capture the essential features of the data. Codebook learning trains these discrete representations within models, allowing continuous data to be mapped onto a set of quantized, discrete vectors. This thesis studies codebook learning in two complementary contexts: the exploration of its challenges and the exploitation of the learned codebook in various tasks, including image generation and disentanglement. By examining three key studies, the thesis aims to provide a comprehensive understanding of how the challenges of codebook learning can be mitigated and how the learned codebook can be leveraged to enhance image representation learning tasks.
Codebook learning is beneficial in various applications, including image generation and classification. It can be integrated into models such as discrete Variational Autoencoders (VAEs), where it allows efficient encoding and decoding of information, thereby improving performance in generative tasks. Additionally, in prototype-based classification, the codebook consists of prototypes that characterize distinct classes within a dataset, enabling more accurate predictions. The versatility of codebook learning across different frameworks underscores its significance in advancing representation learning techniques. The studies in this thesis perform codebook learning within different frameworks and focus on the challenges of codebook learning, as well as on incorporating the codebook to solve significant problems in different image representation learning tasks.
The first study addresses the challenge of codebook collapse, where codebook learning is performed within a discrete VAE framework. This phenomenon occurs when the learned codebook fails to capture the diversity of the input data: multiple inputs are mapped to a limited number of codewords, leading to redundancy and a loss of representational power. The issue particularly arises in models such as Vector Quantized Variational Autoencoders (VQ-VAEs) and discrete VAEs, which rely on discrete representations for effective learning. The proposed solution uses hierarchical Bayesian modeling to mitigate codebook collapse. This work contributes significantly to the field by providing empirical evidence and theoretical insights into the root cause of codebook collapse and by overcoming it, thereby enhancing the representational power of discrete VAEs.
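As a minimal illustration of the quantization step and the collapse symptom described above, the sketch below (PyTorch, with hypothetical names; an assumption-based illustration of VQ-VAE-style quantization, not the thesis implementation) maps continuous encoder outputs to their nearest codewords and reports codeword-usage perplexity, a common collapse diagnostic:

    import torch
    import torch.nn.functional as F

    def quantize(z_e, codebook):
        # z_e:      (batch, dim) continuous encoder outputs
        # codebook: (K, dim) learnable codewords
        distances = torch.cdist(z_e, codebook)        # (batch, K) pairwise distances
        indices = distances.argmin(dim=1)             # nearest codeword per input
        z_q = codebook[indices]                       # (batch, dim) quantized latents
        z_q = z_e + (z_q - z_e).detach()              # straight-through gradient estimator
        # Codeword-usage perplexity: values far below K mean most inputs
        # share a handful of codewords, i.e. the codebook has collapsed.
        usage = F.one_hot(indices, codebook.size(0)).float().mean(dim=0)
        perplexity = torch.exp(-(usage * (usage + 1e-10).log()).sum())
        return z_q, indices, perplexity

A perplexity close to the codebook size K indicates that codewords are used roughly uniformly, while a value that stays near 1 during training reflects the redundancy that the first study targets.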
After the first study, which explores the challenges of codebook learning within a VAE framework, the second and third studies focus on problems in image representation learning tasks where codebook learning can be exploited. The second study addresses the computational cost of deep generative models, especially diffusion models. Diffusion models require relatively long training times to converge, and our hypothesis is that incorporating informative signals about the data during the training of a diffusion model might reduce the convergence time. The critical requirement is to obtain these informative signals in negligibly short time, so that reducing the training time of the diffusion model also reduces the overall computational cost. To learn such signals, we perform codebook learning within a classifier-training framework, where the learned codebook consists of prototypes that represent the classes in the data. The second study shows that using class prototypes learned in a short time as informative signals during diffusion training yields better generative performance in the early stages of training and eliminates the need for longer training.
The third study is motivated by another important representation learning problem, disentanglement, a key aspect of understanding and representing complex data structures. Disentanglement refers to the ability to separate and manipulate the underlying factors of variation in the data, which is crucial for tasks such as attribute manipulation and controlled generation. Given the categorical nature of the underlying generative factors, our hypothesis is that discrete representations, which are well suited to categorical data, might aid disentanglement in image representations. We therefore build a novel framework that learns a codebook within a discrete VAE and propose an original optimization-based regularization to further assist disentanglement. The findings of this study demonstrate that using discrete representations together with optimization-based regularizers leads to significant improvements in disentanglement. This research emphasizes the synergy between codebook learning and disentanglement, advocating further exploration of their combined potential in advancing image representation learning.
The exploration of these three studies reveals the critical challenges and advantages associated with codebook learning. The first study lays the groundwork by addressing the fundamental issue of codebook collapse, while the subsequent studies demonstrate the applicability of codebook learning in diverse contexts such as image generation and disentanglement. Together, these works illustrate that a robust understanding of codebook learning can lead to significant advancements in image generation and disentanglement. In summary, this thesis contributes to the growing literature on codebook learning by providing a detailed overview of its challenges and applications. The findings highlight the importance of addressing inherent challenges while leveraging the benefits of codebook learning for practical applications. The insights gained from this research aim not only to enhance the performance of existing models but also to inspire future innovations in image representation learning.
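As a compact sketch of the second study's idea of conditioning a diffusion model on quickly learned class prototypes, the code below (PyTorch; names such as denoiser and diffusion_training_step are hypothetical placeholders, not the thesis code) learns one prototype per class through a distance-based classifier and feeds each image's class prototype to the denoiser:

    import torch
    import torch.nn as nn

    class PrototypeClassifier(nn.Module):
        # One learnable prototype per class; the prototype matrix is the codebook.
        def __init__(self, num_classes, dim):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(num_classes, dim))

        def forward(self, features):
            # Logits are negative distances to the prototypes, so training
            # pulls each feature towards the prototype of its class.
            return -torch.cdist(features, self.prototypes)

    def diffusion_training_step(denoiser, x_noisy, t, labels, classifier):
        # Condition the denoiser on the class prototype of each image,
        # in addition to the noisy input and the timestep.
        cond = classifier.prototypes[labels]           # (batch, dim) prototypes
        return denoiser(x_noisy, t, cond)              # predicted noise

Because a classifier of this kind converges much faster than the diffusion model itself, the prototypes are available early in training as the informative signals described in the abstract.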
dc.description.degree Ph.D.
dc.identifier.uri http://hdl.handle.net/11527/26699
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 3: Good Health and Well-being
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.sdg.type Goal 11: Sustainable Cities and Communities
dc.subject Computer vision
dc.subject Bilgisayarla görme
dc.subject Deep learning
dc.subject Derin öğrenme
dc.subject Machine learning
dc.subject Makine öğrenmesi
dc.subject Artificial intelligence
dc.subject Yapay zeka
dc.title Codebook learning: Challenges and applications in image representation learning
dc.title.alternative Kod kitabı öğrenimi: Görüntü temsili öğrenimindeki zorluklar ve uygulamaları
dc.type Doctoral Thesis
Files
Original bundle
Name: 504202505.pdf
Size: 39.17 MB
Format: Adobe Portable Document Format