LEE- Bilgisayar Mühendisliği-Doktora
Konu "Artificial intelligence" ile LEE- Bilgisayar Mühendisliği-Doktora'a göz atma
-
Codebook learning: Challenges and applications in image representation learning (Graduate School, 2024-12-27) Can Baykal, Gülçin ; Ünal, Gözde ; 504202505 ; Computer Engineering

The rapid advancement of Machine Learning (ML) and Artificial Intelligence (AI) has paved the way for novel approaches in image representation learning for Computer Vision (CV), particularly through the use of codebook learning techniques. A codebook consists of representative vectors, also known as codewords, embeddings, or prototypes depending on the context, that capture the essential features of the data. Codebook learning involves training these discrete representations within models, allowing continuous data to be mapped onto a set of quantized, discrete vectors. This thesis studies codebook learning in two different contexts: the exploration of its challenges and the exploitation of the learned codebook in various tasks, including image generation and disentanglement. By examining three key studies, the thesis aims to provide a comprehensive understanding of how the challenges of codebook learning can be mitigated and how the learned codebook can be leveraged to enhance various image representation learning tasks.

Codebook learning is beneficial in various applications, including image generation and classification. It can be integrated into models such as discrete Variational Autoencoders (VAEs), where it allows for efficient encoding and decoding of information, thereby improving performance in generative tasks. Additionally, in prototype-based classification, codebooks consist of prototypes that characterize distinct classes within a dataset, enabling more accurate predictions. The versatility of codebook learning across different frameworks underscores its significance in advancing representation learning techniques. The studies in this thesis perform codebook learning within different frameworks, focusing both on the challenges of codebook learning and on incorporating the learned codebook to solve significant problems in different image representation learning tasks.

The first study addresses the challenge of codebook collapse, where codebook learning is performed within a discrete VAE framework. This phenomenon occurs when the learned codebook fails to capture the diversity of the input data: multiple inputs get mapped to a limited number of codewords, leading to redundancy and a loss of representational power. The issue particularly arises in models such as Vector Quantized Variational Autoencoders (VQ-VAEs) and discrete VAEs, which rely on discrete representations for effective learning. The proposed solution involves hierarchical Bayesian modeling to mitigate codebook collapse. This work contributes to the field by providing empirical evidence and theoretical insights into the root cause of codebook collapse and by overcoming the collapse, thereby enhancing the representational power of discrete VAEs.
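To make the quantization step behind such codebooks concrete, the following is a minimal, hedged sketch (not the thesis implementation) of mapping continuous encoder outputs to their nearest codewords; the array shapes and variable names are illustrative assumptions.

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each continuous encoder vector to its nearest codeword.

    z_e      : (N, D) array of continuous encoder outputs.
    codebook : (K, D) array of K learned codewords (embeddings).
    Returns the quantized vectors (N, D) and the chosen codeword indices (N,).
    """
    # Squared Euclidean distance between every encoder vector and every codeword.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    indices = dists.argmin(axis=1)   # nearest codeword per input
    z_q = codebook[indices]          # discrete (quantized) representation
    return z_q, indices

# Toy usage: 5 encoder outputs, a codebook of 8 codewords of dimension 4.
rng = np.random.default_rng(0)
z_e = rng.normal(size=(5, 4))
codebook = rng.normal(size=(8, 4))
z_q, idx = quantize(z_e, codebook)
print(idx)  # if only a few indices are ever used across a dataset, the codebook has "collapsed"
```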
After the first study, which explores the challenges of codebook learning within a VAE framework, the second and third studies focus on problems in various image representation learning tasks where codebook learning can be exploited. In the second study, the focus shifts to the computational cost of deep generative models, especially diffusion models. Diffusion models require relatively long training times to converge, and our hypothesis is that incorporating informative signals about the data during the training of a diffusion model might reduce the convergence time. The critical point, however, is to obtain these informative signals in negligibly short time, so that reducing the training time of the diffusion model also reduces the overall computational time. To learn such informative signals, we perform codebook learning within the framework of training a classifier, and the learned codebook consists of prototypes that represent the classes in the data. The second study in this thesis shows that using class prototypes learned in a short time as informative signals during the training of the diffusion model leads to better generative performance in the early stages of training and eliminates the need for longer training (a toy sketch of this conditioning idea is given after this abstract).

The third study's motivation is to overcome another important representation learning problem called disentanglement, a key aspect of understanding and representing complex data structures. Disentanglement refers to the ability to separate and manipulate the underlying factors of variation in the data, which is crucial for tasks such as attribute manipulation and controlled generation. On the grounds of the categorical nature of the underlying generative factors, our hypothesis is that using discrete representations, which are well suited to categorical data, might aid disentanglement in the image representation. Therefore, we build a novel framework to learn a codebook within the framework of discrete VAEs and propose an original optimization-based regularization to further assist disentanglement. The findings of this study demonstrate that using discrete representations and optimization-based regularizers leads to significant improvements in disentanglement. This research emphasizes the synergy between codebook learning and disentanglement, advocating for further exploration of their combined potential in advancing image representation learning.

The exploration of these three studies reveals the critical challenges and advantages associated with codebook learning. The first study lays the groundwork by addressing the fundamental issue of codebook collapse, while the subsequent studies demonstrate the applicability of codebook learning in diverse contexts such as image generation and disentanglement. Together, these works illustrate that a robust understanding of codebook learning can lead to significant advancements in image generation and disentanglement. In summary, this thesis contributes to the growing literature on codebook learning by providing a detailed overview of its challenges and applications. The findings highlight the importance of addressing inherent challenges while leveraging the benefits of codebook learning for practical applications. Insights gained from this research aim not only to enhance the performance of existing models but also to inspire future innovations in image representation learning.
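As referenced above, here is a hedged illustration of how class prototypes might be derived from a classifier's features and injected as a conditioning signal during diffusion training; the per-class mean, the shapes, and the concatenation are illustrative assumptions rather than the thesis architecture.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Prototype of each class = mean of its feature vectors (a simple stand-in
    for the codebook of class prototypes learned alongside a classifier)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

# Toy data: 100 samples with 16-dimensional features from 4 classes.
rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))
labels = rng.integers(0, 4, size=100)
protos = class_prototypes(feats, labels, num_classes=4)   # (4, 16)

# During diffusion training, the prototype of a sample's class could be
# concatenated to (or otherwise combined with) the noisy input as an
# informative signal for the denoiser.
noisy_x = rng.normal(size=(100, 16))
conditioned_input = np.concatenate([noisy_x, protos[labels]], axis=1)  # (100, 32)
print(conditioned_input.shape)
```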
-
Identification of object manipulation anomalies for service robots (Lisansüstü Eğitim Enstitüsü, 2021) Altan, Doğan ; Uzar Sarıel, Sanem ; 709912 ; Bilgisayar Mühendisliği

Recent advancements in artificial intelligence have resulted in an increase in the use of service robots in many domains, including households, schools, and factories, where they facilitate daily domestic tasks. The characteristics of such domains necessitate intense interaction between robots and humans, and these interactions require extending the abilities of service robots to deal with safety and ethical issues. Since service robots are usually assigned complex tasks, unexpected deviations from the expected task state are highly probable. These deviations are called anomalies, and they need to be continually monitored and handled for robust execution. After an anomaly is detected, it should be identified for effective recovery. The identification task calls for a time series analysis of onboard sensor readings, since some anomaly indicators are observed long before the anomaly is detected. These sensor readings, which are generally taken asynchronously, need to be fused effectively for correct interpretation. In this thesis, the anomaly identification problem in everyday object manipulation scenarios is addressed. The problem is handled from two perspectives according to the feature types that are processed, and two frameworks are investigated: the first takes domain symbols as features, while the second considers convolutional features.

Chapter 5 presents the first framework, which addresses the problem by analyzing symbols as features. It combines auditory, visual, and proprioceptive sensory modalities with an early fusion method. Before they are fused, a visual modeling system generates visual predicates and provides them as inputs to the framework, while auditory data are fed into a support vector machine (SVM) based classifier to obtain distinct sound classes. These data are then fused and processed within a deep learning architecture consisting of an early fusion scheme, a long short-term memory (LSTM) block, a dense layer, and a majority voting scheme. After the extracted features are fed into the designed architecture, the anomaly that has occurred is classified.

Chapter 6 presents a convolutional three-stream anomaly identification (CLUE-AI) architecture that fuses visual, auditory, and proprioceptive sensory modalities. Visual convolutional features are extracted with convolutional neural networks (CNNs) from raw 2D images gathered through an RGB-D camera. These visual features are then fed into an LSTM block with a self-attention mechanism; after attention values for each image in the gathered sequence are calculated, a dense layer outputs the attention-enabled result for the corresponding sequence. In the auditory stage, Mel frequency cepstral coefficient (MFCC) features are extracted from the auditory data gathered through a microphone and fed into a CNN block. The position of the gripper and the force applied by it are also fed into a designed CNN block. The resulting modality-specific features are then concatenated with a late fusion mechanism, the fused feature vector is fed into fully connected layers, and finally the anomaly type is predicted.
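To picture how such a three-stream late fusion could be wired, here is a minimal hedged sketch in PyTorch; the stream encoders, layer sizes, and number of anomaly types are illustrative assumptions and do not reproduce the CLUE-AI configuration.

```python
import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    """Toy three-stream classifier: visual, auditory and proprioceptive
    features are encoded separately, concatenated (late fusion) and
    classified into anomaly types."""
    def __init__(self, num_anomaly_types=5):
        super().__init__()
        # Visual stream: per-frame CNN features -> LSTM over the frame sequence.
        self.visual_lstm = nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
        # Auditory stream: e.g. MFCC features summarised by a small MLP (stand-in for a CNN block).
        self.audio_mlp = nn.Sequential(nn.Linear(40, 64), nn.ReLU())
        # Proprioceptive stream: gripper position and applied force.
        self.proprio_mlp = nn.Sequential(nn.Linear(8, 32), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(128 + 64 + 32, 64), nn.ReLU(),
            nn.Linear(64, num_anomaly_types),
        )

    def forward(self, frames, mfcc, proprio):
        _, (h, _) = self.visual_lstm(frames)    # frames: (B, T, 512)
        visual = h[-1]                          # last hidden state, (B, 128)
        audio = self.audio_mlp(mfcc)            # mfcc: (B, 40)
        prop = self.proprio_mlp(proprio)        # proprio: (B, 8)
        fused = torch.cat([visual, audio, prop], dim=1)   # late fusion
        return self.classifier(fused)           # anomaly-type logits

# Toy forward pass.
model = ThreeStreamFusion()
logits = model(torch.randn(2, 10, 512), torch.randn(2, 40), torch.randn(2, 8))
print(logits.shape)  # torch.Size([2, 5])
```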
The experiments are conducted on real-world everyday object manipulation scenarios performed by a Baxter robot equipped with an RGB-D head camera on top and a microphone placed on the torso. Various investigations, including comparative performance evaluations and parameter and multimodality analyses, are conducted to show the validity of the frameworks. The results indicate that the presented frameworks are able to identify anomalies with f-scores of 92% and 94%, respectively. As these results indicate, the CLUE-AI framework outperforms the symbol-based framework in classifying the anomaly types that occur. Regarding their requirements, the CLUE-AI framework does not need additional external modules such as a scene interpreter or a sound classifier, as the symbol-based framework does, and it provides better results than the symbol-based solution.
-
Software defect prediction with a personalization focus and challenges during deployment (Lisansüstü Eğitim Enstitüsü, 2021) Eken, Beyza ; Kühn Tosun, Ayşe ; 723330 ; Bilgisayar Mühendisliği

Organizations apply software quality assurance (SQA) techniques to deliver high-quality products to their customers. Developing defect-free software holds a critical role in SQA activities. The increasing usage of software systems, and their rapidly evolving nature in terms of size and complexity, raise the importance of effectiveness in defect detection activities. Software defect prediction (SDP) is a subfield of empirical software engineering that focuses on building automated and effective ways of detecting defects in software systems. Many SDP models have been proposed over the past two decades, and current state-of-the-art models mostly utilize artificial intelligence (AI) and machine learning (ML) techniques together with product, process, and people-related metrics collected from software repositories. So far, the people aspect of SDP has been studied less than the algorithm aspect (i.e., ensembling or tuning machine learners) and the data aspect (i.e., proposing new metrics). While the majority of people-focused studies incorporate developer- or team-related metrics into SDP models, personalized SDP models have recently been proposed. On the other hand, the majority of SDP research to date focuses on building SDP models that produce high prediction performance values; real case studies in industrial software projects, and studies that investigate the applicability of SDP models in practice, are relatively few. However, for an SDP solution to be successful and efficient, its applicability in real life is as important as its prediction accuracy. This thesis focuses on two main goals: 1) assessing the people factor in SDP to understand whether it helps to improve the prediction accuracy of SDP models, and 2) prototyping an SDP solution for an industrial setting and assessing its deployment performance.

First, we conduct an empirical analysis to understand the effect of community smell patterns on the prediction of bug-prone software classes. The term "community smell" was recently coined to describe collaboration and communication flaws in organizations. Our motivation in this part is based on studies that show the success of incorporating community factors, i.e., sociotechnical network metrics, into models that predict bug-prone software modules. Prior studies also show the statistical association of community smells with code smells (which are code antipatterns) and report the predictive success of code smell-related metrics in the SDP problem. We assess the contribution of community smells to the prediction of bug-prone classes against the contribution of other state-of-the-art metrics (e.g., static code metrics) and code smell metrics. Our analysis of ten open-source projects shows that community smells improve the prediction rates of baseline models by 3% in terms of area under the curve (AUC), while the code smell intensity metric improves the prediction rates by 17%.
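For readers unfamiliar with this kind of setup, the following hedged sklearn sketch shows how the contribution of an extra feature group (such as community-smell counts) could be measured as a change in AUC over a baseline metric set; the synthetic data, feature groups, and classifier choice are purely illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Illustrative stand-ins: 500 software classes described by static code metrics
# plus an extra community-smell feature group, with bug-proneness labels.
static_metrics = rng.normal(size=(500, 10))
smell_metrics = rng.normal(size=(500, 3))
y = rng.integers(0, 2, size=500)

def auc_of(features):
    """Train a classifier on the given feature set and return its test AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

baseline_auc = auc_of(static_metrics)
extended_auc = auc_of(np.hstack([static_metrics, smell_metrics]))
print(f"AUC change from the extra feature group: {extended_auc - baseline_auc:+.3f}")
```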
One reason for this gap is that the existing way of detecting community smell patterns may not capture the team's communication patterns richly, since it mines patterns only from the mailing archives of organizations. Another reason is that technical code flaws (captured by the code smell intensity metric) are more successful in representing defect-related information than community smells. Considering the difficulty of extracting community patterns and the higher success of the code smell intensity metric in SDP, we direct our research toward developers' code development skills and the personalized SDP approach.

Second, we investigate personalized SDP models. The rationale behind the personalized SDP approach is that different developers tend to have different development patterns and, consequently, their code may exhibit different defect patterns. In the personalized approach, each developer in the team has an SDP model that is trained solely on that developer's own development history, and its predictions target only that developer; in the traditional approach, a single SDP model is trained on the whole team's development history, and its predictions target anyone in the team (a minimal sketch of this per-developer setup is given below). Prior studies report promising results for personalized SDP models, but their experimental setups are very limited in terms of data, context, model validation, and further exploration of the characteristics that affect the success of personalized models. We conduct a comprehensive investigation of personalized change-level SDP on 222 developers from six open-source projects, utilizing two state-of-the-art ML algorithms and 13 process metrics collected from software code repositories that measure development activity from size, history, diffusion, and experience aspects. We evaluate model performance using rigorous validation setups, seven assessment criteria, and statistical tests. Our analysis shows that personalized models (PM) predict defects better than general models (GM), i.e., they increase recall by up to 24% for 83% of developers. However, PM also increases the false alarms of GM by up to 12% for 77% of developers. Moreover, PM is superior to GM for developers who contribute to software modules that many prior developers have contributed to, whereas GM is superior to PM for the more experienced developers. Further, the information gained from the various process metrics in predicting defects differs among individuals, but the size aspect is the most important one across the whole team.

In the third part of the thesis, we build prototype personalized and general SDP models for our partner from the telecommunication industry. Using the same empirical setup as in the investigation of personalized models on open-source projects, we observe that GM detects more defects than PM (i.e., 29% higher recall) in our industrial case. However, PM gives 40% fewer false alarms than GM, leading to a lower code inspection cost. Moreover, we observe that utilizing multiple data sources, such as semantic information extracted from commit descriptions and latent features of development activity, and applying log filtering on metric values improve the recall of PM by up to 25% and lower GM's false alarms by up to 32%. Considering the industrial team's perspective on prediction success criteria, we pick a model to deploy that produces balanced recall and false alarm rates: the GM that utilizes the process and latent metrics together with log filtering. We also observe that the semantic metrics extracted from commit descriptions do not seem to contribute to the prediction of defects as much as the process and latent metrics do.
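As referenced above, here is a hedged sketch of the difference between a general model and personalized models; the pandas/sklearn setup, the two toy metrics, and the data are illustrative assumptions, not the thesis pipeline or its 13 process metrics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Illustrative change-level data: each row is a commit with two process metrics,
# the developer who authored it, and whether it later induced a defect.
rng = np.random.default_rng(7)
changes = pd.DataFrame({
    "developer": rng.choice(["dev_a", "dev_b", "dev_c"], size=300),
    "lines_added": rng.integers(1, 500, size=300),
    "files_touched": rng.integers(1, 20, size=300),
    "defect_inducing": rng.integers(0, 2, size=300),
})
features = ["lines_added", "files_touched"]

# General model (GM): one model trained on the whole team's history.
gm = LogisticRegression().fit(changes[features], changes["defect_inducing"])

# Personalized models (PM): one model per developer, trained on that developer's changes only.
pm = {
    dev: LogisticRegression().fit(group[features], group["defect_inducing"])
    for dev, group in changes.groupby("developer")
}

new_change = pd.DataFrame({"lines_added": [120], "files_touched": [4]})
print("GM defect risk:", gm.predict_proba(new_change)[0, 1])
print("PM defect risk for dev_a:", pm["dev_a"].predict_proba(new_change)[0, 1])
```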
In the fourth and last part of the thesis, we deploy the chosen SDP prototype into our industrial partner's real development environment and share our insights on the deployment. Integrating SDP models into real development environments poses several challenges regarding performance validation, consistency, and data accuracy. Offline research setups may not be suitable for observing the performance of SDP models in real life, since the online (real-life) data flow of software systems differs from offline setups. For example, in real life, discovering bug-inducing commits requires some time due to the bug life cycle, and this causes label noise in the training sets of an online setup; an offline dataset does not have that problem, since it utilizes a pre-collected batch of data. Moreover, deployed SDP models need re-training (updating) with recent commits to keep their prediction performance consistent and to keep up with the non-stationary nature of software. We propose an online prediction setup to investigate the deployed prototype's real-life performance under two parameters: 1) a train-test (TT) gap, a time gap between the train and test commits used to avoid learning from noisy data, and 2) a model update period (UP), which determines how often recent data are included in model training (a hedged sketch of this scheduling idea is given at the end of this abstract). Our empirical analysis shows that the offline performance of the SDP prototype reflects its online performance after the first year of the project. The online prediction performance is also significantly affected by the chosen TT gap and UP values, by up to 37% and 18% in terms of recall, respectively. In deployment, we set the TT gap to eight months and the UP to three days, since these values are the most suitable ones according to the online evaluation results in terms of prediction capability and consistency over time.

The thesis concludes that the personalized SDP approach leads to promising results in predicting defects. However, whether PM should be chosen over GM depends on factors such as the ML algorithm used, the organization's prediction performance assessment criteria, and the developers' development characteristics. Future research in personalized SDP may focus on profiling developers in a transferable way instead of building a model for each software project; for example, collecting developer activity from public repositories to create a profile or using cross-project personalized models would be options. Moreover, our industrial experience provides good insights into the challenges of applying SDP in an industrial context, from data collection to model deployment. To obtain good and consistent prediction performance, practitioners should consider using online prediction setups and conducting a domain analysis of the team's practices, prediction success criteria, and project context (e.g., release cycle) before making deployment decisions. Interpretability and usability of models hold a crucial role in the future of SDP studies, and more researchers are becoming interested in such aspects of SDP models, e.g., developer perceptions of SDP tools and the actionability of prediction outputs.
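As referenced above, the train-test gap and update period can be pictured with the following hedged scheduling sketch; the data layout, default window lengths, and function name are illustrative assumptions, not the deployed setup.

```python
from datetime import timedelta

def online_prediction_rounds(commits, tt_gap_days=240, update_period_days=3):
    """Yield (train_set, test_set) pairs for an online SDP evaluation.

    commits : list of (timestamp, features, label) tuples sorted by timestamp.
    tt_gap_days : gap between training and test commits, so that recently
        authored commits whose bug-inducing status is still unknown (label
        noise) are excluded from training (~8 months here as a toy default).
    update_period_days : how often the model is re-trained with newer commits.
    """
    if not commits:
        return
    start, end = commits[0][0], commits[-1][0]
    current = start + timedelta(days=tt_gap_days)
    while current <= end:
        # Train only on commits old enough for their labels to be trustworthy.
        train = [c for c in commits if c[0] <= current - timedelta(days=tt_gap_days)]
        # Test on the commits arriving until the next scheduled model update.
        test = [c for c in commits if current < c[0] <= current + timedelta(days=update_period_days)]
        if train and test:
            yield train, test
        current += timedelta(days=update_period_days)
```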
-
End-to-end Turkish coreference resolution with deep learning approaches (Lisansüstü Eğitim Enstitüsü, 2025-02-03) Arslan Pamay, Tuğba ; Eryiğit, Gülşen ; 504182513 ; Bilgisayar Mühendisliği

Coreference Resolution (CR) is the task of resolving the referential relations between expressions in a document that represent the same real-world entity (e.g., a person, place, or event). As an important task in the semantic layer of Natural Language Processing (NLP), CR analyzes the context of a text in depth and helps the document to be understood correctly and the desired information to be extracted accurately. In this task, the words or phrases between which relations will be resolved are defined as mentions. An end-to-end CR system consists of two stages: 1) mention detection and 2) relation resolution. In the mention detection stage, all referential mentions in the document are identified; the relations between these mentions are then resolved, and mentions representing the same real-world entity are grouped into the same mention cluster (a toy sketch of such a linking pass is given at the end of this abstract).

Turkish is a morphologically rich, pro-drop language. These properties allow some pronouns not to appear explicitly in Turkish texts. Therefore, for a comprehensive CR system developed for Turkish, treating dropped pronouns as mentions and resolving their relations is extremely important for correctly understanding the semantic integrity of a Turkish text. Information about a dropped pronoun resides in the morphological layer of another word in the sentence. This requires analyzing words not only in their surface forms but also at the morpheme level; consequently, the Turkish CR problem becomes more complex than in other languages.

When the CR literature is examined, most studies have been carried out on English, and CR studies on languages that are linguistically similar to Turkish have only begun in recent years. The morpheme-level coreference resolution requirement arising from the linguistic structure of Turkish described above prevents systems developed for English from being applied directly to Turkish. The goal of this thesis is to build the first end-to-end Turkish CR model that takes the linguistic properties of Turkish into account and makes use of artificial neural network methods. To this end: 1) the structure of Turkish is examined with respect to dropped pronouns, a CR-specific annotation scheme is proposed for this information, and an up-to-date Turkish CR dataset in which dropped pronouns are annotated as referential mentions with this scheme is presented; 2) Turkish CR models based on deep learning and built with different CR approaches are developed, and their performances are compared; 3) the work required to include the proposed Turkish CR dataset in the relevant dataset collections, so that it can be used in multilingual CR studies, is completed; 4) multilingual CR models that also cover Turkish are developed, and their performances are compared; 5) finally, it is shown that instruction-tuned multilingual CR models based on decoder-only large language models achieve the best performance on Turkish CR.
In addition, the improvements made to the multilingual models also yield gains in CR performance for other languages that are linguistically similar to Turkish. In this thesis, the existing annotated Turkish CR dataset is improved, and the referential relations of dropped pronouns are annotated, producing the most up-to-date Turkish CR dataset in the literature. The effect of neural-network-based models built with different coreference resolution approaches (mention pair, mention ranking, end-to-end) on Turkish CR performance is examined. The effect of dataset quality and of dropped-pronoun annotations on the success of Turkish CR models is investigated. The use of graph neural network layers in Turkish CR models developed with deep learning methods, and its effect on performance, is also examined. Monolingual models trained on Turkish are extended into multilingual ones, and the effect of cross-lingual transfer on Turkish CR performance is evaluated; at this stage, how the CR performances of Turkish and the other languages are affected by the information the languages learn from one another is examined. Owing to the morphological richness of Turkish, the effect of using linguistic information as explicit features in CR models is investigated on a multilingual CR dataset covering Turkish and similar languages. Finally, the performance of a multilingual CR model built with a decoder architecture and an instruction-based training method is examined on Turkish and the other languages.

The results show that deep learning methods improve Turkish CR performance. Turkish CR models trained with high-quality data achieve better results. In addition, annotating dropped pronouns and training on these mentions positively affects overall CR performance. The hypothesis that graph neural networks would improve Turkish CR performance could not be confirmed. By developing multilingual models, the positive effects of cross-lingual transfer on Turkish CR performance are demonstrated. A positive effect of using explicitly specified morphological features is observed on the CR performance of Turkish and of languages with similar linguistic properties. Finally, the multilingual Turkish CR model developed with instruction-based training leverages the power of large language models and improves both Turkish and multilingual CR performance.
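As referenced above, here is a rough, hedged illustration of the second stage (linking detected mentions into clusters); the greedy scorer, the threshold, and the toy mentions are illustrative assumptions, not the thesis model.

```python
def resolve_coreference(mentions, score_pair):
    """Greedy antecedent linking: each mention is attached to its highest
    scoring preceding mention (or left as a new entity), and linked mentions
    are merged into clusters.

    mentions   : list of mention spans in document order.
    score_pair : callable(antecedent, mention) -> float, higher = more likely
                 coreferent (in an end-to-end neural model this would be a
                 learned scorer over span representations).
    """
    clusters = []        # list of lists of mention indices
    cluster_of = {}      # mention index -> cluster index
    for i, mention in enumerate(mentions):
        best_j, best_score = None, 0.0   # 0.0 acts as the "new entity" threshold
        for j in range(i):
            s = score_pair(mentions[j], mention)
            if s > best_score:
                best_j, best_score = j, s
        if best_j is None:
            cluster_of[i] = len(clusters)
            clusters.append([i])
        else:
            cluster_of[i] = cluster_of[best_j]
            clusters[cluster_of[best_j]].append(i)
    return clusters

# Toy usage with a trivially simple string-match scorer.
mentions = ["Ayşe", "o", "kitap", "Ayşe"]
print(resolve_coreference(mentions, lambda a, m: 1.0 if a == m else 0.0))
# -> [[0, 3], [1], [2]]
```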