LEE - Computer Engineering - Master's Degree

Recent Submissions

Now showing 1 - 5 of 69
  • Item
    Monodepth-based object detection and depth sensing for autonomous vehicle vision systems
    (Graduate School, 2025-01-22) Çetin, Emre ; Seçinti, Gökhan ; 504201518 ; Computer Engineering
    This study explores object recognition and distance measurement technologies for autonomous vehicles and addresses advancements in this area. Autonomous vehicles are capable of perceiving objects in their surroundings and moving safely without human intervention. Progress in these technologies holds the potential to revolutionize mobility, largely driven by the significant role of artificial intelligence. Artificial intelligence is considered a fundamental component for autonomous vehicles to perceive environmental conditions and ensure safe travel. Techniques such as deep learning and machine learning allow vehicles to process data collected through cameras, lidars, radars, and other sensors to interpret their surroundings and respond safely. Object detection and depth estimation are continually being improved with both traditional and AI-based methods. Deep learning models improve object detection and localization by processing complex image data, thereby enhancing the safety and performance of autonomous vehicles. Autonomous vehicles use various sensors, including lidars, radars, cameras, and ultrasonic sensors, to detect environmental objects and determine distances. Data collected from these sensors are processed to perceive surrounding objects and determine their distances from the vehicle. While camera sensors are widely used, distance estimation with single-lens 2D cameras is challenging. Advances in artificial intelligence have achieved significant success with 2D camera images. In particular, these techniques offer alternatives to challenges such as rising sensor costs and the need for 3D cameras to adapt to various environmental conditions, making progress in this area all the more important. In this study, different artificial neural network models were used for object detection and depth estimation.
Optimization efforts were conducted to adapt the trained models for real-world applications, and testing on the Nvidia Jetson device was considered a crucial step. The developed application achieved successful real-time processing performance, representing a significant advancement in autonomous vehicle technologies. The real-time processing rate of the developed application was determined to be 12 frames per second (FPS), and the mean absolute error obtained in object detection and distance estimation was measured as 1.51 meters. These values indicate that the application operates quickly and reliably, accurately predicting object locations and distances.
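As a rough illustration of how the two model outputs might be combined, the sketch below pairs a detector's 2D bounding boxes with a monocular depth map to read off a per-object distance. The abstract does not state the fusion rule used in the thesis, so the median-depth heuristic, the function names, and the toy values here are illustrative assumptions only.

```python
import numpy as np

def object_distances(depth_map, boxes):
    """Estimate per-object distance by taking the median predicted
    depth inside each detected bounding box (a common heuristic;
    the thesis's actual fusion rule may differ)."""
    distances = []
    for (x1, y1, x2, y2) in boxes:
        patch = depth_map[y1:y2, x1:x2]
        distances.append(float(np.median(patch)))
    return distances

# Toy example: a 4x4 depth map (meters) and one detection box
# covering the left half of the image.
depth = np.array([[10.0, 10.0, 30.0, 30.0],
                  [10.0, 12.0, 30.0, 30.0],
                  [10.0, 11.0, 30.0, 30.0],
                  [10.0, 10.0, 30.0, 30.0]])
print(object_distances(depth, [(0, 0, 2, 4)]))  # → [10.0]
```

In practice the depth map would come from the monodepth network and the boxes from the object detector; taking the median rather than the mean makes the estimate robust to background pixels inside the box.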
  • Item
    An empirical investigation on improving fairness testing for machine learning models
    (Graduate School, 2024-02-06) Karakaş, Umutcan ; 504211534 ; Computer Engineering
    The usage of machine learning has become more common in our lives, and its effects can be seen in various sectors such as healthcare, finance, entertainment, and commerce. Thus, ML models have started to take more crucial roles in influencing decisions and shaping experiences. However, the power of machine learning doesn't come without challenges, especially regarding fairness. Bias in machine learning systems can skew results, leading to potential inaccuracies or injustices. For instance, a recruitment system might, due to historical data biases, favor one demographic over another, inadvertently perpetuating gender or ethnic disparities. Similarly, a healthcare diagnostic tool might provide unreliable results for certain racial groups if the data it's trained on doesn't account for diversity. Such examples of unfair machine learning behavior show the crucial need for fairness in these systems. Previous approaches for improving the fairness of ML models have focused on detecting and correcting a wide range of data points scattered all over the feature space, which generally leads to unrealistic or extreme cases. This method has a flaw: focusing on those extreme data points can cause more common fairness issues to be missed, making the approach less effective. RSFair is a new approach that shifts the focus from unrealistic or extreme cases to more representative and realistic data instances. This technique aims to detect more common unfair behaviors, with the idea that understanding and removing bias in common scenarios will in turn solve the majority of fairness problems. The methodology of RSFair employs two primary techniques: Orthogonal Matching Pursuit (OMP) and K-Singular Value Decomposition (K-SVD). These methods are used to sample a representative set of data points out of a large dataset while keeping its essential characteristics.
OMP reconstructs the dataset by selecting, at each step, the dictionary atom most correlated with the target signal. This dictionary doesn't include every single element of the original dataset. Instead, it uses a strategic compilation of atoms that, when combined, represents the full scope of the original dataset. This can also be thought of as trying to recreate the original dataset with minimum error; this error is then reduced and optimized in the K-SVD process by updating the dictionary atoms. The process involves a careful and systematic approach, ensuring that the most representative data points are selected for the dictionary. K-SVD, on the other hand, continually refines the dictionary through an iterative process in which the dictionary is updated after each cycle. Each iteration aims to optimize the dictionary further, reinforcing its accuracy and reliability as a smaller mirror of the larger dataset. In the RSFair method, OMP and K-SVD are not standalone processes but are collaborative and complementary. The initial dictionary creation by OMP is crucial, as it establishes a solid foundation; still, it's the continuous optimization through K-SVD that keeps this foundation robust and reflective of the original dataset. In this study, we focused on two main research questions: RQ1. How effective is RSFair in finding discriminatory inputs? RQ2. How useful are the generated test inputs for improving the fairness of the model? Addressing the first question, we used OMP and K-SVD to create a representative sample for discriminatory point detection. This facilitated a comprehensive comparison of RSFair's performance relative to the AEQUITAS and random sampling methodologies. As for the second question, we utilized the discriminatory points uncovered during the search phase to improve the fairness of the initial model.
This procedure was replicated for AEQUITAS, random sampling, and RSFair, for the comparative analysis of the outcomes. The introduction of RSFair represents a meaningful advancement in efforts to enhance fairness in machine learning outcomes. By turning attention away from the extreme cases and considering common problems, it's possible to achieve a better understanding of how bias influences these systems.
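The greedy atom-selection step described above can be sketched as follows. This is a generic OMP over a toy dictionary, not RSFair's actual implementation: the data, dimensions, and stopping rule are illustrative assumptions, and the K-SVD dictionary-update loop that would refine the atoms between iterations is omitted.

```python
import numpy as np

def omp(D, y, k):
    """Greedy Orthogonal Matching Pursuit: repeatedly pick the
    dictionary atom most correlated with the current residual,
    then re-fit the signal on all chosen atoms via least squares."""
    residual = y.copy()
    chosen = []
    for _ in range(k):
        # atom with the largest absolute correlation to the residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in chosen:
            chosen.append(idx)
        coeffs, *_ = np.linalg.lstsq(D[:, chosen], y, rcond=None)
        residual = y - D[:, chosen] @ coeffs
    return chosen, coeffs

# Toy dictionary of 4 unit-norm atoms in R^3; the signal is a
# combination of atoms 0 and 2, which OMP should recover.
D = np.array([[1.0, 0.0, 0.0, 0.7],
              [0.0, 1.0, 0.0, 0.7],
              [0.0, 0.0, 1.0, 0.0]])
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 0] + 1.0 * D[:, 2]
atoms, _ = omp(D, y, k=2)
print(sorted(atoms))  # → [0, 2]
```

In the RSFair setting the "signals" would be dataset rows and the recovered atoms the representative sample; K-SVD would then alternate this sparse-coding step with an SVD-based update of each atom to shrink the reconstruction error.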
  • Item
    A memory and meta learning based solution in graph continual learning
    (Graduate School, 2024-06-12) Ünal, Altay ; Ünal, Gözde ; 504201566 ; Computer Engineering
    Deep learning models have proven to perform successfully at different tasks such as classification and regression. Continual learning (CL) aims for a model to learn various tasks sequentially. However, when models are expected to adapt to incoming tasks while maintaining their performance on previous tasks, they tend to forget the previous tasks. This phenomenon, called catastrophic forgetting, is the main challenge in the CL area: a model forgets the tasks it was previously trained on and adjusts its parameters to perform the task it is actively being trained on. Since it is inefficient to train multiple models to perform multiple tasks, CL aims to train a single model that can perform multiple tasks without losing information during the training process. In addition to catastrophic forgetting, CL also addresses capacity saturation, another challenge concerning the effects of model architecture on learning. CL is currently an emerging research field. However, CL studies mainly focus on image data, and there is much to discover in CL research on graph-structured data, i.e., graph continual learning (GCL). The proposed solutions for GCL are mainly adapted from general CL solutions; however, since graph-structured data has different properties than image data, these properties need to be considered when GCL is studied. In this thesis, we focus on continual learning on graphs. We devise a technique that combines two uniquely important concepts in machine learning, namely the "replay buffer" and "meta learning", aiming to exploit the best of both worlds to achieve continual learning on graph-structured data. In this method, the model weights are initially computed using the current task dataset.
Next, the dataset of the current task is merged with the stored samples from earlier tasks, and the model weights are updated using the combined dataset. This helps prevent the model weights from converging to the optimal parameters of the current task alone and enables the preservation of information from earlier tasks. We adapt our technique to graph-structured data and the task of node classification on graphs, and introduce our method, MetaCLGraph. Experimental results show that MetaCLGraph outperforms both baseline CL methods and existing GCL techniques. The experiments were conducted on various graph datasets, including Citeseer, Corafull, Arxiv, and Reddit.
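The two-phase update described above can be sketched on a toy linear model. MetaCLGraph's actual GNN-based procedure is not given in the abstract, so the least-squares objective, learning rate, and buffer policy below are placeholder assumptions that only illustrate the "adapt to the current task, then re-train on current data merged with replayed samples" structure.

```python
import numpy as np

def train_step(w, X, y, lr=0.1, steps=50):
    """Plain gradient descent on a least-squares objective."""
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def continual_update(w, task_X, task_y, buffer):
    """Replay-style two-phase update: (1) adapt to the current task
    alone, then (2) re-train on the current task merged with stored
    samples from earlier tasks, pulling the weights back toward
    solutions that also fit the past."""
    w = train_step(w, task_X, task_y)                      # phase 1: current task only
    if buffer:
        bX = np.vstack([x for x, _ in buffer] + [task_X])  # phase 2: merged dataset
        by = np.hstack([y for _, y in buffer] + [task_y])
        w = train_step(w, bX, by)
    buffer.append((task_X, task_y))                        # store samples for replay
    return w, buffer

# Two conflicting toy tasks: with replay, the weights settle near a
# compromise instead of overfitting the latest task.
w, buf = np.zeros(2), []
X = np.eye(2)
w, buf = continual_update(w, X, np.array([1.0, 0.0]), buf)
w, buf = continual_update(w, X, np.array([0.0, 1.0]), buf)
print(w)  # both components near 0.5 rather than [0, 1]
```

Without the second phase, the weights would converge to the latest task's optimum ([0, 1] here), which is exactly the catastrophic-forgetting behavior the method is designed to avoid.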
  • Item
    A video dataset of incidents & video-based incident classification
    (Graduate School, 2022) Sesver, Duygu ; Ekenel, Hazım Kemal ; 765019 ; Computer Engineering Programme
    Nowadays, the occurrence of natural disasters, such as fires, earthquakes, and floods, has increased in our world. Detecting incidents and natural disasters becomes more important when action is needed. Social media is one data source for observing natural disasters and incidents thoroughly and immediately. There are many studies on incident detection in the literature; however, most of them use still-image and text datasets. The number of video-based datasets in the literature is limited, and existing video-based studies cover a limited number of class labels. Motivated by the lack of publicly available video-based incident datasets, a diverse dataset with a high number of classes was collected, which we named the Video Dataset of Incidents (VIDI). The collected dataset has 43 classes, exactly the same as the ones in the previously published Incidents Dataset, and includes both positive and negative samples for each class, as in the Incidents Dataset. The dataset contains 8,881 videos in total: 4,534 positive samples and 4,347 negative samples. There are approximately 100 videos for each positive and negative class, and video duration is ten seconds on average. YouTube is the source of the videos, and the video clips were collected manually. Positive examples consist of natural disasters such as landslides, earthquakes, and floods; vehicle accidents, such as truck, motorcycle, and car accidents; and the consequences of these events, such as burned, damaged, and collapsed. Positive samples may carry multiple labels per video or image, meaning a video can belong to more than one class category. Negative samples, on the other hand, do not contain the disaster of that class; they can be instances that the model can easily confuse.
For instance, a negative example for the "car accident" class can be a normal driving video, or a video that contains a "flooded" incident but no "car accident". Diversity was a goal while collecting videos: videos from different locations were collected for each positive and negative sample, and videos were searched in various languages to capture the styles of different cultures and include region-specific events. Six languages were used: Turkish, English, Standard Arabic, French, Spanish, and Simplified Chinese. When these languages were not sufficient, videos were queried in other languages, too. After the dataset was collected, various experiments were performed on it, using recent video and image classification architectures on both the existing image-based and the newly created video-based incident datasets. The first part of the study uses only positive samples; negative samples are included in the second part. One of the motivations was to explore the benefit of using video data instead of images for incident classification. To investigate this, the Vision Transformer (ViT) and TimeSformer architectures were trained using only the positive samples of both datasets for incident classification. Top-1 and top-5 accuracies were used as evaluation metrics. ViT, which is designed for images, was applied to the Incidents Dataset; TimeSformer, which is designed for multi-frame data, was applied to the collected video-based dataset. Eight frames were sampled from each video in the collected dataset and used for TimeSformer's multi-frame training. Since the datasets and architectures are not the same in these experiments, this would not be a fair comparison between the image and video datasets.
So, the TimeSformer architecture was also applied to the Incidents Dataset, and ViT was also applied to VIDI, with the input data adapted to the requirements of each architecture. In the video classification architecture, each image from the Incidents Dataset was treated as a single-frame video; in the image classification architecture, the middle frame of the input video was used as an image. Finally, to show the impact of using multiple frames in incident classification, the TimeSformer architecture was also run with a single frame from the video dataset; the same downsampling method was applied and the middle frame was used for training. TimeSformer achieved 76.56% accuracy in the multi-frame experiment, versus 67.37% in the single-frame experiment on the collected dataset. This indicates that using video information, when available, improves incident classification performance. The experiments also evaluated the performance of the state-of-the-art ViT and TimeSformer architectures for incident classification against the approach used in the Incidents Dataset paper, which was based on the ResNet-18 architecture. ViT and TimeSformer achieved higher accuracies than ResNet-18 on the image-based Incidents Dataset: while the ResNet-18-based model achieved 77.3% accuracy, ViT and TimeSformer achieved 78.5% and 81.47% top-1 accuracy, respectively. Additionally, the performance of ViT and TimeSformer was compared on the single-frame versions of both datasets. TimeSformer achieved 67.37% and ViT 61.78% on the single-frame version of the video-based dataset, while TimeSformer reached 81.47% and ViT 78.5% on the image dataset. TimeSformer was found to be superior to ViT on both datasets.
However, the results on the collected dataset are lower than those obtained on the Incidents Dataset. There could be two main reasons for this: (1) the image-based dataset contains more training examples, so systems can learn better models; (2) the collected dataset contains more difficult examples for classification. The second part of the study includes negative samples. Using both positive and negative samples, binary classification models were trained for all classes; the main idea was to measure how well a model could detect whether or not a given disaster occurs in a given video. Therefore, 43 separate models were trained. The best accuracy was achieved for the "landslide" and "dirty contamined" classes, while the model got the lowest accuracy in detecting the "blocked" class. Finally, one more classification experiment was run on VIDI, using negative samples as a 44th class: 100 videos that do not include any incident were selected from the negative samples. With these classes, 72.18% accuracy was achieved. In summary, this study presents a highly diverse incident dataset with many classes, compares the performance of recent video and image classification architectures on video and image datasets, and reports binary classification experiments for each class.
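The frame-selection steps used in the experiments above (eight frames per clip for multi-frame training, the middle frame for single-frame runs) can be sketched as index selection. The abstract does not state how the eight frames were spaced, so the uniform, center-of-segment sampling below is an assumption.

```python
def sample_frame_indices(num_frames, num_samples=8):
    """Uniformly spaced frame indices for multi-frame training:
    split the clip into num_samples equal segments and take the
    center frame of each (spacing scheme assumed, not from the paper)."""
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]

def middle_frame_index(num_frames):
    """Single-frame downsampling: take the middle frame of the clip."""
    return num_frames // 2

# A ~10-second clip at 30 fps has about 300 frames.
print(sample_frame_indices(300))  # → [18, 56, 93, 131, 168, 206, 243, 281]
print(middle_frame_index(300))    # → 150
```

Treating a still image as a single-frame video (for TimeSformer on the Incidents Dataset) is then just `sample_frame_indices(1, 1)`, i.e. the one available frame repeated as the clip.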
  • Item
    Gamified crowdsourcing for building an idiom corpus
    (Graduate School, 2022) Şentaş, Ali ; Eryiğit, Gülşen ; 737883 ; Computer Engineering Programme
    Learning idioms is considered one of the most challenging parts of language learning. The primary reason is that an idiom's meaning usually cannot be derived from the meanings of its constituent words. A second problem is that the words forming an idiom typically appear adjacent to one another, but in some cases can be separated by intervening words. This problem also makes itself felt in natural language processing tasks such as machine translation and dependency parsing: idiom usage causes erroneous machine translation output, while systems that exploit idiom knowledge during translation produce much more successful results. The lack of high-quality usage examples is felt both by language learners and in the training of NLP machine learning systems, which further increases the difficulty for students and researchers alike. In this thesis, a multiplayer game was developed to collect idiom usage examples. The game was implemented as a messaging bot, with the aim of creating a resource for language learners and for researchers working on idiom identification systems. To collect data, an interaction system was designed in which native speakers of the target language can add usage examples and vote on the examples submitted by other players. While doing so, players competed with one another through various gamification incentives, building an idiom corpus as they played. The effects of the gamification incentives were observed, and the incentives were shaped according to player feedback. In contrast to the crowd-labeling studies common in the literature, and as a first in the field, crowd-creation and crowd-voting methods were used to build idiom corpora.
Unlike traditional data-labeling approaches, in which annotators label data manually and the crowd is sometimes used only for verification, here the crowd was used both to create the data and to control its quality, with the aim of accelerating corpus construction. Crowd behavior was studied under various gamification incentives, and the system was modified according to these observations. The game was developed language-independently and was kept open for thirty-two days for Turkish and Italian, and for 21 days for English. Following the end of the game, the resulting corpora were examined by linguists, who judged them suitable for use in language learning and in dictionaries. The linguists' votes were found to align with the crowd's, leading to the conclusion that the crowd can be used to identify high-quality and poor examples. In addition, various idiom-identification machine learning models were trained and tested on the collected corpus, and their performance was measured. The results showed that the developed system can serve as an effective tool for corpus collection. Although it is a crowdsourced corpus-collection system, the game was found fun and useful by its players, and it was shown that the system could accelerate idiom-corpus construction in many languages and has potential as a source of idiom usage examples for language learners, machine learning systems, and dictionaries.
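The crowd-voting quality control described above can be sketched as a simple vote-aggregation rule compared against expert judgments. The thesis's actual scoring and agreement measures are not given in the abstract, so the Laplace-smoothed approval rate, the threshold, and all vote counts and linguist labels below are hypothetical.

```python
def vote_score(upvotes, downvotes, prior=1):
    """Laplace-smoothed approval rate for a submitted usage example
    (one common way to rank crowd-voted items; assumed, not the
    thesis's actual rule)."""
    return (upvotes + prior) / (upvotes + downvotes + 2 * prior)

def crowd_accepts(upvotes, downvotes, threshold=0.5):
    """Accept an example when its smoothed approval rate clears the bar."""
    return vote_score(upvotes, downvotes) > threshold

# Agreement between crowd decisions and hypothetical linguist labels
# for four submitted examples (vote counts are made up).
crowd = [crowd_accepts(u, d) for u, d in [(9, 1), (2, 8), (5, 5), (7, 0)]]
linguist = [True, False, False, True]
agreement = sum(c == l for c, l in zip(crowd, linguist)) / len(crowd)
print(agreement)  # → 1.0 on this toy data
```

High agreement of this kind is what would justify the thesis's conclusion that crowd votes can stand in for expert review when filtering good and bad examples.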