LEE - Computer Engineering - Master's
-
Item: Tagging and normalization of Turkish temporal expressions (Graduate School, 2021-07-29) Uzun, Ayşenur ; Tantuğ, Ahmet Cüneyd ; 504161504 ; Computer Engineering
Research on extracting information from unstructured text holds an important place in natural language processing. Alongside structural tasks such as stemming, part-of-speech tagging, and dependency parsing, information extraction has gained prominence in recent years. Normalizing the semantic information detected in text into a structured form is essential for that information to be used effectively in other natural language processing work. Temporal expression tagging and normalization is a key component of information extraction systems. Expressions that carry information such as the time, duration, frequency, or interval of events mentioned in text (e.g., "today", "two months later", "on July 19", "every week") are called temporal expressions. Detecting temporal expressions and normalizing them to a given standard is a widespread research area in languages such as English, Spanish, German, Chinese, and Arabic. For these languages, the literature offers many temporal expression tagging and normalization systems, and datasets with manually or automatically annotated temporal expressions have been published; semantic evaluation workshops have been organized to evaluate these systems on such datasets. To the best of our knowledge, no structured dataset with annotated temporal expressions has been published for Turkish to date, nor did our literature review uncover an end-to-end system that performs Turkish temporal expression detection and normalization.
In this thesis, a rule-based temporal expression tagging and normalization system was developed: the first end-to-end system for Turkish, incorporating Turkish morphology, and foundational work in Turkish temporal expression extraction and normalization. To support development and testing, the temporal expressions in 109 news articles were annotated manually. This dataset, created as part of the thesis, has been made publicly available for future research. The developed system was run on the published test set, and its performance was measured using the precision and recall metrics standard in temporal expression tagging. Temporal expressions in text were detected with an F1 score of 89%, while the normalization of the "type" and "value" attributes of correctly detected expressions achieved F1 scores of 89% and 88%, respectively. Future work on Turkish temporal expression tagging and normalization can reach higher performance by taking into account the shortcomings and recommendations discussed in the error analysis and system limitations sections.
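The rule-based pipeline described above can be illustrated with a minimal sketch. The patterns, the tiny lexicon, and the reference date below are assumptions for illustration only and are far simpler than the thesis system:

```python
import re
from datetime import date, timedelta

# Illustrative rule list: each pattern maps a Turkish temporal expression
# to a TIMEX3-style (type, value) pair, resolved against a reference date.
REFERENCE = date(2021, 7, 29)  # assumed document date
MONTHS = {"ocak": 1, "şubat": 2, "mart": 3, "nisan": 4, "mayıs": 5, "haziran": 6,
          "temmuz": 7, "ağustos": 8, "eylül": 9, "ekim": 10, "kasım": 11, "aralık": 12}

def tag_temporal(text):
    """Return (expression, type, normalized value) triples found in text."""
    results = []
    # "bugün" (today) resolves directly to the reference date
    for m in re.finditer(r"\bbugün\b", text, re.IGNORECASE):
        results.append((m.group(), "DATE", REFERENCE.isoformat()))
    # "19 Temmuz'da" -> explicit day + month, normalized to an ISO date
    for m in re.finditer(r"\b(\d{1,2})\s+(\w+)'d[ae]\b", text):
        month = MONTHS.get(m.group(2).lower())
        if month:
            results.append((m.group(), "DATE",
                            date(REFERENCE.year, month, int(m.group(1))).isoformat()))
    # "iki hafta sonra" style offsets; only a tiny number lexicon here
    numbers = {"bir": 1, "iki": 2, "üç": 3}
    for m in re.finditer(r"\b(bir|iki|üç)\s+hafta\s+sonra\b", text):
        d = REFERENCE + timedelta(weeks=numbers[m.group(1)])
        results.append((m.group(), "DATE", d.isoformat()))
    return results
```

A real system additionally handles morphological variants of the suffixes, many more expression classes (durations, sets, fuzzy references), and ambiguity resolution.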
-
Item: Order dispatching via deep reinforcement learning (Graduate School, 2022) Kavuk, Eray Mert ; Kühn Tosun, Ayşe ; 712817 ; Department of Computer Engineering
In this thesis, the unique order dispatching problem of Getir, a retail and logistics company, is studied. Getir serves many cities in multiple countries, and its service area is expanding day by day. Serving thousands of customers every day across many different fields, Getir is the market pioneer in this area. The thesis focuses on ultra-fast delivery, the company's first and best-known line of business, which Getir invented and was the first in the world to apply. The aim of Getir's ultra-fast delivery business model is to deliver orders to customers within minutes; orders are fulfilled from the company's warehouses. Completing order delivery in such a short time is a very challenging goal, made harder by traffic congestion and by high order volumes at certain times of the day or on certain days of the week. In addition, owing to the Covid-19 pandemic and changing customer habits, people increasingly prefer home delivery. For this reason, serious changes can be observed in the expected number of orders on a daily and weekly basis. Previously unannounced curfews and other restrictions change both the expected number of orders and their content, so these changes cannot be predicted with data analysis and estimation methods. For these reasons, an order dispatching algorithm that can adapt to changing conditions is vital. In the ultra-fast delivery model, the goal is to serve as many customers as possible within the predetermined and promised time. Orders can be placed at any time during the working hours of the warehouses in the customer's service zone.
Whether to accept or reject an incoming order is decided according to the order density of the relevant warehouses in the region and the courier shift plans. For this decision-making step, we recommend a deep reinforcement learning algorithm in place of the constraint-respecting rule-based structure. We argue for an algorithm that can keep up with the growth rate of Getir, a fast-growing company, and adapt to the differing characteristics of its regions. Before presenting the deep reinforcement learning methods applicable to this problem, we describe Getir's problem and one of the methods the company currently uses, and we discuss that method's problems, limitations, and shortcomings. We compare the proposed method with the current one, highlight their differences, and measure the success of each approach on actual order data. In the ultra-fast delivery business model, the aim is to deliver the order to the user within 10-15 minutes.
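The accept/reject decision can be caricatured as a tiny tabular reinforcement learning problem. The states (a discretized warehouse load), rewards, and dynamics below are invented for illustration; the thesis itself uses deep RL on Getir's real order data:

```python
import random

# Toy sketch: learn an accept/reject policy from reward feedback.
# State: discretized warehouse load (0 = idle .. 2 = saturated).
ACTIONS = ("accept", "reject")

def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(3)
        a = (rng.choice(ACTIONS) if rng.random() < epsilon
             else max(ACTIONS, key=lambda x: q[(s, x)]))
        # Invented reward model: accepting pays off unless the warehouse is
        # saturated (late deliveries); rejecting forgoes revenue.
        r = (1.0 if s < 2 else -2.0) if a == "accept" else 0.0
        s2 = min(2, s + 1) if a == "accept" else max(0, s - 1)
        # One-step Q-learning update
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, x)] for x in ACTIONS) - q[(s, a)])
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(3)}
```

Under this reward model, the learned policy accepts orders while the warehouse has capacity and rejects them when it is saturated, which is the qualitative behaviour a dispatching agent should acquire.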
-
Item: TFEEC: Turkish financial event extraction corpus (Graduate School, 2022) Kaynak, Kadir Şinas ; Tantuğ, A. Cüneyd ; 740021 ; Computer Engineering
Interest in digitalization grows steadily as the world develops, and recent digital transformation efforts in various fields underline the importance of this trend. Thanks to digitalization, enormous volumes of text are produced and shared. Although information can be extracted from these digital sources by hand for small amounts of data, doing so for very large amounts is labor-intensive and time-consuming. Information extraction techniques were developed to overcome these problems and to automate the manual work. Information extraction aims to derive structured information from text automatically, usually with natural language processing techniques, making the process more efficient by reducing human effort and cost. Work in this area has gained popularity thanks to the contributions made so far and the benefits expected in the future. One type of information extraction is event extraction: a challenging task that involves finding the events in a text, determining their types, and identifying the corresponding arguments. The structured information obtained this way can feed into and support other natural language processing tasks such as knowledge base construction, question answering, and language understanding. However, only a limited number of studies exist in this area, and they usually focus on a specific domain. One of the domains on which event extraction research worldwide concentrates is finance and economics, where event extraction matters greatly to both companies and investors.
Companies use the signals from these events to obtain rapid feedback about their products, perform risk analysis, and conduct new market research. Investors, in turn, benefit from the extracted information to follow what is happening in the market, catch trends early, direct their investments appropriately, and make effective decisions. However, the sharp increase in financial news in recent years has made it impossible to follow and process these texts manually.
-
Item: Long-horizon value gradient methods on Stiefel manifold (Graduate School, 2022) Ok, Tolga ; Üre, Nazım Kemal ; 772336 ; Computer Engineering Programme
Sequential decision-making algorithms play an essential role in building autonomous and intelligent systems. In this direction, one of the most prominent research fields is Reinforcement Learning (RL). The long-term dependencies between the actions performed by a learning agent and the rewards returned by the environment pose a challenge for RL algorithms. One way of overcoming this challenge is the introduction of value function approximation, which allows policy optimization in RL algorithms to rely on immediate state-value estimates and simplifies policy learning. In practice, however, we use value approximations that combine future rewards and values from truncated future trajectories under a decaying weighting scheme, as in TD($\lambda$), to strike a better trade-off between the bias and variance of the value estimator. Policy Gradients (PG), a prominent approach in the model-free paradigm, rely on the correlation between past actions and future rewards within truncated trajectories to form a gradient estimator of the objective function with respect to the policy parameters. However, as the length of the truncated trajectories increases or the $\lambda$ parameter approaches 1, akin to the use of Monte Carlo (MC) sampling, the variance of the gradient estimator of the PG objective increases drastically. Although the gradient estimator in PG methods has zero bias, the increase in variance leads to sample inefficiency, since the approximated value of an action may contain future rewards within the truncated trajectory that are unrelated to that action.
One of the alternatives to PG algorithms that does not introduce high variance during policy optimization is the Value Gradient (VG) algorithm, which utilizes the functional relation between past actions and future state values on a trajectory. This estimation requires a differentiable model function; hence, VG algorithms are known as model-based Reinforcement Learning (MBRL) approaches. Although it is possible to apply the VG objective on simulated sub-trajectories, as most MBRL approaches do, the most effective approach is to apply the VG objective on observed trajectories by means of reparameterization. If observed trajectories are used, VG algorithms avoid the compounding errors that occur when the model function is called iteratively on its own previous predictions to simulate future trajectories.
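The bias-variance trade-off of TD($\lambda$) discussed above can be made concrete with a short sketch of the standard $\lambda$-return recursion, $G_t^\lambda = r_t + \gamma[(1-\lambda)V(s_{t+1}) + \lambda G_{t+1}^\lambda]$; the rewards and values below are placeholders:

```python
def lambda_return(rewards, values, gamma=0.99, lam=0.95):
    """Compute the lambda-return for the first state of a truncated trajectory.

    values[i] approximates V(s_i); values has len(rewards) + 1 entries.
    lam -> 1 recovers the Monte Carlo return (low bias, high variance);
    lam -> 0 recovers the one-step TD target (low variance, more bias).
    """
    g = values[-1]  # bootstrap from the value beyond the truncation point
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
    return g
```

For example, with `lam=0.0` the result collapses to `rewards[0] + gamma * values[1]`, while with `lam=1.0` it sums all truncated rewards plus the bootstrapped final value.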
-
Item: Application of autoencoder and graph convolutional network to stock price prediction (Graduate School, 2022) Özbilen, Mahmut Lutfullah ; Yaslan, Yusuf ; 739931 ; Computer Engineering
The use of machine learning and deep learning models in finance has increased in recent years. With their nonlinear activation functions, machine learning and deep learning methods can learn the nonlinear relationships in financial data. One of the most popular problems in finance is predicting stock prices. Classical machine learning methods such as Artificial Neural Networks (ANN) and Random Forests were used frequently, especially in the 2000s. As deep learning succeeded in other domains, many deep learning methods, chiefly the recurrent neural network model Long Short-Term Memory (LSTM), were proposed for this problem. With stock price series converted into images, Convolutional Neural Network (CNN)-based deep learning methods such as autoencoders have also appeared in the literature. Graph-based methods such as Graph Convolutional Networks (GCN), which exploit the relationships among stocks, have likewise been used in many recent studies. This thesis presents methods in which features produced by an autoencoder and a GCN are used together with an LSTM for future stock price prediction. Inspired by investors' use of candlestick charts in decision making, features were extracted from candlestick chart images with an autoencoder. For this purpose, candlestick chart images were generated from stock price data. The autoencoder takes these images as input, produces a feature vector with its encoder network, and reconstructs the candlestick chart from that vector with its decoder network.
The fact that the autoencoder can reconstruct the input candlestick charts shows that the feature vector produced by the encoder captures meaningful information from the chart. In this thesis, that feature vector is used together with an LSTM in two different price prediction methods. In the first method, named Autoencoder+LSTM 1, the encoder's feature vector is appended to the last 20 days' price series and fed to the LSTM, which predicts the price from this time series. In the second method, Autoencoder+LSTM 2, the LSTM takes only the 20-day price data as input; the feature vector produced by the LSTM is concatenated with the encoder's feature vector and passed to an ANN to predict the next day's price. Based on the observation that a company's future price is linked to those of related companies, features were also extracted with a GCN. A graph was constructed whose nodes are stocks and whose edges represent relationships between companies. Technical indicators derived from each company's historical prices were used as node features, and edges between nodes were established from the correlation of historical stock prices. Graph convolutional networks then produced features for every company in the market; these features also carry information from the related stocks. Two price prediction methods using the features obtained from the market graph with the GCN are proposed. In GCN+LSTM 1, as in Autoencoder+LSTM 1, the feature vector produced by the GCN is appended to the 20-day price series and fed to the LSTM. In GCN+LSTM 2, the produced feature vector is concatenated with the feature the LSTM derives from past prices alone, given to the ANN as input, and the stock price is predicted.
In the final method proposed in the thesis, the features produced by the autoencoder, the GCN, and the LSTM are combined. This enriched feature vector is fed to the ANN for price prediction; the method is named Autoencoder+GCN+LSTM. The performance of the proposed methods was tested on data from the Standard & Poor's 500 (S&P 500) index. An LSTM model trained only on 20-day price data was used as the baseline for comparison.
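The correlation-based edge construction described above can be sketched as follows. The toy price series and the 0.8 threshold are illustrative choices, not the thesis configuration:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length price series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def build_edges(prices, threshold=0.8):
    """prices: dict ticker -> list of closing prices; returns the edge set
    of the stock graph (one edge per strongly correlated ticker pair)."""
    tickers = sorted(prices)
    return {(a, b) for i, a in enumerate(tickers) for b in tickers[i + 1:]
            if abs(pearson(prices[a], prices[b])) >= threshold}
```

The resulting graph (with technical indicators as node features) is what a GCN would consume to produce per-stock features.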
-
Item: UGQE: uncertainty guided query expansion in image retrieval (Graduate School, 2022) Öncel, Fırat ; Önal, Gözde ; 767455 ; Computer Engineering Programme
Image retrieval is one of the important subproblems in the computer vision domain. A typical image retrieval pipeline consists of a feature extractor and a search operation over an image database with a given similarity measure. With the dominance of deep learning, hand-crafted feature extraction techniques have been replaced by Convolutional Neural Network (CNN) based feature extractors, and images are represented by the extracted features. When a query is made against an image database, some of the retrieved images may be irrelevant to the query image; these should be eliminated to improve retrieval performance. Query expansion is one way to do this: it can be seen as making a second search after the images retrieved by the first search are aggregated with the query image, for example by taking an average or a weighted average. However, classical query expansion techniques have drawbacks, such as failing to distinguish relevant from irrelevant neighbors, or monotonic weight assignments. Existing query expansion approaches have not considered the reliability of neighbors when selecting and executing the expansion operation. Reliability per se is not straightforward to measure; however, it can be estimated as inversely proportional to the amount of uncertainty inherent in the neighbor selection. With the advent of neural-network-based function approximators, uncertainty quantification can be integrated into standard neural networks, adding the ability to say "I do not know" or "I am not certain" about an outcome.
In this thesis we integrate a pair-wise uncertainty quantification into the query expansion process in order to generate new features via a novel Uncertainty Guided Transformer Encoders (UGTE) method. The newly generated features are concatenated with the original features to enrich the overall representations, which are then fed into Learnable Attention Based Transformer Encoders (LABTE) to assign weights to the neighbors. Our method thus consists of UGTE and LABTE: first we generate new features with UGTE, then assign new weights to the neighbors with LABTE. Experimental results on standard image retrieval benchmarks show that the proposed method improves performance relative to a baseline consisting of the LABTE framework alone. We utilize a CNN feature extractor trained on the Google Landmarks dataset. The transformer encoders are trained on rSfM120k, while the method is tested on rOxford5k, rParis6k, and 1M distractors.
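Classical average query expansion, the baseline idea that this thesis refines, can be sketched as follows (toy vectors rather than CNN descriptors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def search(query, database, k=2):
    """Return the names of the k most similar database vectors."""
    return sorted(database, key=lambda name: cosine(query, database[name]),
                  reverse=True)[:k]

def expand_query(query, database, k=2):
    """Average the query with its top-k neighbors; the expanded vector is
    then used for a second search."""
    neighbors = [database[name] for name in search(query, database, k)]
    vecs = [query] + neighbors
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

The drawback the thesis targets is visible here: every neighbor contributes with equal (or monotonically decaying) weight, regardless of how reliable it is.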
-
Item: Abstract meaning representation of Turkish (Graduate School, 2022) Oral, Kadriye Elif ; Eryiğit, Gülşen ; 774853 ; Computer Engineering Programme
In this thesis, we focus on Abstract Meaning Representation (AMR) for Turkish. AMR is a sentence-level representation that summarises all semantic aspects of a sentence. Its goal is to create representations that abstract away from syntactic features, grouping sentences with the same meaning under a single semantic representation regardless of their syntax. It is also easily readable by humans, which is very convenient for researchers who want to work in this area. AMR was designed for English, but it can be adapted to other languages by taking language-specific issues into account. To accomplish this, it is necessary to create an AMR guideline that defines language-specific annotation rules. In this thesis, we present Turkish AMR representations by creating an AMR annotation guideline for Turkish. Turkish is a morphologically rich, pro-drop, agglutinative language, which causes its representations to deviate from English AMR. In creating the Turkish guideline, we meticulously examine Turkish phenomena and propose AMR representations for these points of deviation. Besides, we present the first AMR corpus for Turkish, containing 700 AMR-annotated sentences. The creation of such resources is not an easy task: it requires linguistic training, a large amount of time, and a systematic annotation strategy. We adapt the model-annotate-model-annotate strategy to our annotation task; that is, instead of dealing with all phenomena at once, we follow a stepwise path. First, we follow a data-driven approach and handle the Turkish-specific structures present in the data. In the second iteration, we use knowledge sources such as Turkish dictionaries and grammar books to cover all linguistic phenomena.
This strategy allows us to build the corpus at the same time. Instead of annotating sentences from scratch, we use a semi-automatic approach in which a parser first processes the sentences and outputs AMR graphs, which are then corrected or re-annotated by two native-speaker annotators. We implement a rule-based parser inspired by methods used in the literature. It is very similar to transition-based parsers, but its actions are driven by a rule list rather than an oracle; we designed it this way because our goal was an unsupervised parser that utilizes the available resources. We evaluate our proposed solutions and the rule-based parser using the semantic match score (Smatch), which reflects both the quality of our corpus and the accuracy of our parser. The inter-annotator agreement between our annotators is a 0.89 Smatch score, and the rule-based parser achieves a Smatch score of 0.60, a strong baseline for the Turkish AMR parsing task. The final part of the thesis deals with the development of a data-driven AMR parser, formalized as two steps comprising a pipeline of multiple classifiers, each with different functionality. The first step of the data-driven parser is to identify the concepts to be used in the AMR graphs; nine separate classifiers are trained for this task.
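The idea behind the Smatch evaluation used above can be illustrated with a simplified sketch: AMR graphs are compared as sets of triples and scored with precision, recall, and F1. Real Smatch additionally searches over variable alignments between the two graphs, which is omitted here, and the example triples are invented:

```python
def triple_f1(gold, predicted):
    """F1 over matching graph triples under a fixed variable alignment."""
    gold, predicted = set(gold), set(predicted)
    overlap = len(gold & predicted)
    if overlap == 0:
        return 0.0
    p = overlap / len(predicted)  # precision
    r = overlap / len(gold)       # recall
    return 2 * p * r / (p + r)
```

For instance, a parse that recovers two of three gold triples and adds one spurious triple scores 2/3.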
-
Item: Gamified crowdsourcing for building an idiom corpus (Graduate School, 2022) Şentaş, Ali ; Eryiğit, Gülşen ; 737883 ; Computer Engineering
Learning idioms is regarded as one of the most challenging parts of language learning. A primary reason is that the meaning of an idiom usually cannot be derived from the meanings of its constituent words. A second problem is that the words forming an idiom usually appear side by side, but in some cases other words can intervene and separate them. This issue also makes itself felt in natural language processing tasks such as machine translation and dependency parsing: idiom usage leads to erroneous machine translation output, while systems that exploit idiom knowledge during translation produce considerably better results. The lack of high-quality usage examples is felt both by language learners and in the training of natural language processing machine learning systems, compounding the difficulty for students and researchers alike. In this thesis, a multiplayer game was developed to collect examples of idiom usage. The game was implemented as a messaging bot, with the aim of building a resource for language learners and for researchers working on idiom recognition systems. To collect data, an interaction system was designed in which native speakers of the target language can submit usage examples and vote on one another's submissions. While doing so, players compete with each other through various gamification incentives, building an idiom corpus as they play. The effects of the gamification incentives were observed, and the incentives were shaped according to player feedback.
In contrast to the crowd-labeling studies common in the literature, and as a first in the field, crowd-creation and crowd-voting methods were used to build idiom corpora. Unlike traditional annotation, in which annotators label data manually and the crowd is sometimes used only for validation, here the crowd was used both to create the data and to control its quality, with the aim of speeding up corpus construction. Crowd behavior was studied under various gamification incentives, and the system was revised according to these observations. The game was developed language-independently and kept open for thirty-two days for Turkish and Italian and for 21 days for English. Following the end of the game, the resulting corpora were examined by linguists, who judged them suitable for use in language learning and in dictionaries. The linguists' judgments were found to align with the crowd's votes, leading to the conclusion that the crowd can be used to distinguish high-quality from poor examples. In addition, several idiom recognition machine learning models were trained and tested on the collected corpus, and their performance was measured. The results showed that the developed system can serve as an effective corpus collection tool. Although it is a crowdsourced corpus collection system, the game was found fun and useful by its players, demonstrating that the system can accelerate idiom corpus construction in many languages and has potential as a source of idiom usage examples for language learners, machine learning systems, and dictionaries.
-
Item: A video dataset of incidents & video-based incident classification (Graduate School, 2022) Sesver, Duygu ; Ekenel, Hazım Kemal ; 765019 ; Computer Engineering Programme
Nowadays, natural disasters such as fires, earthquakes, and floods occur more frequently around the world, and detecting incidents and natural disasters has become more important where action is needed. Social media is one data source for observing natural disasters and incidents thoroughly and immediately. There are many studies on incident detection in the literature; however, most use still-image and text datasets. Video-based datasets are scarce, and existing video-based studies cover a limited number of class labels. Motivated by the lack of publicly available video-based incident datasets, a diverse dataset with a high number of classes was collected, named the Video Dataset of Incidents (VIDI). It has 43 classes, exactly the same as those in the previously published Incidents Dataset, and, like that dataset, it includes both positive and negative samples for each class. The dataset contains 8,881 videos in total: 4,534 positive samples and 4,347 negative samples, with approximately 100 videos per positive and negative class. Video duration is ten seconds on average. YouTube was used as the source, and the video clips were collected manually. The positive examples consist of natural disasters such as landslides, earthquakes, and floods; vehicle accidents such as truck, motorcycle, and car accidents; and the consequences of these events, such as burned, damaged, and collapsed. Positive samples may carry multiple labels per video or image, i.e., a video can belong to more than one class. Negative samples, on the other hand, do not contain the disaster of that class.
Negative samples can be instances that the model can easily confuse. For instance, a negative example for the "car accident" class can be a normal car driving, or a video of a "flooded" incident: it contains a "flooded" incident but not a "car accident". While collecting videos, the aim was to ensure diversity in the dataset. Videos from different locations were collected for each positive and negative class, and videos were searched in various languages to capture the styles of different cultures and include region-specific events. Six languages were used: Turkish, English, Standard Arabic, French, Spanish, and Simplified Chinese; when these were not sufficient, videos were queried in further languages as well. After the dataset was collected, various experiments were performed on it, applying the latest video and image classification architectures to both the existing image-based and the newly created video-based incident datasets. The first part of the study uses only positive samples; negative samples are included in the second part. One motivation was to explore the benefit of using video data instead of images for incident classification. To investigate this, the Vision Transformer (ViT) and TimeSformer architectures were trained for incident classification using only the positive samples of both datasets, with top-1 and top-5 accuracy as evaluation metrics. ViT, which is designed for images, was run on the Incidents Dataset; TimeSformer, which is designed for multi-frame data, was run on the collected video-based dataset. Eight frames were sampled from each video in the collected dataset and used for multi-frame TimeSformer training.
Since the datasets and architectures differ in these experiments, a direct comparison between the image and video datasets would not be fair. Therefore, TimeSformer was also run on the Incidents Dataset and ViT on VIDI, with the input data adapted to each architecture's requirements: in the video classification architecture, each image from the Incidents Dataset was treated as a single-frame video, and in the image classification architecture, the middle frame of the input video was used as an image. Finally, to show the impact of using multiple frames in incident classification, TimeSformer was also run with a single frame per video, applying the same downsampling and using the middle frame for training. TimeSformer achieved 76.56% accuracy in the multi-frame experiment versus 67.37% in the single-frame experiment on the collected dataset, indicating that using video information, when available, improves incident classification performance. The experiments also evaluate the state-of-the-art ViT and TimeSformer architectures for incident classification against the approach of the Incidents Dataset paper, which used a ResNet-18 architecture. ViT and TimeSformer achieved higher accuracies than ResNet-18 on the image-based Incidents Dataset: while the ResNet-18-based model achieved 77.3% accuracy, ViT and TimeSformer achieved 78.5% and 81.47% top-1 accuracy, respectively. Additionally, ViT and TimeSformer were compared on the single-frame versions of both datasets: TimeSformer achieved 67.37% and ViT 61.78% on the single-frame version of the video-based dataset, while TimeSformer reached 81.47% and ViT 78.5% on the image dataset.
TimeSformer thus outperformed ViT on both datasets. However, the results on the collected dataset are lower than those on the Incidents Dataset, for two likely reasons: (1) the image-based dataset contains more training examples, so the systems can learn better models, and (2) the collected dataset contains examples that are harder to classify. The second part of the study includes the negative samples. Using both positive and negative samples, binary classification models were trained for all classes to measure how well a model can detect whether a given disaster occurs in a given video; 43 separate models were trained. The best accuracy was achieved for the "landslide" and "dirty contamined" classes, and the lowest for detecting "blocked" incidents. Finally, one more classification experiment was run on VIDI, using the negative samples as a 44th class: 100 videos containing no incident were selected from the negative samples, and with these 44 classes 72.18% accuracy was achieved. In summary, this study presents a highly diverse disaster dataset with many classes, compares recent video and image classification architectures on video and image datasets, and reports binary classification experiments for each class.
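The frame-selection step used in the experiments can be sketched as simple index arithmetic; the exact sampling scheme of the thesis is not specified in the abstract, so the uniform scheme here is an assumption:

```python
def uniform_frame_indices(total_frames, n=8):
    """Pick n frame indices spread evenly over a clip (multi-frame input)."""
    return [int((i + 0.5) * total_frames / n) for i in range(n)]

def middle_frame_index(total_frames):
    """Index of the middle frame (single-frame input)."""
    return total_frames // 2
```

For a ten-second clip at a given frame rate, `uniform_frame_indices` yields the eight frames fed to a multi-frame model, while `middle_frame_index` picks the single frame used in the image-style experiments.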
-
ÖgeA condition coverage-based black hole inspired meta-heuristic for test data generation(Graduate School, 2022) Ulutaş, Derya Yeliz ; Kühn Tosun, Ayşe ; 504181580 ; Computer Engineering ProgrammeAs software becomes more complex, the importance of software testing increases by the day. It is very important to get as close to bug-free software as possible, especially for safety-critical systems. Standards such as DO-178 have been established to ensure that the safety requirements of safety-critical software are met. In these standards, the code coverage ratio is one of the parameters used to measure test quality. To increase the coverage rate, test data must be generated in a systematic manner. Combinatorial Testing (CT) is one of the most commonly used methods to deal with this problem. However, for software that takes a large number of input parameters, CT causes a test case explosion, and it is not possible to test all produced test cases. T-way testing is a technique for selecting a subset of this huge number of test cases. However, because this technique does not choose test cases based on code coverage, the software is not tested at the required level of coverage. The motivation of this study is to produce test cases by considering the condition coverage value. Thus, the condition coverage value and test quality are increased without the need to test all possible combinations of test cases. This thesis focuses on the research question (RQ): How can we design a meta-heuristic method in Search-Based Combinatorial Testing (SBCT) that generates test data achieving high coverage rates while avoiding local minima? To answer this question, we reviewed the literature and found that Search-Based Software Testing (SBST) and Search-Based Combinatorial Testing (SBCT) techniques are used to generate optimal test data using meta-heuristic approaches.
Among the studies in the literature, we chose those that work on problems similar to ours as well as those that differ in terms of fitness function and data set, aiming to examine studies with as wide a variety of features as possible. As a result of our literature review, we found that meta-heuristics in SBCT can be used to generate test data for enhanced condition coverage in software testing. During the review, we realized that the Black Hole Algorithm (BHA), which is also a meta-heuristic approach, can be used for our problem as well. Hence, after analysing alternative solutions, we decided to build on BHA for the following three main reasons drawn from the study that inspired ours: (1) that study focused on a problem very similar to ours, (2) although BHA originated as a data clustering method, it is a novel method in test data generation, and (3) BHA was reported to be more efficient than another optimization algorithm (i.e., Particle Swarm Optimization (PSO)), which inspired us to propose a stronger method. To achieve our goal, we present a novel approach based on a binary variation of the Black Hole Algorithm (BBH) in SBCT and adapt it to the difficulties of CT. We reused some of the techniques in BBH, modified others, and introduced new methods as well. The proposed BBH variant, BH-AllStar, aims at the following: (1) obtaining higher condition coverage, (2) avoiding local minima, and (3) handling discrete input values. The two main differences between BH-AllStar and BBH are the new elimination mechanism and the avoidance of local minima by reassessing previously removed stars and selecting the useful ones to add to the final population.
Our contributions are as follows: (1) we perform more detailed tests by using a condition coverage rate criterion for each condition branch while generating test cases, (2) we develop a condition coverage-based test case selection mechanism that reduces the risk of wrongly eliminating beneficial test cases, (3) we avoid local minima by providing variety, achieved by re-evaluating previously destroyed test cases and giving them a chance to be added to the final test case pool, (4) unlike existing studies in the literature, we give higher priority to coverage rate than to the number of test cases, thus providing more efficient tests with higher condition coverage rates, and (5) we validate our BH-AllStar method by applying it to three different Software Under Test (SUT): one real-life safety-critical software system and two toy examples. Our new meta-heuristic method can be applied to different software settings, and all types of SUTs can be used in the experimental setup. We analyzed our approach in terms of condition coverage, number of test cases, and execution time. As a result, we observed that our BH-AllStar method provided up to 43% more coverage than BBH. Although BH-AllStar produced more test cases than BBH, this level of increase is acceptable for achieving higher coverage. Finally, we answered the RQ by determining that the Black Hole phenomenon, used as a meta-heuristic in SBCT and extended with our modifications and novel features, is a suitable strategy for producing test data that reaches higher condition coverage rates while avoiding local minima. As future work, BH-AllStar can be tested on different SUTs, the randomization operation in the initialization process can be optimized, and MC/DC tests can be studied.
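A single iteration of a binary Black Hole search can be sketched as below. This is an illustrative simplification under our own assumptions (bit-flip movement with probability 0.5, a score-ratio event-horizon radius, an arbitrary fitness such as condition coverage), not the BH-AllStar implementation:

```python
import random

def bbh_iteration(stars, fitness, rng=None):
    """One illustrative iteration: the fittest star becomes the black hole,
    other stars move toward it bit by bit, and stars that cross the
    event horizon are replaced by fresh random stars."""
    rng = rng or random.Random(0)
    scores = [fitness(s) for s in stars]
    best = scores.index(max(scores))
    bh = stars[best]
    radius = max(scores) / (sum(scores) or 1.0)    # event-horizon radius
    out = []
    for i, (s, f) in enumerate(zip(stars, scores)):
        if i == best:
            out.append(list(bh))                   # the black hole is kept
            continue
        # move toward the black hole: adopt each differing bit with prob. 0.5
        moved = [b if rng.random() < 0.5 else a for a, b in zip(s, bh)]
        if max(scores) - fitness(moved) < radius:  # crossed the horizon
            moved = [rng.randint(0, 1) for _ in s]
        out.append(moved)
    return out
```

Because the best star is carried over unchanged, the best fitness found so far never decreases across iterations.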
-
Öge3D face animation generation from audio using convolutional neural networks(Graduate School, 2022) Ünlü, Türker ; Sarıel, Sanem ; 504171557 ; Computer Engineering ProgrammeThe problem of generating facial animations is an important phase of creating an artificial character in video games, animated movies, or virtual reality applications. This is mostly done manually by 3D artists, who match face model movements to each utterance of the character. Recent advancements in deep learning have made automated facial animation possible, and this research field has gained attention. There are two main variants of the automated facial animation problem: generating animation in 2D or in 3D space. Systems addressing the former work on images, either generating them from scratch or modifying an existing image to make it compatible with the given audio input. Systems of the second type work on 3D face models, which can be represented directly by a set of points in 3D space or by parameterized versions of these points. In this study, 3D facial animation is targeted. One of the main goals is to develop a method that can generate 3D facial animation from speech alone, without requiring manual intervention from a 3D artist. In the developed method, a 3D face model is represented by Facial Action Coding System (FACS) parameters, called action units. Action units are movements of one or more muscles of the face. Using a single action unit or a combination of action units, most facial expressions can be represented. For this study, a dataset of 37 minutes of recordings is created, consisting of speech recordings and the corresponding FACS parameters for each timestep. An artificial neural network (ANN) architecture that includes convolutional layers and transformer layers is used to predict FACS parameters from the input speech signal.
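The data flow from audio to per-timestep action units can be sketched as follows. This is a toy NumPy illustration of the framing and output shapes only, not the thesis network: the convolutional and transformer layers are abstracted into a single linear map, and the window/hop sizes and action-unit count are our own assumptions:

```python
import numpy as np

def frame_audio(signal, win=400, hop=160):
    """Slice a 1-D speech signal into overlapping windows, one per
    animation timestep (window/hop sizes are illustrative)."""
    n = 1 + max(0, (len(signal) - win) // hop)
    return np.stack([signal[i * hop : i * hop + win] for i in range(n)])

def predict_action_units(windows, weights):
    """Map two toy per-window features to an action-unit vector per
    timestep; `weights` has one column per FACS action unit."""
    feats = np.stack([windows.mean(axis=1), windows.std(axis=1)], axis=1)
    return feats @ weights          # shape: (timesteps, num_action_units)
```

The real model replaces the toy feature map with learned convolutional and transformer layers, but the input/output contract (audio in, one AU vector per timestep out) is the same.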
The outputs of the proposed solution are evaluated in a user study by showing the results for different recordings. The system is able to generate animations usable in video games and virtual reality applications, even for novel speakers it was not trained on. Furthermore, generating facial animations is very fast once the system is trained. An important drawback, however, is that the generated facial animations may lack accuracy in the mouth and lip movements required for the input speech.
-
ÖgeMeasuring and predicting software requirements volatility for large-scale safety-critical avionics projects(Graduate School, 2022-02-01) Holat, Anıl ; Tosun Kühn, Ayşe ; 504171560 ; Computer EngineeringDuring the software development life cycle, software requirements are subjected to many changes despite recent developments in software engineering. These modifications, additions, and removals are referred to as requirements volatility. Constantly changing requirements affect the cost of the project, the project schedule, and the quality of the product. In the worst case, projects fail or are only partially completed due to requirements volatility. Various requirements volatility measures have been used in previous volatility prediction studies and industrial measurement practices. In this thesis, a very large safety-critical avionics software project from ASELSAN, with thousands of software requirements, is employed to forecast the number of changes for each software requirement as a measure of requirements volatility. To explain requirements volatility, we use a comprehensive collection of metrics: requirement textual metrics, project-specific characteristics, and interdependencies between software requirements. The requirement textual metrics are chosen from two requirements quality analyzer tools used in the literature. The project-specific metrics are created by examining safety-critical avionics project features one by one and including those that provide information on requirements volatility. Traceability links between system and software requirements are used to create a network graph, and network centrality metrics are derived for software requirements from this graph. Requirements volatility prediction is performed with several machine learning techniques utilized by the base studies: k-nearest neighbor regression, linear regression, random forest regression, and support vector regression.
Combining input metric groups with machine learning algorithms, 28 predictive models are created in this study. This research evaluates the performance of the proposed models in predicting software requirement change proneness, the best-performing input metric combinations, the best-performing machine learning techniques, and the success of the proposed models in labeling highly volatile software requirements. The model that combines requirement textual metrics, avionics project features, and network centrality metrics with a k-nearest neighbor learner produces the best prediction results (MMRE=0.366). Furthermore, the best predictive model correctly labels 63.2 percent of the highly volatile software requirements that account for 80 percent of all software requirement changes. The findings of our research are encouraging for the development of automated requirement change analyzer tools that minimize requirements volatility concerns in early development phases.
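The best-performing learner and the reported error metric can be illustrated with a minimal NumPy sketch (our own simplified versions, not the thesis pipeline):

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    """Plain k-nearest-neighbour regression with Euclidean distance:
    predict the mean change count of the k closest requirements."""
    d = np.linalg.norm(X_train - x, axis=1)
    return float(y_train[np.argsort(d)[:k]].mean())

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error, the evaluation metric above
    (MMRE = mean of |actual - predicted| / actual)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p) / a))
```

In the thesis setting, each row of `X_train` would concatenate the textual, project-specific, and network centrality metrics for one requirement.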
-
ÖgeAn online network intrusion detection system for DDoS attacks with IoT botnet(Graduate School, 2022-05-23) Aydın, Erim ; Bahtiyar, Şerif ; 504181513 ; Computer EngineeringThe necessity for reliable and rapid intrusion detection systems to identify distributed denial-of-service (DDoS) attacks carried out with IoT botnets has become more evident as the IoT environment expands. Many network intrusion detection systems (NIDS) built on deep learning algorithms that provide accurate detection have been designed to address this demand. However, since most of the developed NIDSs depend on network traffic flow features rather than incoming packet features, they may be incapable of providing an online solution. On the other hand, online and real-time systems either do not utilize the temporal characteristics of network traffic at all, or employ recurrent deep learning models (RNN, LSTM, etc.) to remember time-based characteristics of the traffic in the short term. This thesis presents a network intrusion detection system built on the CNN algorithm that can work online and makes use of both the spatial and temporal characteristics of the network data. Two memories are added to the system: with the first, the system can keep track of the characteristics of previous traffic data over a longer period; with the second, which stores previously classified traffic flow information, the system can avoid examining every packet with the time-consuming deep learning model, reducing intrusion detection time. The suggested system is shown to be capable of detecting malicious traffic coming from IoT botnets in a timely and accurate manner.
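The second memory, which reuses verdicts for already-classified flows, can be sketched as a cache keyed by the flow 5-tuple. The field names and cache policy here are our own illustrative assumptions, not the thesis implementation:

```python
def classify_packet(pkt, flow_cache, deep_model):
    """Return a cached verdict for a known flow; run the expensive
    deep model only for flows not seen before."""
    key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
    if key not in flow_cache:
        flow_cache[key] = deep_model(pkt)   # time-consuming CNN inference
    return flow_cache[key]
```

With this design, only the first packet of each flow pays the deep-learning inference cost; later packets of the same flow are classified in constant time.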
-
ÖgeGAN-based intrinsic exploration for sample efficient reinforcement learning(Graduate School, 2022-05-23) Kamar, Doğay ; Ünal, Gözde ; 504181511 ; Computer EngineeringReinforcement learning is a sub-area of artificial intelligence in which the learner learns in a trial-and-error manner. The learner does so by executing an action depending on the current state it is in and observing the result. After executing an action, a reward signal is given to the learner, and through the rewards, the learner can learn which actions are best in different situations. However, the learner is not given any prior information about the environment it is in or about which action is best in the current state. Therefore, exploring the environment is important for gathering the information necessary to navigate toward high rewards. The most common exploration strategies involve occasional random action selection. However, they only work under certain conditions, such as the rewards being dense and well-defined. These conditions are hard to meet in many real-world problems, and an efficient exploration strategy is needed for such problems. Utilizing Generative Adversarial Networks (GAN), this thesis proposes a novel module for sample-efficient exploration, called the GAN-based Intrinsic Reward Module (GIRM). The GIRM computes an intrinsic reward for states, with the aim of assigning higher rewards to novel, unexplored states. The GIRM uses a GAN to learn the distribution of the states the learner observes and contains an encoder, which maps a query state to the input space of the GAN's generator. Using the encoder and the generator, the GIRM can detect whether a query state belongs to the distribution of observed states. If it does, the state is regarded as a visited state; otherwise, it is a novel state to the learner, in which case the intrinsic reward is higher.
As the learner receives higher rewards for such states, it is incentivized to explore the unknown, leading to sample-efficient exploration. The GIRM is evaluated in two settings: a sparse-reward environment and a no-reward environment. It is shown that the GIRM is indeed capable of exploration in both settings, compared to the base algorithms, which rely on random exploration methods. Compared to other studies in the field, the GIRM also explores more efficiently in terms of the number of samples. Finally, we identify a few weaknesses of the GIRM: performance is negatively affected when the distribution of observed states changes suddenly, and the exploitation of very large rewards is not avoided.
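The novelty signal described above can be sketched as a reconstruction distance: a query state is encoded, regenerated, and states the generator cannot reproduce receive a larger bonus. This is a toy NumPy version under our own simplifications (mean-squared reconstruction error as the distance), not the GIRM implementation:

```python
import numpy as np

def intrinsic_reward(state, encoder, generator, scale=1.0):
    """Higher reward for states outside the learned distribution:
    such states reconstruct poorly through encoder -> generator."""
    reconstruction = generator(encoder(state))
    return scale * float(np.mean((state - reconstruction) ** 2))
```

A well-visited state maps close to itself and earns a near-zero bonus, while a novel state yields a large reconstruction error and hence a large exploration bonus.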
-
ÖgeUnveiling the wireless network limitations in federated learning(Graduate School, 2022-05-27) Eriş, Mümtaz Cem ; Oktuğ, Sema Fatma ; Kantarcı, Burak ; 504191531 ; Computer EngineeringThe huge worldwide increase in edge devices with powerful processors has inspired many researchers to apply decentralized machine learning techniques, so that these edge devices can contribute to training deep neural networks. Among these decentralized machine learning schemes, federated learning has gained tremendous popularity, as it grants privacy to the edge devices and diminishes communication costs. This is because federated learning does not need to access or store raw data; instead, clients learn from their raw data locally and produce gradient updates, which are aggregated at the server. The raw data remains untouched at the clients, and only the trained gradient updates are shared with the parameter server. As a result, privacy and security issues are largely reduced, and transmitting ML models instead of raw data saves communication overhead. Federated learning has thus emerged from distributed and decentralized learning, yet it revolutionizes training by aggregating the ML models trained locally by edge devices. A typical federated learning scheme, as investigated in this thesis, includes many clients that compute the gradient of the loss function by applying stochastic gradient descent, and an aggregator (server) that collects these gradients in each communication round. In each round, only a randomly selected subset of clients participates in federated learning with their computed gradients. The gradient is estimated on a local batch, a fraction of each client's local raw data. The gradients collected by the server are averaged, and the averaged gradient is disseminated back to the clients.
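One communication round of the scheme described above can be sketched in a few lines (a NumPy toy with flat weight vectors and a pluggable local-gradient function, not the thesis simulation):

```python
import random
import numpy as np

def fl_round(global_w, local_gradient, clients, frac=0.1, lr=0.1, rng=None):
    """Sample a random subset of clients, collect their local gradients,
    average them at the server, and apply one descent step."""
    rng = rng or random.Random(0)
    k = max(1, int(frac * len(clients)))
    chosen = rng.sample(clients, k)
    grads = [local_gradient(c, global_w) for c in chosen]
    return global_w - lr * np.mean(grads, axis=0)
```

In the thesis setup, `local_gradient` would be a stochastic gradient computed on the client's local batch, and the updated `global_w` would be disseminated back to the clients for the next round.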
Convergence is expected after many communication rounds, as many clients are anticipated to contribute and thereby train the model at the server. Yet, the issues related to network limitations in the federated learning process are not covered in the literature. In typical federated learning applications and simulations, the network is assumed to be stable, and the limitations that come with an unstable network are overlooked. These simulations are mostly written in Python, and the essential network settings are implicitly asserted. Quality of Service (QoS) parameters such as packet drop ratio and delay are not considered; however, they are key factors for federated learning convergence, since they can slow down or even prevent the convergence process. In fact, real-time federated learning applications have been proposed in the literature, such as cache-based popular content prediction, and these are sensitive to packet drops and delays caused by the network. Therefore, delay and packet drops in the network must be thoroughly examined in order to make such federated learning applications feasible. To this end, an advanced federated learning simulation is introduced and its results are shared in this study. The simulation includes not only the clients and server that produce gradient updates, but also a full network backbone, which allows the QoS parameters in the federated learning process to be observed. To achieve this, a network consisting of the federated learning clients and server is simulated using the reputable NS3 (Network Simulator 3). The network is designed as a dumbbell topology with 100 clients on the left-hand side and the server on the right-hand side. This makes the left router the bottleneck, so background traffic in the network causes packet drops there.
An additional node generating background traffic is placed on the same side as the clients so that packet drops are observed; the intensity of the packet drops can be adjusted by a hyperparameter, the interarrival time of the packets generated as background traffic. Poisson-distributed background traffic is produced by controlling the interarrival time between packets at the traffic generator node. Using the ns3-ai framework, which enables NS3 and Python processes to communicate, the network and the federated learning process are run simultaneously so that QoS observations can be made. Since millions of devices are expected to be involved in a federated learning application, in which the speed of convergence is essential and not every client update necessarily improves convergence, UDP (User Datagram Protocol) is utilized as the transport layer protocol. The gradient updates are fragmented into UDP packets and sent from clients to the server and back. Thus, whenever a UDP packet carrying a client update is dropped, the whole client update must be discarded. As a result, discarded client updates reduce the performance of federated learning and cause significant drawbacks for the application. Initially, the experiment is validated by running numerous simulations with different seed values. Validation is carried out by testing the reproducibility of the same experiments, comparing cross-entropy error, the accuracy of both server and clients, and packet drop rates. Many simulation scenarios are designed for interarrival time values ranging from 250 to 900 milliseconds. The replication method is used to evaluate the results: each scenario is run 10 times with different seeds, and the results are presented with a 95% confidence interval.
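The background-traffic generator boils down to sampling exponential gaps with the configured mean, which yields Poisson packet arrivals; a minimal sketch (illustrative, not the NS3 module):

```python
import random

def poisson_interarrivals(mean_ms, count, rng=None):
    """Exponentially distributed interarrival times (in ms) whose mean is
    the intensity hyperparameter, e.g. 250 ms for heavy traffic."""
    rng = rng or random.Random(0)
    return [rng.expovariate(1.0 / mean_ms) for _ in range(count)]
```

Smaller means produce denser background traffic, which overflows the bottleneck queue and drops more client-update packets.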
Among these scenarios, three are selected and tagged as heavy, medium, and light traffic intensity, corresponding to 250, 400, and 900 milliseconds interarrival time, respectively. The results are presented as maximum error rates, average success rates, and per-round test accuracies. The most erroneous batch detected in the aggregated gradient at the server is reported as the maximum error percentage after each communication round. It shows the worst-performing model and demonstrates the unfavorable consequences of background traffic on performance. With heavy traffic, the maximum error percentage goes up to 80% after round 90, whereas with light traffic it stays between 10% and 20%. This shows the federated learning application's early vulnerability to background traffic. Under the assumption of a completely stable network, the average success percentage of client update delivery would be 100%. However, this is not realistic: the average success percentage decreases and fluctuates according to the traffic intensity. As the traffic gets more intense, fewer client updates are received by the parameter server for a successful aggregation. Finally, the test accuracy under various traffic intensity configurations is presented. Packet drops caused by overflow of the bottleneck queue capacity lead to a tremendous decrease in test accuracy, which is crucial for any federated learning application. For at least 200 communication rounds, the decline in accuracy is clearly visible when the traffic is intense. More specifically, 90% accuracy is reached after more than 120 rounds for high-intensity traffic, while it is reached at around 60 rounds for light traffic. The intensity of the background traffic thus becomes a highly crucial consideration for potential time-critical federated learning applications. Confidence intervals on test accuracy are presented according to the traffic intensity.
Convergence is achieved regardless of the traffic intensity. Wide intervals can be seen in earlier rounds, and they get slightly wider when the intensity is higher. In addition, depending on the traffic intensity or interarrival time, the amount of traffic data, the number of packets produced by the background traffic generator node, the data delivery rate, and the monitored interarrival time are presented as well. In light of these results, an adaptive federated learning scheme is proposed to cope with heavy traffic. Using network metrics such as upload rate, transmission delay, and queueing delay, the maximum number of clients that can fit in a communication round is calculated and set as the participation rate. This allows the server to receive more client updates, increasing the performance of federated learning under heavy background traffic.
-
ÖgeFight recognition from still images in the wild(Graduate School, 2022-06-22) Aktı, Şeymanur ; Ekenel, Hazım Kemal ; 504191539 ; Computer EngineeringViolence in general is a sensitive subject and can have a negative impact on both the people involved and witnesses. Fighting is one of the most common types of violence and can be defined as an act in which individuals intend to harm each other physically. In daily life, such situations might not be encountered often; however, violent content on social media is also a big concern for users. Since violent acts, and fights in particular, are considered an anomaly or intriguing by some, people tend to record these scenes and upload them to their social media accounts. Similarly, news agencies in some cases regard them as newsworthy material. As a result, fighting scenes frequently become available on social media platforms. Some users may be sensitive to this kind of media content, and children, who can be harmed by the aggressive nature of fight scenes, also use social media. These facts make it necessary to detect and limit the distribution of violent content on social media. There are systems focusing on violence and fight recognition in visual data. However, these works mostly propose methods for other violence domains, such as movies or surveillance cameras, and the social media case remains unexplored. Furthermore, even if most of the fight scenes shared on social media are video sequences, there is also a non-negligible amount of image data depicting violent fighting. However, no prior work tackles fight recognition from still images instead of videos. Thus, in this thesis, the problem of fight recognition from still images is investigated. In this scope, a novel dataset named Social Media Fight Images (SMFI) was first collected from social media images.
The dataset was collected from Twitter and Google Images, and some frames were included from the NTU CCTV-Fights video dataset. The fight samples were chosen from samples recorded in uncontrolled environments. In order to crawl a large amount of data, different keywords were used in various languages. The non-fight samples were also chosen from the data crawled from social media in order to keep the domain consistent across the classes. The dataset is made publicly available by sharing the links to the images. For the classification of the Social Media Fight Images dataset, several image classification methods were applied. First, Convolutional Neural Networks (CNN) were employed for the task and their performance was assessed. Then, a recent approach, the Vision Transformer (ViT), was exploited for the classification of fight and non-fight images. The comparison showed that the Vision Transformer gives better results on the dataset, achieving higher accuracy with less overfitting. A further experiment investigated the effect of varying dataset sizes on the performance of the model. This was seen as necessary because data shared on social media may be deleted in the future, and it is not always possible to retrieve the whole dataset. The model was therefore trained on different partitions of the dataset, and the results showed that, even though using more data is better, the model could still give satisfying performance even in the absence of 60% of the dataset. Following the successful results on fight recognition from still images, another experimental study was conducted on the classification of video-based datasets using a single frame from each sample. The experiment included four video-based fight datasets, and the results showed that three of them could be successfully classified without using any temporal information.
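The varying-dataset-size experiment amounts to training on fixed random fractions of the collected images; a small sketch with a hypothetical helper name (the exact partition scheme in the thesis may differ):

```python
import random

def dataset_fractions(samples, fractions=(1.0, 0.8, 0.6, 0.4), rng=None):
    """Return nested random subsets of the training samples to probe how
    accuracy degrades as shared images disappear from social media."""
    rng = rng or random.Random(0)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return {f: shuffled[: int(f * len(shuffled))] for f in fractions}
```

Because every subset is a prefix of the same shuffled list, smaller fractions are strictly contained in larger ones, which keeps the comparison across sizes consistent.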
This indicates that there might be a dataset bias in these three datasets, where the inter-class visual difference is high. Cross-dataset experiments also supported this hypothesis: the models trained on these video datasets perform poorly on the other fight recognition datasets. Nonetheless, the network trained on the proposed SMFI dataset gave promising accuracy on the other datasets as well, showing that it generalizes the fight recognition problem better than the others.
-
ÖgeEffect of semi-supervised self-data annotation on video object detection performance(Graduate School, 2022-06-22) Akman, Vefak Murat ; Töreyin, Behçet Uğur ; 704191017 ; Computer SciencesAccess to annotated data is more crucial than ever now that deep learning frameworks have replaced traditional machine learning methodologies. Even if the method is robust, training performance can be inadequate if the data quality is poor. Some methods have been developed to address data-related issues; these methods, however, have a negative impact on algorithm complexity and processing cost. Errors related to human factors, such as misclassification or inaccurate labeling, should also be considered. The data annotation process involves multiple steps that cost time and money: data gathering, annotation, and formatting according to the deep learning model architecture. Unfortunately, these steps are still not fully standardized, and the whole process comes with many difficulties. In this study, the effect of semi-supervised data annotation on video object detection is analysed using the Soft Teacher algorithm. Soft Teacher is a Swin-Transformer-backboned semi-supervised learning method with a major advantage in overcoming limited data. The Swin Transformer is a type of vision transformer: it creates hierarchical feature maps by merging image patches in deeper layers and has computation complexity linear in the input image size. As such, it can be used as a general-purpose backbone for tasks like classification and object detection. In Soft Teacher, there are two models: the Student model and the Teacher model. The Teacher model performs pseudo-labeling on weakly augmented unlabeled images, and the Student model is trained on both labelled and strongly augmented unlabeled images while updating the Teacher model. The Soft Teacher model was trained with the open-source COCO dataset, which consists of 80 labels.
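The Teacher-Student interplay can be sketched in two small functions. This is our own simplification of the mechanism, with flat weight lists and a hypothetical confidence threshold, not the Soft Teacher implementation:

```python
def ema_update(teacher_w, student_w, momentum=0.999):
    """The Teacher's weights track the Student as an exponential
    moving average while the Student trains."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_w, student_w)]

def confident_pseudo_labels(scores, threshold=0.9):
    """Keep only the Teacher's confident detections on weakly
    augmented unlabeled images as pseudo-labels for the Student."""
    return [i for i, s in enumerate(scores) if s >= threshold]
```

The Student then trains on the labelled images plus the strongly augmented unlabeled images with these pseudo-labels, and the cycle repeats.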
The dataset contains 118287 training, 123403 unlabeled, and 5000 validation images, and was annotated by humans. The Soft Teacher was trained with 1, 5, 10, and 100 percent of the labelled data, respectively. Then, using those trained Soft Teacher models, new annotations were created from the same raw data, and several state-of-the-art object detection algorithms were trained with the newly annotated data. To compare results, these object detection models were also trained with the manually annotated data. The model trained with human-annotated data was shown to be less successful than the other in terms of mAP. However, the model trained with self-annotated data produced more false positives, because the trained model can mislabel samples when generating new data. In conclusion, the results suggest that semi-supervised data annotation degrades detection performance in exchange for huge savings in training time.
-
ÖgePresentation attack detection with shuffled patch-wise binary supervision(Graduate School, 2022-06-22) Kantarcı, Alperen ; Ekenel, Hazım Kemal ; 504191504 ; Computer EngineeringFace recognition systems have been one of the most commonly used biometrics in various applications, such as mobile payments, smartphone security, and access to high-security areas. However, face presentation attacks, created by people who obtain biometric data covertly from a person or through hacked systems, are among the major threats to face recognition systems. Presentation attacks are easy to mount, especially against face recognition, as malicious individuals only need a high-quality face image of any enrolled user to bypass the biometric system. In order to detect these attacks, Convolutional Neural Network (CNN) based systems have gained significant popularity recently. Convolutional Neural Networks provide end-to-end systems for presentation attack detection and offer fast inference, which is helpful for biometric systems. However, CNN-based systems need a substantial amount of data to train. Presentation attack data is hard to acquire, as for each attack a human must physically attack the sensor. Unlike face recognition datasets, which utilize millions of face images crawled from the internet, presentation attacks have to be captured explicitly for the dataset. Therefore, publicly available datasets are significantly smaller. As neural networks require a massive amount of data to generalize, CNN-based presentation attack detection systems perform very well in intra-dataset experiments, yet they fail to generalize to datasets they have not been trained on. This indicates that they tend to memorize dataset-specific spoof traces. To mitigate this crucial problem, we propose a new presentation attack detection training approach that combines pixel-wise binary supervision with patch-based Convolutional Neural Networks.
We call our method Deep Patch-wise Supervision Presentation Attack Detection (DPS-PAD). Our method can be seen as an augmentation method, as it only changes how inputs are created. The proposed method combines patches from different attack and bona fide images of the dataset to create new training data. Our experiments show that the proposed patch-based method forces the model not to memorize background information or dataset-specific traces. We extensively tested the proposed method on widely used presentation attack detection datasets (Replay-Mobile and OULU-NPU) and on a dataset collected for real-world presentation attack detection use cases. The proposed approach is found to be superior in challenging experimental setups; namely, it achieves higher performance on OULU-NPU Protocols 3 and 4 and in inter-dataset real-world experiments.
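The patch-mixing idea can be sketched as follows; the patch size, the 50/50 mixing rule, and the function name are illustrative assumptions rather than the exact DPS-PAD procedure:

```python
import numpy as np

def shuffled_patch_input(bona_fide, attack, patch=4, seed=None):
    """Build one training image from patches of a bona fide and an
    attack image, with a patch-wise binary label map (1 = attack).

    This mirrors the idea described above: mixing patches across
    classes so the network cannot rely on full-face or background
    cues and must judge each patch on its own.
    """
    rng = np.random.default_rng(seed)
    h, w = bona_fide.shape[:2]
    out = bona_fide.copy()
    label = np.zeros((h // patch, w // patch), dtype=np.uint8)
    for i in range(h // patch):
        for j in range(w // patch):
            if rng.random() < 0.5:  # take this patch from the attack image
                out[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = \
                    attack[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
                label[i, j] = 1
    return out, label

# Toy grayscale images: an all-zero "live" face and an all-255 "spoof".
live = np.zeros((8, 8), dtype=np.uint8)
spoof = np.full((8, 8), 255, dtype=np.uint8)
mixed, labels = shuffled_patch_input(live, spoof, patch=4, seed=0)
print(mixed.shape, labels.shape)  # (8, 8) (2, 2)
```

The per-patch label map then provides the pixel-wise binary supervision target for the patch-based CNN.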
-
ÖgeA variational graph autoencoder for manipulation action recognition and prediction(Graduate School, 2022-06-23) Akyol, Gamze ; Sarıel, Sanem ; Aksoy, Eren Erdal ; 504181561 ; Computer EngineeringDespite decades of research, understanding human manipulation actions has remained one of the most appealing and demanding problems in computer vision and robotics. Recognition and prediction of observed human manipulation activities have their roots in, for instance, human-robot interaction and robot learning from demonstration. The current research trend relies heavily on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. However, to process high-dimensional raw input, these networks must be immensely computationally complex, so training them requires huge amounts of time and data. Unlike previous research, in this thesis a deep graph autoencoder is used to jointly learn recognition and prediction of manipulation tasks from symbolic scene graphs, rather than from structured Euclidean data. The deep graph autoencoder model developed in this thesis requires less time and data for training. The network features a two-branch variational autoencoder structure, one branch for recognizing the input graph type and the other for predicting future graphs. The proposed network takes as input a set of semantic graphs that represent the spatial relationships between subjects and objects in a scene. Scene graphs are used because of their flexible structure and their capability to model the environment. The network produces a label set reflecting the detected and predicted class types. Two separate datasets are used for the experiments: MANIAC and MSRC-9. The MANIAC dataset consists of 8 different manipulation action classes (e.g., pushing, stirring) from 15 different demonstrations.
MSRC-9 consists of 9 different hand-crafted classes (e.g., cow, bike) for 240 real-world images. These two distinct datasets are used to measure the generalizability of the proposed network. On both datasets, the proposed model is compared to various state-of-the-art methods, and it is shown to achieve higher performance. The source code is released at https://github.com/gamzeakyol/GNet.
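How a symbolic scene graph becomes the matrix input that such a graph autoencoder consumes can be sketched as follows; the object names and the single binary "touching" relation are illustrative, not the MANIAC relation set:

```python
import numpy as np

# A toy symbolic scene graph for one frame of a manipulation action:
# nodes are tracked objects, edges encode a spatial relation between
# them (here: 1 = touching, absent = no contact).
nodes = ["hand", "knife", "cucumber"]
relations = {("hand", "knife"): 1, ("knife", "cucumber"): 1}

def to_adjacency(nodes, relations):
    """Turn the symbolic graph into the adjacency matrix a graph
    autoencoder would consume (symmetric, zero diagonal)."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
    for (a, b), rel in relations.items():
        adj[index[a], index[b]] = rel
        adj[index[b], index[a]] = rel
    return adj

adj = to_adjacency(nodes, relations)
print(adj)
```

A sequence of such adjacency matrices, one per frame, is the kind of low-dimensional input that keeps the graph network far cheaper to train than a CNN on raw RGB frames.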
-
ÖgeGeneralized multi-view data proliferator (gem-vip) for boosting classification(Graduate School, 2022-08-08) Çelik, Mustafa ; Rekik, Islem ; 504131531 ; Computer EngineeringMulti-view network representation reveals multi-faceted alterations of the brain as a complex interconnected system, particularly in mapping neurological disorders. Such a rich data representation maps the relationships between different brain views and has the potential to boost neurological diagnostic tasks. However, multi-view brain data is scarce and is generally collected in small quantities; as a result, this data type is broadly overlooked by researchers. Despite the existence of data proliferation techniques as a way to overcome data scarcity, to the best of our knowledge, multi-view data proliferation from a single sample has not been fully explored. Here, we bridge this gap by proposing our GEneralized Multi-VIew data Proliferator (GEM-VIP), a framework that proliferates synthetic multi-view brain samples from a single multi-view brain to boost multi-view brain data classification tasks. Given a Connectional Brain Template (CBT, i.e., an approximation of a population's brain graphs that captures the connectivity pattern shared by its subjects), we set out to proliferate synthetic multi-view brain graphs using the multivariate normal distribution (MVND). This requires two crucial components: the mean and the covariance of a given population. First, our proposed GEM-VIP framework obtains a population-representative tensor (i.e., drawn from the prior CBT), which can be mathematically regarded as the mean of the population. Second, drawing inspiration from the genetic algorithm paradigm, GEM-VIP learns the covariance matrix of the population using the given CBT.
Lastly, it proliferates synthetic samples by plugging the previously obtained representative tensor and the learned covariance matrix of the population into the MVND. We evaluate GEM-VIP against several comparison methods. The results show that our framework boosts multi-view brain data classification accuracy on AD/lMCI and eMCI/normal control (NC) datasets. In short, our GEM-VIP method boosts the diagnosis of neurological disorders.
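The final proliferation step can be sketched as follows; the toy CBT dimensions and the diagonal covariance (standing in for the genetically learned one) are purely illustrative:

```python
import numpy as np

def proliferate(cbt, cov, n_samples, seed=0):
    """Draw synthetic multi-view brain samples from a multivariate
    normal whose mean is the flattened CBT tensor and whose
    covariance was learned for the population, then reshape each
    draw back to the original multi-view tensor shape."""
    rng = np.random.default_rng(seed)
    mean = cbt.ravel()
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return samples.reshape((n_samples,) + cbt.shape)

# Toy CBT: 2 views of a 3x3 connectivity matrix.
cbt = np.random.default_rng(1).random((2, 3, 3))
cov = 0.01 * np.eye(cbt.size)  # illustrative diagonal covariance
synthetic = proliferate(cbt, cov, n_samples=5)
print(synthetic.shape)  # (5, 2, 3, 3)
```

Each synthetic sample can then be added to the training set of a multi-view brain classifier alongside the single real template it was drawn from.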