LEE - Computer Engineering Graduate Program
Browsing LEE - Computer Engineering Graduate Program by Sustainable Development Goal "Goal 9: Industry, Innovation and Infrastructure"
-
Item: A variational graph autoencoder for manipulation action recognition and prediction (Graduate School, 2022-06-23) Akyol, Gamze ; Sarıel, Sanem ; Aksoy, Eren Erdal ; 504181561 ; Computer Engineering
Despite decades of research, understanding human manipulation actions has always been one of the most appealing and demanding study problems in computer vision and robotics. Recognition and prediction of observed human manipulation activities have their roots in, for instance, human-robot interaction and robot learning from demonstration applications. The current research trend relies heavily on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. However, in order to process high-dimensional raw input, these networks must be immensely computationally complex, and training them therefore requires huge amounts of time and data. Unlike previous research, in this thesis a deep graph autoencoder is used to simultaneously learn recognition and prediction of manipulation tasks from symbolic scene graphs, rather than from structured Euclidean data. The deep graph autoencoder model developed in this thesis needs less time and data for training. The network features a two-branch variational autoencoder structure, one branch for recognizing the input graph type and the other for predicting future graphs. The proposed network takes as input a set of semantic graphs that represent the spatial relationships between subjects and objects in a scene. Scene graphs are used because of their flexible structure and their capability to model the environment. The network produces a label set reflecting the detected and predicted class types. Two separate datasets are used for the experiments: MANIAC and MSRC-9. The MANIAC dataset consists of 8 different manipulation action classes (e.g. pushing, stirring) from 15 different demonstrations.
MSRC-9 consists of 9 different hand-crafted classes (e.g. cow, bike) for 240 real-world images. The reason for using two such distinct datasets is to measure the generalizability of the proposed network. On these datasets, the proposed model is compared to various state-of-the-art methods, and it is shown that the proposed model can achieve higher performance. The source code is also released at https://github.com/gamzeakyol/GNet.
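The two-branch idea described above can be sketched at a very high level. The following is a hypothetical, heavily simplified illustration in plain Python (fixed random weights, no variational sampling or training — none of the names or sizes come from the thesis implementation): a shared graph encoder pools node embeddings from a toy scene graph, and two separate heads produce recognition and prediction logits.

```python
import random

# Illustrative sketch only: a shared graph encoder with two output heads,
# standing in for the two-branch (recognition + prediction) autoencoder.
random.seed(0)

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    # One propagation step: aggregate neighbour features, then a linear map + ReLU.
    agg = matmul(adj, feats)
    return [[max(0.0, v) for v in row] for row in matmul(agg, weight)]

# Toy scene graph: nodes = {hand, knife, bread}; edges = spatial relations.
adj = [[1, 1, 0],   # self-loops included so each node keeps its own features
       [1, 1, 1],
       [0, 1, 1]]
feats = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]          # one-hot node identities
w_enc = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

z = gcn_layer(adj, feats, w_enc)                   # shared latent node embeddings
graph_emb = [sum(col) / len(z) for col in zip(*z)] # mean-pool to one graph vector

w_rec = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]   # 8 action classes
w_pred = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]

recognition_logits = matmul([graph_emb], w_rec)[0]  # "which action is this?"
prediction_logits = matmul([graph_emb], w_pred)[0]  # "which graph comes next?"
print(len(recognition_logits), len(prediction_logits))
```

In the real model both heads share the learned latent distribution, which is what lets recognition and prediction reinforce each other during training.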
-
Item: Standardization of fundamental innovative solutions in network communications (Graduate School, 2023-08-30) Kalkan, Muhammed Salih ; Seçinti, Gökhan ; 504191579 ; Computer Engineering
Problems in network communications date back a long way, and many studies have been carried out to solve them. These studies produced the layered communication structure we now call the OSI model. One of these layers is the application layer; messaging-related problems belong to this layer, and messaging-related features are therefore implemented there. To standardize certain messaging features, several application-layer protocols have been created; AMQP, MQTT, and similar protocols are examples. In this research, fundamental innovative solutions are likewise evaluated at the application layer. Applications solve messaging problems in different ways: some features are provided in application code, some by libraries, and some by being standardized in protocols. Messaging features added to application code must be rewritten for every application, which causes wasted effort, a higher likelihood of errors, and ever-increasing code complexity. Solving messaging problems with library code requires that the library be shared with all other endpoints. For these reasons, messaging features should be standardized in a protocol. This study aims to standardize fundamental innovative features for use in local networks and the IoT, saving effort, reducing the complexity of application code, and sharing the solutions across all endpoints. Creating a protocol standard requires background knowledge of protocol features.
Therefore, background topics such as binary versus text protocols, communication models, and centralized versus decentralized approaches were examined first. Binary protocols transmit data in binary form; text protocols transmit data as Unicode or ASCII. Binary protocols are better in terms of performance because they transmit data in smaller sizes. Text protocols transmit data in larger sizes, but compared to binary protocols they can be debugged easily and their data are human-readable. To have both high performance and readable data, a monitoring endpoint must know the text equivalents of the binary data. In addition, byte order (endianness) matters for binary protocols but not for text protocols. A device's endianness may be little-endian or big-endian. In binary protocols, when two devices with different endianness communicate, the byte order of the data must be reversed when serializing and when deserializing. If the solutions to these problems are standardized at the application layer, developers no longer need to solve them over and over. The client-server model is a model in which multiple client endpoints request service from a single server endpoint. The publish-subscribe model is a model in which publisher and subscriber endpoints exchange messages through centralized message-oriented middleware: endpoints subscribe to topics or publish messages, and the message broker forwards published messages to the endpoints subscribed to them. The message broker provides loose coupling and flexibility; endpoints keep messaging independently of each other's existence, and transformers and filters can run on the broker. Loose coupling, however, is also a drawback.
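The endianness handling described above can be sketched with Python's standard `struct` module: by naming the byte order explicitly in the format string, a sender and receiver with different native endianness still agree on the wire format. The field layout (a u16 id and an f32 value) is illustrative only, not an actual wire format from the thesis.

```python
import struct

# Serialize with an explicit little-endian byte order ("<"), so a big-endian
# receiver decodes correctly without guessing the sender's native order.
def serialize(msg_id: int, value: float) -> bytes:
    return struct.pack("<Hf", msg_id, value)   # u16 id + f32 value

def deserialize(payload: bytes):
    return struct.unpack("<Hf", payload)       # same explicit byte order

payload = serialize(7, 21.5)
msg_id, value = deserialize(payload)
print(msg_id, value)   # 7 21.5
```

Standardizing this choice in the protocol (or negotiating it at handshake time) is exactly what spares every application from re-implementing byte swapping.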
Publisher endpoints cannot be sure whether subscriber endpoints are communicating. As publishers and subscribers increase in number, they may overload the message broker. Because the message broker is centralized, it can become a bottleneck, which limits horizontal scalability. Forwarding messages first to the broker, rather than directly to the target endpoints, increases latency. To get rid of these broker-related problems, a decentralized publish-subscribe model is needed. One of the biggest problems for messaging endpoints is that the message structures at one of the endpoints may be out of date or incorrectly implemented. In existing messaging protocols, there is no standard approach for checking the compatibility of the message structures of the endpoints on a connection. Monitoring the outgoing and incoming messages in a communication can be critical. If messaging is point-to-point, a third remote monitoring endpoint cannot be included in the communication: the destination IP address in the IP packet header must belong to a single device in point-to-point communication. This problem can be solved at the application layer by forwarding to third remote endpoints. Many application-layer protocols depend on a protocol at the transport layer, which can restrict future use. Suppose, for example, that the QUIC protocol replaced TCP and that TCP implementations disappeared; dozens of TCP-based protocols would then need new QUIC-based versions. Being abstracted from the underlying protocols is therefore important for future use. Using multiple protocols requires creating multiple communication interfaces; however, if a protocol can be used over multiple lower-layer protocols, a single communication interface suffices. In this study, data were collected on how well existing protocols solve these problems, and a table was built matching existing protocols against the features that solve them.
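The message-structure compatibility problem mentioned above can be illustrated with a handshake-style check: each endpoint describes its user-defined message structures as JSON, and the server refuses the connection when the structures disagree. The schema format below is made up for illustration; it is not the actual handshake payload of any existing protocol.

```python
import json

# Hypothetical schema the server expects for its user-defined messages.
server_schema = {"SetSpeed": {"fields": [["speed", "f32"], ["ramp_ms", "u16"]]}}

# What a client would send in a handshake request (as a JSON string).
client_schema_json = json.dumps(
    {"SetSpeed": {"fields": [["speed", "f32"], ["ramp_ms", "u16"]]}}
)
# A stale client that was never updated with the new ramp_ms field.
stale_schema_json = json.dumps({"SetSpeed": {"fields": [["speed", "f32"]]}})

def compatible(server: dict, client_json: str) -> bool:
    # Exact structural match required; a mismatch aborts the handshake.
    return server == json.loads(client_json)

print(compatible(server_schema, client_schema_json))  # True
print(compatible(server_schema, stale_schema_json))   # False
```

Catching the mismatch at connection time turns a silent deserialization bug into an explicit, early handshake failure.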
It can be seen that the other application-layer protocols do not support all of these features; therefore, a new protocol providing them is needed. This protocol is named the Messaging Control Protocol (MCP). MCP mainly targets local-network communications and concentrates on features usable in local networks, asynchronous communications, stateful communications, and embedded systems. So that MCP is independent of lower-layer protocols and can be used with multiple of them, it has two components: the MCP Adapter and the communication interface. The MCP Adapter is needed to satisfy MCP's preconditions, and the communication interface is needed to use the functions of the lower-layer protocols; MCP thus becomes independent of the lower-layer protocols and usable with more than one of them. MCP has two message classes: MCP Standard Messages and MCP Application Messages. MCP Standard Messages are built-in messages independent of application code. There are five kinds of standard message: the Handshake Message, the Heartbeat Message, the Role Application Message, the Subscribe Message, and the Unsubscribe Message. Clients send the structures of their user-defined messages in JSON format in the handshake request message, so the message compatibility of the endpoints is checked. The server sends its endianness type in the handshake response message, so the client learns the server's endianness; if the endianness types of client and server differ, the client changes the byte order of the data automatically. A heartbeat message is sent periodically to detect whether the connection is alive. A client uses the Subscribe Message and the Unsubscribe Message to subscribe to or unsubscribe from a message. MCP Application Messages are the messages defined in application code. There are four kinds of application message: the Request-Response Message, the Event Message, the Startup Message, and the Report Message.
For request-response messages, communication proceeds by producing the corresponding response message only when the related request message is received. Event messages are transmitted when an event is triggered and are sent to all connected subscriber clients. The startup message is in fact an event message triggered when the connection is established, and the report message is in fact an event message triggered by time. Role-based access control is used for authorization. Clients have roles on an MCP connection, and these roles determine the accessibility of the messages in the messaging interface. The server specifies which client roles can access each message, while a client with the admin role assigns roles to clients. Clients that want to monitor the messages in a point-to-point communication have the monitor role. The monitor role is independent of message accessibility: all messages in the point-to-point communication are forwarded to the monitor client. The monitor client sends a connection request to join the communication; during connection establishment it receives the message structures in the handshake message and learns the text equivalents of the binary data in the communication. Thus, even though the data are transmitted in binary, they can be displayed as text. By solving messaging problems in protocol code, the MCP protocol, built at the application layer, standardizes the solutions to these problems. The other application-layer protocols cannot solve all the problems that MCP solves; this is what sets MCP apart. If MCP is used, the solutions described in this study no longer need to live in application code, which reduces the complexity of application code and eliminates errors that could arise in messaging features. MCP not only offers many messaging features, but also attaches importance to performance: for performance, MCP uses a dynamic header size and is a binary protocol.
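The role-based authorization just described can be sketched as a per-message access list plus a monitor role that bypasses it. Role and message names below are invented for illustration; they are not taken from the MCP specification.

```python
# Hypothetical per-message ACL kept on the server side: each message name
# maps to the set of client roles allowed to access it.
message_acl = {
    "SetSpeed": {"operator", "admin"},
    "ReadTemperature": {"operator", "viewer", "admin"},
}

def can_access(role: str, message: str) -> bool:
    # The monitor role sees every message in a point-to-point exchange,
    # independently of the per-message accessibility rules.
    if role == "monitor":
        return True
    return role in message_acl.get(message, set())

print(can_access("viewer", "SetSpeed"))   # False
print(can_access("admin", "SetSpeed"))    # True
print(can_access("monitor", "SetSpeed"))  # True
```

A client holding the admin role would be the one allowed to modify the role assignments themselves.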
Because MCP focuses on fundamental messaging problems and cares about performance, it is applicable to the IoT as well as to local networks. Future analyses can be carried out so that MCP can be used in the IoT domain. In conclusion, MCP provides innovative fundamental messaging features, and by standardizing them it reduces both the likelihood of errors and the complexity of application code.
-
Item: Directional regularization based variational models for image recovery (Graduate School, 2022-08-19) Türeyen Demircan, Ezgi ; Kamaşak, Mustafa E. ; 504152509 ; Computer Engineering
This thesis explores how local directional cues can be utilized in image recovery. Our intent is to provide image regularization paradigms that encourage the underlying directionalities. To this end, in the first phase of the thesis work, we design direction-aware analysis-based regularization terms. We boost the structure tensor total variation (STV) functionals used in inverse imaging problems so that they encode directional priors. More specifically, we suggest redefining structure tensors to describe the distribution of the "directional" first-order derivatives within a local neighborhood. With this decision, we bring direction-awareness to the STV penalty terms, which originally imposed only local structural regularity. We enrich the nonlocal counterpart of STV in the same way, which additionally imposed nonlocal image self-similarity beforehand. These two types of regularizers are used to model denoising and deblurring problems within a variational framework. Since they result in convex energy functionals, we also develop convex optimization algorithms by devising the proximal maps of our direction-aware penalty terms. With these contributions in place, the major barrier to making these regularizers applicable lies in the difficulty of estimating the directional parameters (i.e., the directions/orientations and the dose of anisotropy). Although it is possible to come across uni-directional images, real-world images usually exhibit no directional dominance. It is easy to precisely estimate the underlying directions of uni-directional (or partially directional) images; however, arbitrary and unstable directions call for spatially varying directional parameters.
In this regard, we propose two different parameter estimation procedures, each of which employs the eigendecompositions of the semi-local/nonlocal structure tensors. We also make use of total variation (TV) regularization in one of the proposed procedures and a filterbank of anisotropic Gaussian kernels (AGKs) in the other. As our image regularization frameworks require the guidance of the directional parameter maps, we use the term "direction-guided" in naming our regularizers. Through quantitative and visual experiments, we demonstrate how beneficial the involvement of directional information is by validating the superiority of our regularizers over state-of-the-art analysis-based regularization schemes, including STV and nonlocal STV. In the second phase of the thesis, we shift our focus from model-driven to data-driven image restoration; more specifically, we deal with transfer learning. As the target field, we choose fluorescence microscopy imaging, where noise is a very common phenomenon but data-driven denoising is less applicable due to the lack of ground-truth images. In order to tackle this challenge, we suggest tailoring a dataset by handpicking images from unrelated source datasets. This selective procedure explores some low-level view-based features (i.e., color, isotropy/anisotropy, and directionality) of the candidate images, and their similarities to those of the fluorescence microscopy images. Based upon our experience with model-driven restoration techniques, we speculate that these low-level characteristics (especially directions) play an important role in image restoration. In order to encourage a deep learning model to exploit these characteristics, one could embed them into the training data. In fact, we establish the possibility of offering a good balance between the content-awareness and the universality of the model by transferring only low-level knowledge and letting the unrelated images bring additional knowledge.
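The eigendecomposition step mentioned above is easy to illustrate for a single 2x2 structure tensor: the dominant eigenvector gives a local orientation and the eigenvalue gap gives a degree of anisotropy. The toy patch and the plain summation used for smoothing are illustrative choices, not the thesis's semi-local/nonlocal estimators.

```python
import math

# A tiny patch with a vertical edge: intensity changes along x only.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Accumulate the 2x2 structure tensor J = sum [Ix^2, IxIy; IxIy, Iy^2]
# over the interior pixels, using central differences for the gradients.
jxx = jxy = jyy = 0.0
for y in range(1, 3):
    for x in range(1, 3):
        ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
        iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
        jxx += ix * ix; jxy += ix * iy; jyy += iy * iy

# Closed-form eigen-analysis of the symmetric 2x2 tensor.
theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)        # dominant gradient angle
gap = math.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
lam1, lam2 = (jxx + jyy + gap) / 2.0, (jxx + jyy - gap) / 2.0
anisotropy = (lam1 - lam2) / (lam1 + lam2 + 1e-12)    # 1 = perfectly directional

print(round(math.degrees(theta)), round(anisotropy, 2))  # 0 1.0
```

A spatially varying parameter map is obtained by repeating this per neighborhood, which is precisely why unstable, non-dominant directions make the estimation hard.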
In addition to training a feed-forward denoising convolutional neural network (DnCNN) on our tailored dataset, we also suggest integrating a small amount of fluorescence data through the use of fine-tuning for better-recovered micrographs. We conduct extensive experiments considering both Gaussian and mixed Poisson-Gaussian denoising problems. On the one hand, the experiments show that our approach is able to curate a dataset, which is significantly superior to the arbitrarily chosen unrelated source datasets, and competitive against the real fluorescence images. On the other hand, the involvement of fine-tuning further boosts the performance by stimulating the content-awareness, at the expense of a limited amount of target-specific data that we assume is available.
-
Item: Effect of semi-supervised self-data annotation on video object detection performance (Graduate School, 2022-06-22) Akman, Vefak Murat ; Töreyin, Behçet Uğur ; 704191017 ; Computer Sciences
Access to annotated data is more crucial than ever now that deep learning frameworks have replaced traditional machine learning methodologies. Even if the method is robust, training performance can be inadequate if the data has poor quality. Some methods were developed to address data-related issues; these methods, however, have a negative impact on algorithm complexity and processing cost. Errors related to human factors, such as misclassification or inaccurate labeling, should also be considered. The data annotation process involves multiple steps that cost time and money: data gathering, annotation, and formatting according to the deep learning model architecture. Unfortunately, these steps are still not fully standardized, and the whole process comes with many difficulties. In this study, the effect of semi-supervised data annotation on video object detection is analysed using the Soft Teacher algorithm. Soft Teacher is a Swin-Transformer-backboned semi-supervised learning method whose major advantage is overcoming limited data. The Swin Transformer is a type of vision transformer: it creates hierarchical feature maps by merging image patches in deeper layers and has computation complexity linear in the input image size. As such, it can be used as a general-purpose backbone for tasks like classification and object detection. In Soft Teacher, there are two models: the Student model and the Teacher model. The Teacher model performs pseudo-labeling on weakly augmented unlabeled images, and the Student model is trained on both labelled and strongly augmented unlabeled images while updating the Teacher model. The Soft Teacher model was trained with the open-source COCO dataset, which consists of 80 labels.
The dataset contains 118,287 training, 123,403 unlabeled, and 5,000 validation images, and was annotated by humans. Soft Teacher was trained with 1, 5, 10, and 100 percent of the labelled data, respectively. Then, using those trained Soft Teacher models, new annotations were created from the same raw data, and some state-of-the-art object detection algorithms were trained with the newly annotated data. To compare results, these object detection models were also trained with the manually annotated data. In terms of mAP, the model trained with human-annotated data was shown to be less successful than the other; however, the model trained with self-annotated data produced more false positives, because the trained model can mislabel objects when generating new data. In conclusion, the results suggest that semi-supervised data annotation degrades detection performance in exchange for huge savings in training time.
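The teacher's pseudo-labelling step can be sketched as confidence-threshold filtering: detections on an unlabelled image are kept only above a score threshold, and the survivors become training targets for the student. The scores, boxes, and the 0.9 threshold below are illustrative; Soft Teacher's actual filtering is more involved (e.g. per-box reliability weighting), which is exactly where mislabelled pseudo-targets and false positives can creep in.

```python
# Hypothetical teacher outputs on one weakly augmented, unlabelled image.
teacher_detections = [
    {"box": (10, 10, 50, 80), "label": "person", "score": 0.97},
    {"box": (60, 20, 90, 40), "label": "dog",    "score": 0.41},
    {"box": (5, 5, 30, 30),   "label": "person", "score": 0.92},
]

def pseudo_label(detections, threshold=0.9):
    # Keep only confident detections as pseudo ground truth for the student.
    return [d for d in detections if d["score"] >= threshold]

targets = pseudo_label(teacher_detections)
print([d["label"] for d in targets])   # ['person', 'person']
```

Lowering the threshold yields more training targets but more label noise; raising it does the reverse, which mirrors the precision/recall trade-off observed in the experiments.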
-
Item: Fight recognition from still images in the wild (Graduate School, 2022-06-22) Aktı, Şeymanur ; Ekenel, Hazım Kemal ; 504191539 ; Computer Engineering
Violence in general is a sensitive subject and can have a negative impact on both the people involved and witnesses. Fighting is one of the most common types of violence and can be defined as an act where individuals intend to harm each other physically. In daily life, these kinds of situations might not be faced often; however, violent content on social media is also a big concern for users. Since violent acts, and fights in particular, are considered an anomaly or intriguing by some, people tend to record these scenes and upload them to their social media accounts. Similarly, news agencies also regard them as newsworthy material in some cases. As a result, fighting scenes frequently become available on social media platforms. Some users may be sensitive to this kind of media content, and children, who can be harmed by the aggressive nature of fight scenes, also use social media. These facts make it necessary to detect and limit the distribution of violent content on social media. There are some systems focusing on violence and fight recognition in visual data. However, these works mostly propose methods for other violence domains, such as movies, surveillance cameras, etc., and the social media case remains unexplored. Furthermore, even if most fight scenes shared on social media are video sequences, there is also a non-negligible amount of image data depicting violent fighting; yet no work tackles fight recognition from still images instead of videos. Thus, in this thesis, the problem of fight recognition from still images is investigated. In this scope, first, a novel dataset named Social Media Fight Images (SMFI) was collected from social media images.
The dataset was collected from Twitter and Google Images, and some frames were included from the NTU CCTV-Fights video dataset. The fight samples were chosen among samples recorded in uncontrolled environments. In order to crawl a large amount of data, different keywords were used in various languages. The non-fight samples were also chosen among the data crawled from social media in order to keep the domain consistent across the classes. The dataset is made publicly available by sharing the links to the images. For the classification of the Social Media Fight Images dataset, several image classification methods were applied. First, Convolutional Neural Networks (CNNs) were employed for the task and their performance was assessed. Then, a recent approach, the Vision Transformer (ViT), was exploited for the classification of fight and non-fight images. The comparison showed that the Vision Transformer gives better results on the dataset, achieving higher accuracy with less overfitting. A further experiment investigated the effect of varying dataset sizes on the performance of the model. This was seen as necessary because data shared on social media may be deleted in the future, and it is not always possible to retrieve the whole dataset. So, the model was trained on different partitions of the dataset, and the results showed that even if using more data is better, the model could still give satisfying performance even in the absence of 60% of the dataset. Following the successful results on fight recognition from still images, another experimental study was conducted on the classification of video-based datasets using a single frame from each sample. The experiment included four video-based fight datasets, and the results showed that three of them could be successfully classified without using any temporal information.
This indicated that there might be a dataset bias for these three datasets, where the inter-class visual difference is high. Cross-dataset experiments also supported this hypothesis: models trained on these video datasets perform poorly on the other fight recognition datasets. Nonetheless, the network trained on the proposed SMFI dataset gave promising accuracy on the other datasets as well, showing that the dataset generalizes the fight recognition problem better than the others.
-
Item: Hybridization of probabilistic graphical models and metaheuristics for handling dynamism and uncertainty (Graduate School, 2021-06-30) Uludağ, Gönül ; Etaner Uyar, Ayşe Şima ; 504072510 ; Computer Engineering
Solving stochastic complex combinatorial optimisation problems remains one of the most significant research challenges; such problems cannot be adequately addressed by deterministic methods, nor even by some metaheuristics. Today's real-life problems, in a broad range of application domains from engineering to neuroimaging, are highly complex, dynamic, uncertain, and noisy by nature. Such problems cannot be solved in a reasonable time because of properties including noisy fitness landscapes, high non-linearities, large scale, high multi-modality, and computationally expensive objective functions. Environmental variability and uncertainty may occur in the problem instance, the objective functions, the design variables, the environmental parameters, and the constraints; thus, variations and uncertainties may be due to a change in one or more of these components over time. Environmental dynamism is commonly classified based upon the change frequency, predictability, and severity, as well as whether it is periodic or not. Different types of variations and uncertainties may arise over time due to the dynamic nature of a combinatorial optimisation problem, and hence an approach chosen at the start of the optimisation may become inappropriate later on. Search methodologies for time-variant problems are expected to be capable of adapting to change not only efficiently but also quickly, as well as handling uncertainty such as noise and volatility. On the other hand, it is crucial to identify and adjust the values of the numerous parameters of a metaheuristic algorithm while balancing two contradictory criteria: exploitation (i.e., intensification) and exploration (i.e., diversification).
Therefore, self-adaptation is a critical parameter control strategy in metaheuristics for time-variant optimisation. There exist many studies on time-variant problems that handle dynamism and uncertainty, yet a comprehensive approach addressing different variations at once still seems to be a task to accomplish. Ideal strategies should take into consideration both environmental dynamism and uncertainties, whereas in conventional approaches problems are postulated as time-invariant, disregarding this variability and uncertainty. Meanwhile, each real-world problem exhibits different types of changes and uncertainties. Thus, solving such complex problems remains extremely challenging due to the variations, dependencies, and uncertainties during the optimisation process. Probabilistic graphical models are the principal probabilistic models in which a graph expresses the conditional dependence structure, representing complex real-world phenomena in a compact fashion. Hence, they provide an elegant language for handling complexity and uncertainty. Such properties of probabilistic graphical models have led to further developments in metaheuristics that can be termed probabilistic graphical model-based metaheuristic algorithms. These algorithms are acknowledged as highly self-adaptive and thus able to handle different types of variations. There is a range of probabilistic graphical model-based metaheuristic approaches, e.g., the variants of estimation of distribution algorithms suggested in the literature to address dynamism and uncertainty. One remarkable state-of-the-art continuous stochastic approach of this kind is the covariance matrix adaptation evolution strategy. The covariance matrix adaptation evolution strategy and its variants (e.g.
the covariance matrix adaptation evolution strategy with an increasing population, Ipop-CMA-ES) have become sophisticated adaptive uncertainty-handling schemes. The characteristics of these approaches make them more plausible for handling uncertainty and rapidly changing variations. In recent years, the concept of semi-automatic search methodologies called hyper-heuristics has become increasingly important. Many metaheuristics operate directly on the solution space and utilize problem domain-specific information. Hyper-heuristics, however, are general methodologies that explore the space formed by a set of low-level heuristics, which perturb or construct a (set of) candidate solution(s), in order to make self-adaptive decisions in dynamic environments and deal with computationally difficult problems. Besides the several impressive research studies on variants of probabilistic graphical model-based metaheuristic algorithms, there are also many extensive research studies on machine learning-based optimisation approaches. One of the most popular such methods is the expectation-maximization algorithm, a widely used scheme for the optimisation of likelihood functions in the presence of latent (i.e., hidden) variable models. Expectation-maximization is a hill-climbing approach to finding a maximum of a likelihood function, and it is required to converge in a reasonable time. One extremely challenging dynamic combinatorial optimisation problem is the unit commitment problem, from the engineering application domain. The unit commitment problem is an NP-hard, non-convex, continuous, constrained dynamic combinatorial optimisation problem in which the turn-on/off scheduling of power generating resources over a given time horizon is determined to minimize the joint cost of committing and de-committing.
Another such problem is effective connectivity analysis, from the neuroimaging application area. The predominant scheme for inferring (i.e., estimating) effective connectivity is dynamic causal modelling, which provides a framework for the analysis of effective connectivity (i.e., the directed causal influences between brain areas) and for estimating its biophysical parameters from measured blood-oxygen-level-dependent functional magnetic resonance responses. However, although different kinds of metaheuristic- or machine learning-based algorithms have become more satisfactory in different types of dynamic environments, neither metaheuristic- nor machine learning-based algorithms are capable of consistently handling environmental dynamism and uncertainty. In this sense, it is indispensable to hybridize metaheuristics with probabilistic or statistical machine learning to utilize the advantages of both approaches in coping with such challenges. The main motivation of hybridization is to exploit the complementary aspects of different methods; in other words, hybrid frameworks are expected to benefit from a synergy effect. The design and development of hybrid approaches are considered promising due to their success in handling variations and uncertainties, and hence increased attention has been focused in recent years on metaheuristics and their hybridization. Intuitively, the central idea behind such an approach is based on the two principal perspectives of the "no free lunch theorem": one for supervised machine learning, and one for search/optimisation. Within the context of the no free lunch theorem, the following hybrid frameworks are addressed: (i) in the case of the no free lunch theorem for search/optimisation, machine learning approaches are utilized to enhance metaheuristics; (ii) in the case of the no free lunch theorem for machine learning, metaheuristics are utilized to improve the performance of machine learning algorithms.
Within the scope of this dissertation, each proposed hybrid framework is built on the corresponding "no free lunch theorem" perspective. The first hybrid framework is designed on the no free lunch theorem for search/optimisation and is referred to as the hyper-heuristic-based, dual-population estimation of distribution algorithm (HH-EDA2). Within this notion, probabilistic model-based selection schemes are employed to enhance a probabilistic graphical model-based metaheuristic, exploiting the synergy between selection hyper-heuristic schemes and a dual-population estimation of distribution algorithm. HH-EDA2 takes the form of a two-phase hybrid approach that performs offline and online learning to handle the uncertainties and unexpected variations of combinatorial optimisation problems regardless of their dynamic nature. An important characteristic of this framework is that any multi-population estimation of distribution algorithm can be integrated with any probabilistic model-based selection hyper-heuristic. The performance of HH-EDA2, along with the influence of different heuristic selection methods, was investigated over a range of dynamic environments produced by a well-known benchmark generator, as well as on the unit commitment problem, an NP-hard constrained combinatorial optimisation problem, as a real-life case study. The empirical results show that the proposed approach outperforms some of the best-known approaches in the literature on the non-stationary environment problems dealt with. The second proposed hybrid framework is designed on the no free lunch theorem for machine learning and is referred to as the Bayesian-driven covariance matrix adaptation evolution strategy with an increasing population (B-Ipop-CMA-ES).
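The dual-population mechanism with hyper-heuristic selection described above can be sketched in miniature. The sketch below is an illustrative assumption, not the dissertation's implementation: it uses a simple univariate (PBIL-style) probabilistic model per population, a OneMax stand-in fitness, and a credit-based selection hyper-heuristic that chooses between two learning-rate heuristics online. All function names and parameter values are hypothetical.

```python
import random

def sample(pv, n):
    """Sample n binary individuals from a univariate probability vector."""
    return [[1 if random.random() < p else 0 for p in pv] for _ in range(n)]

def fitness(ind):
    """OneMax stand-in for a (dynamic) combinatorial objective."""
    return sum(ind)

def update(pv, best, lr):
    """PBIL-style shift of the probability vector toward the best individual."""
    return [(1 - lr) * p + lr * b for p, b in zip(pv, best)]

def hh_eda2_sketch(dim=20, pop=30, gens=50, lrs=(0.05, 0.3)):
    pvs = [[0.5] * dim, [0.5] * dim]   # two populations, two probabilistic models
    credit = {lr: 1.0 for lr in lrs}   # hyper-heuristic credit per low-level heuristic
    best_overall = 0
    for _ in range(gens):
        for i in range(2):
            # Selection hyper-heuristic: pick a heuristic in proportion to its credit.
            lr = random.choices(lrs, weights=[credit[l] for l in lrs])[0]
            individuals = sample(pvs[i], pop)
            best = max(individuals, key=fitness)
            credit[lr] = 0.9 * credit[lr] + 0.1 * fitness(best)  # online credit update
            pvs[i] = update(pvs[i], best, lr)
            best_overall = max(best_overall, fitness(best))
    return best_overall
```

Online credit updates of this kind are one common way a selection hyper-heuristic adapts its choice of low-level heuristic as the environment changes.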
Within this notion, probabilistic model-based metaheuristics are employed to enhance probabilistic graphical models, exploiting the synergy between the covariance matrix adaptation evolution strategy and expectation-maximization schemes. This hybrid framework estimates the biophysical parameters of effective connectivity (i.e., performs dynamic causal modelling), which enables one to characterize and better understand the dynamic behaviour of neuronal populations. The main contribution of B-Ipop-CMA-ES is to overcome crucial issues of dynamic causal modelling, including dependence on prior knowledge, computational complexity, and a tendency to get stuck in local optima. B-Ipop-CMA-ES is capable of producing physiologically plausible models while converging to the global solution in computationally feasible time, without relying on initial prior knowledge of the biophysical parameters. The performance of the B-Ipop-CMA-ES framework was investigated on both synthetic and empirical functional magnetic resonance imaging datasets. Experimental results demonstrate that the B-Ipop-CMA-ES framework outperformed the reference (expectation-maximization/Gauss-Newton) and other competing methods.
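As a toy illustration of evolution-strategy-based parameter estimation with IPOP-style restarts, the sketch below fits the two parameters of a synthetic exponential response by minimizing squared error. It replaces full covariance matrix adaptation with a crude step-size decay and is an assumption for illustration only, not the B-Ipop-CMA-ES algorithm or a real dynamic causal model.

```python
import math
import random

def model(params, ts):
    """Toy stand-in for a biophysical forward model: y = a * exp(-b * t)."""
    a, b = params
    return [a * math.exp(-b * t) for t in ts]

def sse(params, ts, ys):
    """Sum of squared errors between simulated and observed responses."""
    return sum((m - y) ** 2 for m, y in zip(model(params, ts), ys))

def es_fit(ts, ys, gens=200, lam=8, sigma=0.5, restarts=3):
    """(mu/mu, lambda)-ES with population doubling on restart (IPOP flavor)."""
    best, best_f = None, float("inf")
    for _ in range(restarts):
        mean = [random.uniform(0.0, 2.0), random.uniform(0.0, 2.0)]
        step = sigma
        for _ in range(gens):
            offspring = [[m + step * random.gauss(0, 1) for m in mean]
                         for _ in range(lam)]
            offspring.sort(key=lambda p: sse(p, ts, ys))
            elite = offspring[: lam // 2]
            mean = [sum(p[i] for p in elite) / len(elite) for i in range(2)]
            step *= 0.98  # crude decay in place of full covariance adaptation
            f = sse(mean, ts, ys)
            if f < best_f:
                best, best_f = mean, f
        lam *= 2  # increase the population on every restart, as in IPOP
    return best, best_f

# Synthetic "observed" response generated with known parameters a=1.5, b=0.8.
ts = [0.1 * i for i in range(30)]
ys = model([1.5, 0.8], ts)
```

Restarting with a larger population is the characteristic IPOP ingredient: it trades evaluations for robustness against the local optima that plague gradient-based schemes such as expectation-maximization/Gauss-Newton.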
-
ÖgeNovel data partitioning and scheduling schemes for dynamic federated vehicular cloud(Graduate School, 2022-08-23) Danquah, Wiseborn Manfe ; Altılar, Turgay ; 504122522 ; Computer Engineering
In the last decade, many Intelligent Transport Systems (ITS) applications that rely on the Internet cloud infrastructure have been deployed to provide services to clients in vehicular environments. However, the Internet cloud suffers from high latency in service access and intermittent Internet disconnection in vehicular environments. In response to these challenges, researchers have proposed other distributed computing technologies, such as mobile cloud computing, edge computing, and vehicular cloud computing, which may rely on Wireless Local Area Networks (WLAN) or ad-hoc networks for communication to provide computing resources and services in vehicular environments. Vehicular cloud computing, an emerging distributed computing paradigm that provides cloud services using the resources embedded in vehicles and Road Side Units (RSUs), is arguably the best of these alternatives. The provision of vehicular cloud services leads to efficient utilization of the abundant resources embedded in vehicles and therefore embraces green technology in vehicular environments. Considering that modern vehicles and RSUs are designed with communication capabilities through the Vehicular Ad-hoc Network (VANET), vehicular cloud computing may not require the installation of additional communication hardware infrastructure along roads and at parking lots before its deployment. Furthermore, access to vehicular cloud services may involve relatively lower latency and fewer intermittent disconnections because of the close proximity of vehicular cloud resources to clients and the use of VANET, which may provide better connections among vehicular resources and clients than the connection to the Internet cloud in vehicular environments.
As with all emerging technologies, vehicular cloud is no exception: some challenges need to be addressed before its full real-world deployment. The challenges of vehicular cloud computing are mainly caused by the unique characteristics of vehicular resources and the limited communication bandwidth of VANET. In vehicular environments, i.e., on roads and in parking lots, vehicles are not permanently stationed but mobile, making the resources embedded in them highly mobile as well. Also, the resources of the vehicles that are organized to provide cloud services belong to different people (distributed ownership). The high mobility and distributed ownership of resources imply that vehicular resources may exit the provisioned vehicular cloud abruptly, either through withdrawal of resources by the resource owner or through intermittent network disconnection caused by the high mobility of vehicles. Therefore, distributed ownership of resources and high mobility of vehicles make vehicular resources volatile, rendering their availability and reliability unpredictable. The limited communication bandwidth of VANET, which serves as the communication backbone of the vehicular cloud, implies that the transmission of data-intensive and bandwidth-intensive applications in vehicular environments is a challenge for vehicular cloud computing. Considering that the distributed ownership of resources, limited communication bandwidth, and high mobility of vehicles lead to low availability and low reliability of resources in vehicular cloud computing, they adversely affect almost all resource management operations, such as virtual machine migration. Therefore, the main focus of this dissertation is to propose novel solutions that address the challenges of the vehicular cloud and ameliorate the adverse effects of these characteristics.
As an introduction, the background of vehicular cloud computing and a detailed survey of resource management operations in the vehicular cloud are presented using a three-phase taxonomy of resource management techniques proposed in this dissertation. Based on the review of vehicular cloud computing concepts, a novel distributed vehicular computing paradigm, Vehicular Volunteer Computing (VVC), is proposed in this dissertation. VVC is a volunteer computing platform where vehicle owners may donate their idle processing units towards the execution of scientific and other projects that benefit a community, such as the "Compute The Cure" cancer project. The concept of a Dynamic Federated Vehicular Cloud (DFVC) is introduced in this dissertation to overcome the limited resource capacity and the volatility of resources in the vehicular cloud. The DFVC entails organizing resources from different vehicles moving on the road to provide a specific vehicular cloud service, such as Computation-as-a-Service (CaaS). Although the resources are collated from different vehicles in the formation of the DFVC, they are organized as a single logical unit for the provision of services. The formation of a DFVC involves forming resource-based clusters, i.e., grouping vehicles with similar resource and mobility characteristics as a single unit and selecting a leader, known as a cluster head, to manage the resources in a cluster of vehicles. By considering the structure of the resource-based clusters formed in a Region of Interest (RoI), two different DFVC schemes are proposed in this dissertation: the Cluster-Based Vehicular Cloud (CBVC) and the Platoon-Based Vehicular Cloud (PBVC). In the CBVC, vehicles (owners) with a high reputation and idle resources in an RoI on the road are organized into clusters without the condition that all cluster members use the same lane and maintain a fixed gap between vehicles.
The PBVC, on the other hand, is a variant of the cluster-based vehicular cloud that requires strict adherence to a constant gap, i.e., distance, between all vehicles and the use of the same lane throughout the entire period of cloud service provision. In other words, the PBVC is a convoy with a constant gap between all vehicles whose resources are organized to offer specific cloud services. In order to address the challenges of limited computation capacity and constrained communication bandwidth, a large divisible data load to be processed by the DFVC is partitioned and distributed to the individual vehicular nodes using efficient Data Partitioning and Scheduling (DPS) schemes. One of the central themes of this dissertation is, therefore, to design and implement novel DPS schemes that consider the characteristics of the vehicular resources of the DFVC and the communication channel of VANET, the communication backbone for the DFVC. By considering the computation capacity of resources, the data transmission bandwidth capacity, and the communication delay experienced in data transmission over VANET, the efficient DPS schemes proposed in this dissertation are designed through mathematical models developed using timing and data flow diagrams. The DPS schemes for the CBVC and the PBVC are modeled differently because of their unique characteristics and operations. The proposed DPS scheme for the CBVC was modeled with the consideration that the cluster head determines the data chunk for each vehicle using derived closed-form mathematical equations and then distributes the data chunks directly, as a single hop, to the respective vehicles in the CBVC for processing. After processing, the vehicles transmit the processed data chunks directly back to the cluster head. The DPS scheme for the CBVC is implemented as part of a unified data, resource, and channel management framework, referred to in this dissertation as the UniDRM.
Considering different criteria or objectives for data partitioning and scheduling, three distinct DPS schemes, time-aware, cost-aware, and reliability-aware, are also proposed in this dissertation. For the PBVC, the data partitioning is carried out by the lead node, i.e., the first node of the platoon, using derived closed-form equations. The determined data chunks of the platoon members are then distributed either directly (single hop) or through multi-hop transmission. The closed-form equations were derived from data flow and timing diagrams designed according to how vehicles in platoons exchange information with their neighbor nodes, referred to as platoon Information Flow Topologies (IFTs). In this dissertation, existing platoon IFTs, namely the Bi-Directional (BD), Bi-Directional Lead (BDL), and All-to-All (A2A) topologies, are modified to derive mathematical models for six different DPS schemes: the Bi-Directional-Recursive (BD-R), Bi-Directional Interlaced (BD-I), Bi-Directional Lead-Recursive (BDL-R), Bi-Directional Lead-Interlaced (BDL-I), Bi-Directional Lead Aggregate-Recursive (BDLA-R), and Bi-Directional Lead Aggregate-Interlaced (BDLA-I). Through realistic simulations developed with the simulation platforms Omnet++, Sumo, Veins, and Plexe, a detailed performance analysis of the proposed DPS schemes was carried out. By developing the different DPS schemes for the PBVC, one of the long-standing challenges of divisible data partitioning and scheduling in the literature, namely the modeling and derivation of closed-form equations for determining the percentage of data chunks and the processing time of a linearly arranged network of connected heterogeneous processors, is addressed.
According to the literature, no closed-form equations existed for a heterogeneous linear network of connected processors because of the complicated combinatorial terms that appear in the expressions for individual data partitions while solving the recursive equations. However, using algebraic manipulations, closed-form equations have been derived and modeled in this dissertation. In all, this dissertation presents novel solutions to key challenges of vehicular cloud computing, including the limited available capacity of resources in vehicles and the bandwidth constraint of the vehicular communication channel.
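Although the dissertation's own closed-form equations are not reproduced in this abstract, the general flavor of closed-form divisible-load partitioning can be illustrated with the textbook equal-finish-time recursion for sequential single-hop distribution to heterogeneous processors. The function below is an illustrative assumption, not the derived UniDRM or platoon equations.

```python
def dlt_fractions(w, z):
    """Textbook divisible-load split for sequential single-hop distribution.

    w[i]: time to process one unit of data on node i
    z[i]: time to transmit one unit of data to node i
    The equal-finish-time condition
        alpha[i+1] * (z[i+1] + w[i+1]) = alpha[i] * w[i]
    yields a closed-form recursion; normalizing makes the fractions sum to 1.
    """
    alpha = [1.0]
    for i in range(len(w) - 1):
        alpha.append(alpha[i] * w[i] / (z[i + 1] + w[i + 1]))
    total = sum(alpha)
    return [a / total for a in alpha]

# Three heterogeneous nodes: the faster the node, the larger its fraction.
fractions = dlt_fractions([2.0, 3.0, 4.0], [0.5, 0.5, 0.5])
```

With these fractions, the finish time of node i, i.e., the cumulative transmission time up to node i plus its own processing time, is identical for every node, which is the defining property of an optimal divisible-load schedule under this model.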
-
ÖgeOcclusion robust and aware face recognition(Graduate School, 2023-05-25) Erakın, Mustafa Ekrem ; Ekenel, Hazım Kemal ; 504201532 ; Computer Engineering
Occluded faces, caused by accessories such as sunglasses and face masks, present a challenge for current face recognition systems. This thesis provides a comprehensive exploration of the issues caused by occlusions, particularly upper-face and lower-face obstructions, in real-world scenarios. The increased prevalence of sunglasses and face masks, the latter due to the COVID-19 pandemic, has amplified the importance of addressing these problems. In this thesis, the Real World Occluded Faces (ROF) dataset, a collection of faces with both upper- and lower-face occlusions, is gathered, serving as a critical resource for this area of study. In contrast to synthetic occlusion data, the ROF dataset provides an authentic representation of the problem, which our benchmark experiments have shown to be a significant impediment even for the most sophisticated deep face representation models. These models, while highly effective on synthetically occluded faces, exhibit substantial performance degradation when tested on the ROF dataset. This research comprises two distinct yet interconnected sections. The first stresses the vital role of real-world data in the design and refinement of occlusion-robust face recognition models. Our experiments demonstrate the increased challenges posed by real-world occlusions in comparison to their synthetic counterparts. This insight allows us to gauge the performance and limitations of various model architectures under different occlusion conditions. The second section presents a novel, occlusion-robust, and occlusion-aware face recognition system, designed to increase performance under occlusions caused by sunglasses and masks, with minimal impact on generic face recognition performance.
The system incorporates an occlusion-robust face recognition model, an occlusion-aware model, and an innovative layer that integrates the outputs of these models to minimize occlusion effects. This configuration ensures the system's resilience to occlusions, focusing less on occluded regions and more on overall facial recognition. This thesis provides a thorough investigation of the challenges presented by occluded face recognition and proposes an innovative solution to them. It underscores the necessity of utilizing real-world data for developing robust face recognition models and introduces a novel occlusion-aware face recognition system. This work has the potential to significantly enhance the performance of occluded face recognition methods in various real-world scenarios.
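The abstract does not spell out the integrating layer, but its gating idea can be sketched: blend an occlusion-robust embedding with a generic one according to an occlusion score produced by the occlusion-aware model. Everything below (the function names, the linear blending scheme, the scalar score) is a hypothetical simplification, not the thesis's actual layer.

```python
def fuse(robust_emb, generic_emb, occlusion_score):
    """Blend two face embeddings by a scalar occlusion score in [0, 1]
    (1.0 = heavily occluded, so the occlusion-robust branch dominates),
    then L2-normalize so cosine similarity can be used downstream.
    """
    fused = [occlusion_score * r + (1.0 - occlusion_score) * g
             for r, g in zip(robust_emb, generic_emb)]
    norm = sum(x * x for x in fused) ** 0.5 or 1.0  # avoid division by zero
    return [x / norm for x in fused]

def cosine(a, b):
    """Cosine similarity of two already-normalized embeddings."""
    return sum(x * y for x, y in zip(a, b))
```

Under this assumed scheme, an unoccluded probe (score near 0) is represented almost entirely by the generic embedding, so generic recognition performance is barely affected, while a masked or sunglasses probe is handed over to the occlusion-robust branch.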