Generating synthetic data for user behavior based intrusion detection systems
Date
2024-07-16
Authors
İbrahimov, Ughur
Publisher
Graduate School
Abstract
Intrusion detection systems (IDS) play a critical role in mitigating cyber vulnerabilities. As the number of malicious actors grows, so does the demand for multifunctional IDS models. Because data is central to every cybersecurity measure, obtaining it is essential when developing these safeguards, and synthetic data offers a distinctive way to overcome data scarcity. This thesis examines the concept of intrusion detection, the need for synthetic data in cybersecurity, and methods for generating synthetic data. The analysis covers the relationship between synthetic data and intrusion detection systems, the process of applying synthetic data, and the privacy issues that arise when generating and using artificial data for cybersecurity measures. Based on this analysis, we select a generation method and tool suited to the purpose of the thesis; because many methods and techniques exist for producing synthetic data for different purposes, the modeling approach must be chosen to fit the task. Synthetic data generation methods include machine learning approaches such as generative adversarial networks (GANs) and variational autoencoders (VAEs), as well as simulation, interpolation and extrapolation, statistical modelling, and others.
In this thesis, we generate synthetic data describing the daily behavior of a user who works as an information technology (IT) support technician handling tickets. The data is produced with Python libraries, and a scenario was developed so that the synthetic dataset resembles real-life incidents as closely as possible. Constants such as ticket IDs, ticket types, and action types are explicitly defined so that the generated data is balanced, since balance is one of the key requirements for synthetic data used across industries.
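The abstract states that constants such as ticket types are defined explicitly so that the generated data is balanced. One simple way to enforce this, sketched here as an assumption rather than the thesis's actual method, is to repeat the list of constants until it covers the required number of rows, truncate it, and shuffle the result. Unlike drawing each label independently with `random.choice`, which is balanced only on average, this guarantees that the counts of any two labels differ by at most one. The five ticket types are the ones the thesis lists (task, bug, support, question, feature); the function name `balanced_labels` is hypothetical.

```python
import random
from collections import Counter

# Ticket types as listed in the thesis abstract.
TICKET_TYPES = ("task", "bug", "support", "question", "feature")


def balanced_labels(n: int, labels: tuple[str, ...], seed: int = 0) -> list[str]:
    """Return n labels whose counts differ by at most one, in random order.

    Hypothetical helper: repeats the label list enough times to cover n rows,
    truncates to exactly n, then shuffles so the order is random.
    """
    rng = random.Random(seed)
    repeated = (list(labels) * (n // len(labels) + 1))[:n]
    rng.shuffle(repeated)
    return repeated


if __name__ == "__main__":
    sample = balanced_labels(35_000, TICKET_TYPES)
    print(Counter(sample))
```

For 35,000 rows and five ticket types, each type appears exactly 7,000 times; shuffling changes only the order, not the counts.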
Ticket types are defined as task, bug, support, question, and feature, and the actions include working on a ticket, reassigning a ticket, attaching a file to a ticket, and others. Approximately 35,000 actions were generated over a two-week period; in future work, this period could be extended to obtain a more realistic distribution. The synthetic actions are restricted to working hours, between 9 A.M. and 5 P.M. The time spent on each action is calculated as the difference between a randomly assigned start time and finish time within these hours. The generated data, approximately 35,000 rows, is stored in an Excel file, and the number of rows can be changed in the code to suit the purpose. The statistical distribution of the result is shown in histograms at the end of the thesis.
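The generation procedure described in the abstract can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the thesis's code: the two-week period is taken as 14 calendar days starting on a hypothetical Monday, 2024-01-01, with weekends skipped; the action identifiers are hypothetical spellings of the three actions the abstract names (the unnamed "others" are left out); ticket IDs use an invented `T-00001` format; and each action's start and finish are two uniformly random instants between 9 A.M. and 5 P.M. on the same day, ordered so that the finish is never earlier than the start.

```python
import csv
import random
from datetime import date, datetime, time, timedelta

# Ticket types are from the thesis; the action names are hypothetical
# identifiers for the three actions it names ("and others" are omitted).
TICKET_TYPES = ["task", "bug", "support", "question", "feature"]
ACTIONS = ["work_on_ticket", "reassign_ticket", "attach_file"]

WORK_START = time(9, 0)   # 9 A.M.
WORK_END = time(17, 0)    # 5 P.M.


def working_days(start: date, n_days: int) -> list[date]:
    """Return the weekdays (Mon-Fri) among n_days calendar days from start."""
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5]


def random_interval(day: date, rng: random.Random) -> tuple[datetime, datetime]:
    """Pick two random instants within working hours and order them."""
    opening = datetime.combine(day, WORK_START)
    span = int((datetime.combine(day, WORK_END) - opening).total_seconds())
    a, b = sorted(rng.randint(0, span) for _ in range(2))
    return opening + timedelta(seconds=a), opening + timedelta(seconds=b)


def generate_events(n_events: int, start: date, n_days: int = 14,
                    seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    days = working_days(start, n_days)
    events = []
    for _ in range(n_events):
        day = rng.choice(days)
        begin, finish = random_interval(day, rng)
        events.append({
            # Hypothetical ID format and range, for illustration only.
            "ticket_id": f"T-{rng.randint(1, 5000):05d}",
            "ticket_type": rng.choice(TICKET_TYPES),
            "action": rng.choice(ACTIONS),
            "start": begin,
            "finish": finish,
            # Time spent = finish - start, expressed in minutes.
            "minutes_spent": (finish - begin).total_seconds() / 60,
        })
    events.sort(key=lambda e: e["start"])
    return events


def write_csv(events: list[dict], path: str) -> None:
    """Save events as CSV; datetimes are written via str()."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(events[0]))
        writer.writeheader()
        writer.writerows(events)


if __name__ == "__main__":
    rows = generate_events(35_000, date(2024, 1, 1))
    print(len(rows), "events generated")
```

Calling `generate_events(35_000, date(2024, 1, 1))` yields roughly the volume the abstract describes, and `write_csv(rows, "synthetic_events.csv")` saves it. The sketch writes CSV to stay within the standard library; to produce an Excel file as the thesis does, one option is pandas' `pd.DataFrame(rows).to_excel("synthetic_events.xlsx", index=False)`, which requires the `openpyxl` engine. A per-hour distribution for a histogram can be tallied with `collections.Counter(e["start"].hour for e in rows)`.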
Description
Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2024
Keywords
Linked data,
Big data,
Intrusion detection system (IDS)