Generating synthetic data for user behavior based intrusion detection systems

dc.contributor.advisor Özdemir, Enver
dc.contributor.author İbrahimov, Ughur
dc.contributor.authorID 707211009
dc.contributor.department Cybersecurity Engineering and Cryptography
dc.date.accessioned 2025-05-23T07:59:29Z
dc.date.available 2025-05-23T07:59:29Z
dc.date.issued 2024-07-16
dc.description Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2024
dc.description.abstract Intrusion detection systems (IDS) play a critical role in mitigating cyber vulnerabilities. As the number of malicious actors grows, so does the demand for multifunctional IDS models. Because data underpins every cybersecurity measure, obtaining suitable data is essential when developing these defenses, and synthetic data offers a way to overcome data scarcity. This thesis examines the concept of intrusion detection, the need for synthetic data in cybersecurity, and synthetic data generation methods. The analysis covers the relationship between synthetic data and intrusion detection systems, the process of applying synthetic data, and privacy considerations when generating and using artificial data for cybersecurity measures. Based on this analysis, we select the generation method and tool used in this thesis. Because many methods and techniques exist for producing synthetic data for different purposes, the modeling approach and method must be chosen to fit the work. Synthetic data generation methods include machine learning approaches such as generative adversarial networks (GAN) and variational autoencoders (VAE), as well as simulation, interpolation and extrapolation, statistical modelling, and others. In this thesis, we generate synthetic data that represents the daily behavior of a user who works as an information technology support technician and handles tickets. Python libraries are used to produce the data. In addition, a scenario was developed so that the synthetic dataset is as close as possible to real-life incidents. Constants such as ticket identifiers, ticket types, and action types are clearly defined in order to generate balanced synthetic data.
One requirement for using synthetic data across industries is that it be balanced. Ticket types are defined as task, bug, support, question, and feature; the defined actions include working on a ticket, reassigning a ticket, attaching a file to a ticket, and others. Approximately 35,000 actions were generated over a two-week period; in later developments the period could be extended to obtain a more realistic distribution. The synthetic data is restricted to actions between 9 A.M. and 5 P.M., which are working hours. The time spent is calculated as the difference between start and finish times assigned at random within these hours. The generated data is stored in an Excel file containing approximately 35,000 rows. The amount can be changed to suit the purpose by modifying the code. The statistical distribution of the result is shown in histograms at the end.
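The generation scheme described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function name `generate_events`, the `TCK-0000` identifier format, the ticket pool size, uniform sampling, and the use of 14 calendar days are all assumptions made for illustration. Only the ticket types, the three named actions, the 9 A.M. to 5 P.M. window, and the rule that time spent is the difference between random start and finish times come from the abstract.

```python
import random
from datetime import date, datetime, time, timedelta

# Ticket types and actions named in the abstract (its "others" are omitted).
TICKET_TYPES = ["task", "bug", "support", "question", "feature"]
ACTION_TYPES = ["work on ticket", "reassign ticket", "attach file to ticket"]
WORK_START_HOUR, WORK_END_HOUR = 9, 17  # 9 A.M. to 5 P.M.


def generate_events(n_events, first_day, n_days=14, n_tickets=500, seed=0):
    """Return a list of synthetic ticket actions (hypothetical helper).

    n_tickets, the ID format, and uniform sampling are assumptions.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    window = (WORK_END_HOUR - WORK_START_HOUR) * 3600  # working day in seconds
    events = []
    for _ in range(n_events):
        day = first_day + timedelta(days=rng.randrange(n_days))
        # Two distinct random offsets inside the working day, in order.
        start_s, finish_s = sorted(rng.sample(range(window + 1), 2))
        day_open = datetime.combine(day, time(WORK_START_HOUR))
        start = day_open + timedelta(seconds=start_s)
        finish = day_open + timedelta(seconds=finish_s)
        events.append({
            "ticket_id": f"TCK-{rng.randrange(n_tickets):04d}",
            "ticket_type": rng.choice(TICKET_TYPES),
            "action": rng.choice(ACTION_TYPES),
            "start": start,
            "finish": finish,
            # Time spent is the difference between finish and start.
            "time_spent_min": round((finish - start).total_seconds() / 60, 1),
        })
    return events


events = generate_events(35_000, date(2024, 1, 1))
```

Changing `n_events` or `n_days` adjusts the volume and period, mirroring the abstract's note that the amount can be changed in the code. To store the result in an Excel file, one option is to load `events` into a pandas `DataFrame` and call `to_excel`, which requires an Excel writer such as `openpyxl` to be installed.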
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/27153
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 7: Affordable and Clean Energy
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.sdg.type Goal 11: Sustainable Cities and Communities
dc.subject Linked data
dc.subject Bağlantılı veri
dc.subject Big data
dc.subject Büyük veri
dc.subject Intrusion detection system (IDS)
dc.subject Saldırı tespit sistemi (IDS)
dc.title Generating synthetic data for user behavior based intrusion detection systems
dc.title.alternative Kullanıcı davranışına dayalı saldırı tespit sistemleri için sentetik veri oluşturulması
dc.type Master Thesis
Files
Original bundle
Name: 707211009.pdf
Size: 2.95 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed upon to submission