Profiling developers to predict vulnerable code changes

dc.contributor.advisor: Kühn Tosun, Ayşe
dc.contributor.author: Coşkun, Tuğçe
dc.contributor.authorID: 504201540
dc.contributor.department: Computer Engineering
dc.date.accessioned: 2024-11-14T08:30:49Z
dc.date.available: 2024-11-14T08:30:49Z
dc.date.issued: 2023-07-20
dc.description: Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023
dc.description.abstract: Software vulnerability prediction and management have recently attracted interest from researchers and practitioners. Software vulnerabilities can be forecast using a variety of methods, most of which are based on features of code artifacts. Although such research has yielded encouraging results, the role of developers in introducing vulnerabilities has yet to be investigated. We characterize developers' vulnerability-introducing and vulnerability-fixing habits in software development projects using Heterogeneous Information Network (HIN) analysis. The SmartSHARK public dataset was used to estimate vulnerable commits. The four projects with the most vulnerability data were chosen, and fix commits associated with security vulnerabilities were identified. The commits made to these projects in the dataset were used to generate the HIN matrices. Different relationships linking two object types have different transition probabilities, resulting in distinct random walk processes and ranking results. Our analysis with the Random Walk with Restart (RWR) algorithm on the HIN revealed that some developers are more often linked to vulnerabilities than others. This thesis examines the effect of developer profiles on predicting vulnerable commits and compares this approach to the code-metrics-based approach. As input features, we use the metrics obtained by applying the RWR algorithm to the HIN and the aggregated code metrics. To build the prediction models, we use conventional machine learning methods: Random Forest (RF), Naive Bayes (NB), eXtreme Gradient Boosting (XGB), and Support Vector Machine (SVM). We use recall, accuracy, F-measure, and inspection rate to assess the classifiers. Friedman and Nemenyi tests were used to establish that the performance differences across the models were statistically significant.
We report an empirical study on predicting vulnerability-inducing changes across four Apache projects. In terms of recall, the technique based on code metrics is 90% effective, whereas the technique based on developer behavior profiling is 71% effective. Combining the feature sets generated by the two strategies yields a success rate of 89%. The findings show that developers' coding practices may be useful for predicting vulnerabilities.
dc.description.degree: M.Sc.
dc.identifier.uri: http://hdl.handle.net/11527/25619
dc.language.iso: en_US
dc.publisher: Graduate School
dc.sdg.type: Goal 9: Industry, Innovation and Infrastructure
dc.subject: Software vulnerability
dc.subject: Yazılım güvenliği
dc.subject: Security vulnerability
dc.subject: Güvenlik açığı
dc.title: Profiling developers to predict vulnerable code changes
dc.title.alternative: Güvenlik açığı kod değişikliklerini öngörmek için geliştiricilerin profilini oluşturma
dc.type: Master Thesis
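
Illustrative note on the abstract: the ranking step it describes, a random walk with restart over a network linking developers, commits, and vulnerabilities, can be sketched roughly as below. This is a minimal sketch and not the thesis's implementation. The toy graph, the node names (dev_a, c1, vuln), the uniform transition probabilities, and the restart value of 0.15 are all assumptions made for illustration. The thesis itself works on HIN matrices built from SmartSHARK data, where different relationships carry different transition probabilities.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Iterate a random walk that jumps back to `seed` with probability `restart`.

    `adjacency` maps each node to a list of its neighbours (hypothetical toy input).
    Returns a dict of approximate steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    scores = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        # Restart mass always flows back to the seed node.
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * scores[u]
                continue
            # Assumption: uniform transition probability to each neighbour.
            share = (1 - restart) * scores[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical undirected toy network: two developers, four commits,
# and one node standing for "vulnerability-related".
graph = {
    "dev_a": ["c1", "c2"],
    "dev_b": ["c3", "c4"],
    "c1": ["dev_a", "vuln"],
    "c2": ["dev_a", "vuln"],
    "c3": ["dev_b", "vuln"],
    "c4": ["dev_b"],
    "vuln": ["c1", "c2", "c3"],
}

scores = random_walk_with_restart(graph, seed="vuln")
developers = sorted(["dev_a", "dev_b"], key=lambda d: scores[d], reverse=True)
print(developers)
```

In this toy setting, a developer whose commits are more often tied to the vulnerability node receives a higher score, which is the kind of per-developer signal the abstract describes feeding into the prediction models.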

Files

Original bundle
Name: 504201540.pdf
Size: 1.64 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed upon to submission