Profiling developers to predict vulnerable code changes

Coşkun, Tuğçe

dc.contributor.advisor	Kühn Tosun, Ayşe
dc.contributor.author	Coşkun, Tuğçe
dc.contributor.authorID	504201540
dc.contributor.department	Computer Engineering
dc.date.accessioned	2024-11-14T08:30:49Z
dc.date.available	2024-11-14T08:30:49Z
dc.date.issued	2023-07-20
dc.description	Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023
dc.description.abstract	Software vulnerability prediction and management have recently attracted interest from both researchers and practitioners. Software vulnerabilities can be forecast with a variety of methods, most of which rely on features of code artifacts. Although this research has yielded encouraging results, the role of developers in introducing vulnerabilities has yet to be investigated. We characterize developers' vulnerability-creation and vulnerability-fixing habits in software development projects using Heterogeneous Information Network (HIN) analysis. The public SmartSHARK dataset was used to estimate vulnerable commits. The four projects with the most actual vulnerability data were chosen, and the fix commits associated with security vulnerabilities were identified. The commits made to these projects in the dataset were used to generate the HIN matrices. Different relationships linking two object types have different transition probabilities, which result in distinct random walk processes and ranking results. Our analysis with the Random Walk with Restart (RWR) algorithm on the HIN revealed that some developers are linked to vulnerabilities more often than others. This thesis examines the effect of developer profiles on predicting vulnerable commits and compares it with the code metrics-based approach. As input features, we use the metrics obtained by applying the RWR algorithm to the HIN and the aggregated code metrics. To build the prediction models, we use conventional machine learning methods: Random Forest (RF), Naive Bayes (NB), eXtreme Gradient Boosting (XGB), and Support Vector Machine (SVM). We used recall, accuracy, F-measure, and inspection rate to assess the classifiers. Friedman and Nemenyi tests were used to establish that the performance differences across the models we created were statistically significant.
We report an empirical study conducted to predict vulnerability-inducing changes across four Apache projects. In terms of recall, the approach based on code metrics reaches 90%, whereas the approach based on developer behavior profiling reaches 71%. Combining the feature sets generated by the two approaches achieves a success rate of 89%. The findings indicate that developers' coding practices may be useful for predicting vulnerabilities.
dc.description.degree	M.Sc.
dc.identifier.uri	http://hdl.handle.net/11527/25619
dc.language.iso	en_US
dc.publisher	Graduate School
dc.sdg.type	Goal 9: Industry, Innovation and Infrastructure
dc.subject	Software vulnerability
dc.subject	Yazılım güvenliği
dc.subject	Security vulnerability
dc.subject	Güvenlik açığı
dc.title	Profiling developers to predict vulnerable code changes
dc.title.alternative	Güvenlik açığı kod değişikliklerini öngörmek için geliştiricilerin profilini oluşturma
dc.type	Master Thesis
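
The abstract describes ranking developers by running Random Walk with Restart on a heterogeneous network. The following is only a minimal sketch of that general idea, not the thesis's implementation: the node types (developers, commits, files), the toy edges, the choice of a vulnerability-inducing commit as the restart seed, the restart probability of 0.15, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn undirected (node, node) pairs into a neighbour-list dictionary."""
    adjacency = defaultdict(list)
    for left, right in edges:
        adjacency[left].append(right)
        adjacency[right].append(left)
    return dict(adjacency)


def random_walk_with_restart(adjacency, seeds, restart=0.15, tol=1e-12, max_iter=1000):
    """Iterate p = restart * e + (1 - restart) * W p until the scores stop changing.

    `seeds` are the nodes the walker jumps back to with probability `restart`.
    """
    nodes = sorted(adjacency)
    seed_list = sorted(seeds)
    restart_vec = {n: (1.0 / len(seed_list) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_vec)
    for _ in range(max_iter):
        new_scores = {n: restart * restart_vec[n] for n in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            mass = (1.0 - restart) * scores[node]
            if not neighbours:
                # A node without edges hands its probability mass back to the seeds.
                for seed in seed_list:
                    new_scores[seed] += mass / len(seed_list)
                continue
            for neighbour in neighbours:
                new_scores[neighbour] += mass / len(neighbours)
        delta = sum(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


def rank_developers(scores):
    """Keep only developer nodes and sort them by descending score."""
    developers = [(n, s) for n, s in scores.items() if n.startswith("dev:")]
    return sorted(developers, key=lambda item: item[1], reverse=True)


# Hypothetical toy network: alice authored c1 and c2, bob authored c3,
# and all three commits touched the same file.
EXAMPLE_EDGES = [
    ("dev:alice", "commit:c1"),
    ("dev:alice", "commit:c2"),
    ("dev:bob", "commit:c3"),
    ("commit:c1", "file:A"),
    ("commit:c2", "file:A"),
    ("commit:c3", "file:A"),
]

# Assume c1 is the commit known to have induced a vulnerability.
example_scores = random_walk_with_restart(
    build_adjacency(EXAMPLE_EDGES), seeds={"commit:c1"}
)
for developer, score in rank_developers(example_scores):
    print(f"{developer}: {score:.4f}")
```

In this toy network the developer who authored the seed commit ends up with the higher score, which mirrors the abstract's observation that some developers are linked to vulnerabilities more often than others. Using different edge types or transition weights would change the ranking, as the abstract notes.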

Files

Original bundle

Name: 504201540.pdf
Size: 1.64 MB
Format: Adobe Portable Document Format

Collections

LEE - Computer Engineering - Master's Degree