Profiling developers to predict vulnerable code changes

Publisher

Graduate School

Abstract

Software vulnerability prediction and management have recently attracted interest from both researchers and practitioners. Software vulnerabilities can be forecast with a variety of methods, most of which rely on features of code artifacts. Although this research has yielded encouraging results, the role of developers in introducing vulnerabilities has yet to be investigated. We characterize developers' vulnerability-introducing and vulnerability-fixing habits in software development projects using Heterogeneous Information Network (HIN) analysis. The public SmartSHARK dataset was used to estimate vulnerable commits. The four projects with the most vulnerability data were chosen, and fix commits associated with security vulnerabilities were identified. The projects' commits in the dataset were used to generate the HIN matrices. Different relationships linking two object types have different transition probabilities, which result in distinct random walk processes and ranking results. Our analysis with the Random Walk with Restart (RWR) algorithm on the HIN revealed that some developers are linked to vulnerabilities more often than others. This thesis examines the effect of developer profiles on predicting vulnerable commits and compares this approach with the code metrics-based approach. As input features, we use the metrics obtained by applying the RWR algorithm to the HIN together with aggregated code metrics. To build the prediction models, we use conventional machine learning methods: Random Forest (RF), Naive Bayes (NB), eXtreme Gradient Boosting (XGB), and Support Vector Machine (SVM). We used recall, accuracy, F-measure, and inspection rate to assess the classifiers. Friedman and Nemenyi tests were used to establish that the performance differences across the models were statistically significant.
We report an empirical study on predicting vulnerability-inducing changes across four Apache projects. In terms of recall, the approach based on code metrics achieves 90%, whereas the approach based on developer behavior profiling achieves 71%. Combining the feature sets generated by the two approaches yields a success rate of 89%. The findings show that developers' coding practices may be useful for predicting vulnerabilities.
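
As an illustration of the Random Walk with Restart ranking mentioned in the abstract, the following is a minimal sketch on a toy developer-commit network. It is not the thesis implementation: the graph, the node names (dev_a, c1, ...), the choice to restart at vulnerable commits, and the restart probability of 0.15 are all assumptions made for the example.

```python
def build_bipartite_graph(edges):
    """Build an undirected, unit-weight adjacency dict from (developer, commit) pairs."""
    adjacency = {}
    for dev, commit in edges:
        adjacency.setdefault(dev, {})[commit] = 1.0
        adjacency.setdefault(commit, {})[dev] = 1.0
    return adjacency


def random_walk_with_restart(adjacency, restart, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- c * e + (1 - c) * W^T p until the scores stop changing.

    adjacency: {node: {neighbour: weight}}; every neighbour must also be a key.
    restart:   {node: weight} defining the restart distribution e.
    """
    nodes = sorted(adjacency)
    total = sum(restart.get(n, 0.0) for n in nodes)
    e = {n: restart.get(n, 0.0) / total for n in nodes}  # normalised restart vector
    p = dict(e)
    for _ in range(max_iter):
        nxt = {n: restart_prob * e[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adjacency[u].values())
            if out_weight == 0:
                # Dangling node: return its probability mass to the restart set.
                for n in nodes:
                    nxt[n] += (1 - restart_prob) * p[u] * e[n]
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart_prob) * p[u] * w / out_weight
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy data: which developer authored which commit.
edges = [("dev_a", "c1"), ("dev_a", "c2"), ("dev_b", "c2"),
         ("dev_b", "c3"), ("dev_c", "c4")]
vulnerable_commits = {"c1": 1.0, "c2": 1.0}  # assumed vulnerability-inducing commits

scores = random_walk_with_restart(build_bipartite_graph(edges), vulnerable_commits)
developers = sorted((n for n in scores if n.startswith("dev_")),
                    key=lambda n: scores[n], reverse=True)
for dev in developers:
    print(dev, round(scores[dev], 4))
```

In this sketch, a developer's score reflects how strongly they are connected to the vulnerable commits, so scores of this kind could serve as developer-profile features next to code metrics. A real HIN would have more object types and relation-specific transition probabilities than this two-type toy graph.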

Description

Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023

Subject

Software vulnerability, Software security, Security vulnerability
