Comparative evaluation of unsupervised fraud detection algorithms with feature extraction and scaling in purchasing domain

dc.contributor.advisor Ergün, Mehmet Ali
dc.contributor.author Taşoğlu, Yiğit Can
dc.contributor.authorID 528211079
dc.contributor.department Big Data and Business Analytics
dc.date.accessioned 2024-12-20T08:45:19Z
dc.date.available 2024-12-20T08:45:19Z
dc.date.issued 2024-08-21
dc.description Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2024
dc.description.abstract This research evaluates and compares unsupervised outlier detection methods, which require no labeled data and therefore suit real-world purchasing data, where labels are often unavailable. The thesis highlights the challenges of fraud detection in large datasets, particularly in domains such as finance and purchasing, where fraud can cause significant financial losses if not identified early. The research is motivated by the limitations of traditional rule-based detection methods, which often fail to capture complex fraud patterns. Unsupervised algorithms flag records that deviate from the general behavior of the dataset, so they can surface fraud patterns that have not been seen before. This study applies distance-based, machine learning-based, and feature-based models and focuses on enhancing them through feature extraction and scaling. Several algorithms, including Local Outlier Factor (LOF), DBSCAN, and Isolation Forest, are evaluated using accuracy, precision, recall, and F1 score. LOF was the most effective model, achieving the highest accuracy and robustly detecting irregular patterns in the purchasing data. The effectiveness of all algorithms, however, improved significantly with data transformations, particularly scaling. Scaling prevents features with differing magnitudes, such as quantities and prices, from distorting the results, which allows more accurate anomaly detection. Feature extraction is also important, as it helps reveal intricate relationships between data points. Extracted features, such as the frequency of purchase orders, vendor categories, and purchase amounts, provide deeper insight into potential fraud indicators.
Additionally, the study recognizes that combining multiple models can offset the limitations of individual algorithms, yielding a more comprehensive fraud detection framework. By combining different unsupervised methods and leveraging feature extraction, the research offers a more adaptive and reliable approach to identifying fraudulent activities. In conclusion, the study shows that combining unsupervised outlier detection methods with appropriate data preprocessing significantly improves fraud detection in purchasing systems. These methods not only enhance accuracy but also help businesses reduce financial risk and improve operational efficiency, supporting a more secure and effective fraud prevention strategy.
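The abstract's point about scaling can be illustrated with a minimal sketch. It is not the thesis's implementation: it substitutes a simple k-nearest-neighbour distance score for LOF, DBSCAN, and Isolation Forest, and the purchase-order rows, column meanings, and function names are hypothetical. It only shows how an unscaled price column can dominate a distance-based outlier score and hide an unusual quantity.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: (x - column mean) / column standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]


def knn_score(rows, k=2):
    """Outlier score per row: mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(rows):
        dists = sorted(math.dist(p, q) for j, q in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical purchase-order lines: (quantity, unit_price).
orders = [
    (10, 1000.0), (11, 1500.0), (9, 2000.0), (10, 2500.0), (11, 3000.0),
    (60, 2000.0),  # index 5: unusual quantity at an ordinary price
]

raw = knn_score(orders)
scaled = knn_score(standardize(orders))

# Index of the highest-scoring row before and after scaling.
top_raw = max(range(len(orders)), key=raw.__getitem__)
top_scaled = max(range(len(orders)), key=scaled.__getitem__)
print("top outlier without scaling:", top_raw)
print("top outlier with scaling:", top_scaled)
```

Without scaling, the distances are driven almost entirely by the price column, so the row with the unusual quantity does not receive the highest score; after standardization both columns contribute comparably and that row ranks first.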
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/25898
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.subject data analysis
dc.subject veri analizi
dc.subject machine learning
dc.subject makine öğrenmesi
dc.subject big data
dc.subject büyük veri
dc.title Comparative evaluation of unsupervised fraud detection algorithms with feature extraction and scaling in purchasing domain
dc.title.alternative Satın alma alanında özellik çıkarma ve ölçekleme ile denetimsiz sahtekarlık tespit algoritmalarının karşılaştırmalı değerlendirmesi
dc.type Master Thesis
Files
Original bundle
Name: 528211079.pdf
Size: 1.19 MB
Format: Adobe Portable Document Format