Comperative evaluation of unsupervised fraud detection algorithms with feature extraction and scaling in purchasing domain

Taşoğlu, Yiğit Can

Files

528211079.pdf (1.19 MB)

Date

2024-08-21

Authors

Taşoğlu, Yiğit Can

Publisher

Graduate School

Abstract

This thesis evaluates and compares unsupervised outlier detection methods for fraud detection. Because these methods do not require labeled data, they suit real-world purchasing data, where labels are often unavailable. Fraud detection in large datasets is difficult, particularly in industries such as finance and purchasing, where fraudulent activity can cause significant financial losses if it is not identified early. The research is motivated by the limitations of traditional rule-based detection methods, which often fail to capture complex fraud patterns. Unsupervised algorithms flag records that deviate from the general behavior of the dataset, so they can surface previously unseen fraud patterns and support a proactive approach to detection. The study applies distance-based, machine learning-based, and feature-based models and focuses on improving them through feature extraction and scaling. Several algorithms, including Local Outlier Factor (LOF), DBSCAN, and Isolation Forest, are evaluated using accuracy, precision, recall, and F1 score. LOF was the most effective model, achieving the highest accuracy and reliably detecting irregular patterns in the purchasing data. The effectiveness of all algorithms, however, improved significantly with data transformations, particularly scaling. Scaling prevents features with differing magnitudes, such as quantities and prices, from distorting the results, which allows more accurate anomaly detection. Feature extraction is also important because it helps reveal intricate patterns between data points: extracted features such as purchase order frequency, vendor category, and purchase amount provide deeper insight into potential fraud indicators.
The study also finds that integrating multiple models can offset the limitations of individual algorithms, yielding a more comprehensive fraud detection framework. By combining different unsupervised methods and leveraging feature extraction, the research offers a more adaptive and reliable approach to identifying fraudulent activity. In conclusion, the study shows that combining unsupervised outlier detection methods with appropriate data preprocessing significantly improves fraud detection in purchasing systems. These methods not only improve accuracy but also help businesses reduce financial risk and improve operational efficiency, supporting a more secure and effective fraud prevention strategy.
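
The effect of scaling described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it swaps in a simple k-nearest-neighbor distance score in place of LOF, uses only the Python standard library, and runs on made-up (quantity, unit price) rows. The function names, the data, and the labels (used only for evaluation, never for detection) are all assumptions introduced for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so quantity and price are on a comparable scale."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def knn_scores(rows, k=2):
    """Outlier score: mean Euclidean distance to the k nearest other rows."""
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(math.dist(a, b) for j, b in enumerate(rows) if j != i)
        scores.append(mean(dists[:k]))
    return scores

def flag_top(scores, n):
    """Mark the n highest-scoring rows as suspected outliers."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    top = set(order[:n])
    return [i in top for i in range(len(scores))]

def precision_recall_f1(predicted, actual):
    """Standard metrics for the positive (outlier) class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical purchase orders: (quantity, unit_price). The last row has an
# unusually large quantity at an ordinary price.
orders = [(2, 1000), (3, 1500), (2, 2000), (3, 2500), (2, 3000),
          (3, 3500), (2, 4000), (3, 5000), (40, 2500)]
actual = [False] * 8 + [True]  # hypothetical labels, used only to evaluate

raw_flags = flag_top(knn_scores(orders), n=1)
scaled_flags = flag_top(knn_scores(standardize(orders)), n=1)

print("unscaled:", precision_recall_f1(raw_flags, actual))
print("scaled:  ", precision_recall_f1(scaled_flags, actual))
```

On the raw values, distances are dominated by the price column, so the large-quantity order looks ordinary and the most expensive order is flagged instead. After standardization both features contribute comparably and the large-quantity order receives the highest score. With scikit-learn available, `StandardScaler` and `LocalOutlierFactor` would be the more usual choices for the same experiment.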

Description

Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2024

Subject

data analysis, machine learning, big data

URI

http://hdl.handle.net/11527/25898

Collections

LEE - Big Data and Business Analytics Master's Program (Büyük Veri ve İş Analitiği Yüksek Lisans)
