LEE - Big Data and Business Analytics Graduate Program
Browsing LEE - Big Data and Business Analytics Graduate Program by author "Bakır Tartar, Feyza"
Customer lifetime value prediction and segmentation analysis for commercial customers in the banking industry
(Graduate School, 2024-08-12) Bakır Tartar, Feyza ; Tuna, Süha ; 528221082 ; Big Data and Business Analytics
In Türkiye, the banking sector plays a pivotal role in the growth of the national economy and the maintenance of financial stability. Understanding and evaluating the behavior of corporate customers in this sector requires accurate and comprehensive analysis. Customer Lifetime Value is a critical metric for understanding customer attitudes, so calculating it accurately is crucial. In this thesis, data on the corporate customers of a participation bank operating in Türkiye were used to create Customer Lifetime Value scores using various analytical techniques. In the first stage of the study, two years of historical data for each customer were collected from the relevant databases and organized on a quarterly basis to calculate Customer Lifetime Value scores. The dataset was checked for missing values, inconsistencies, and errors. During data cleaning, outliers were identified using the z-score method: data points with an absolute z-score greater than 3 were considered outliers. This threshold is a commonly used rule when assessing whether data conform to a normal distribution, and it helps minimize the impact of extreme values. Such outliers can distort the overall structure of the dataset and degrade model performance, so they were removed from the dataset. Before the modeling phase, standard scaling was applied to the data. Scaling is a preprocessing step that eliminates problems arising from features with different units of measurement and ensures that all features are on the same scale. Following the scaling process, Customer Lifetime Value was predicted using five distinct machine learning algorithms.
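The outlier rule and the standard scaling step described above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the sample `quarterly_revenue` figures are hypothetical and are not taken from the thesis, and the population standard deviation is an assumption, since the abstract does not say which variant was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value (population standard deviation assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def remove_outliers(values, threshold=3.0):
    """Keep only values whose absolute z-score does not exceed the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= threshold]


def standard_scale(values):
    """Rescale values to zero mean and unit variance."""
    return z_scores(values)


# Hypothetical quarterly figures; 950.0 is an injected extreme value.
quarterly_revenue = [10.0, 12.0, 11.0, 13.0, 9.0, 10.5, 11.5, 12.5, 9.5, 10.0,
                     11.0, 12.0, 13.0, 9.0, 10.0, 11.0, 12.0, 10.5, 11.5, 950.0]

cleaned = remove_outliers(quarterly_revenue)  # drops values with |z| > 3
scaled = standard_scale(cleaned)              # zero mean, unit variance
```

Scaling is applied after outlier removal here because extreme values would otherwise inflate the standard deviation used for scaling.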
In this process, the outcomes of the Random Forest, Light Gradient-Boosting Machine, Extreme Gradient Boosting, Elastic-Net, and Linear Regression algorithms were assessed. The model results were evaluated using Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, R², and Adjusted R². These metrics were used to evaluate how well the model predictions aligned with the target variable. The highest R² score, 0.55, was obtained with the Random Forest algorithm, with Light Gradient-Boosting Machine being the second-best algorithm. Following this evaluation, parameter optimization was performed using Grid Search Cross Validation to enhance model performance. Grid Search Cross Validation is a technique that explores all possible combinations within a specified parameter set to identify the hyperparameters that deliver the best performance. After parameter optimization, the highest R² value was 0.68 with the Random Forest algorithm, and the second-highest was 0.56 with the Extreme Gradient Boosting algorithm. The results of the Random Forest algorithm, which provided the highest prediction accuracy, were adopted as the basis for the clustering analysis. The K-means clustering algorithm was used to partition the data into meaningful clusters. Graphical analysis using the elbow method and a detailed examination of cluster properties led to the conclusion that five clusters best represented the customer segments. The aim is to increase satisfaction and loyalty among high-value customers by offering special deals, and to activate low-value customers through incentives and promotions.
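The five evaluation metrics named above can be computed as in the following sketch, which uses the standard textbook definitions. The function name `regression_metrics`, the `n_features` parameter, and the sample values are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def regression_metrics(y_true, y_pred, n_features):
    """Compute MAE, MSE, RMSE, R^2, and adjusted R^2 for a regression model.

    n_features is the number of predictors used by the model; it is needed
    only for the adjusted R^2 correction.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)

    y_mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R^2 penalizes R^2 for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Hypothetical target values and predictions from a model with two features.
metrics = regression_metrics(
    y_true=[3.0, 5.0, 7.0, 9.0, 11.0],
    y_pred=[2.0, 5.0, 8.0, 9.0, 11.0],
    n_features=2,
)
```

Adjusted R² is always at most R² for a model with at least one predictor, which makes it the more conservative figure when comparing models with different numbers of features.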