LEE - Management Engineering Graduate Program
Fuzzy clustering based ensemble learning approach: Applications in digital advertising (Graduate School, 2021)
Tekin, Ahmet Tezcan; Kaya, Tolga; Çebi, Ferhan; 711174; Management Engineering

Although machine learning algorithms have a long history, they have come into widespread use only in the last ten years. The main reason is that advances in computer hardware have made it possible to run these algorithms even on personal computers. In addition, as digitalization and internet usage become more widespread, the volume of data generated online is growing exponentially. The need for technologies such as big data and machine learning is therefore increasing day by day, and machine learning has become indispensable in both academia and the private sector. With machine learning, companies make predictions about their future processes, aiming to reduce future uncertainty and to manage their processes more effectively. For example, a company may apply machine learning to its marketing processes to spend its marketing budget more effectively and thus maximize its profitability. In recent years, many studies have focused on developing machine learning algorithms and overcoming the weaknesses of traditional machine learning methods. Whatever the type of prediction problem, the aim is to predict with minimum error, and many methods have been tried for this purpose. Ensemble learning is one of the most successful of these methods in the literature. Its purpose is to combine multiple algorithms so that they compensate for each other's weaknesses and increase prediction accuracy. The observations in a dataset to be predicted may be characteristically similar to or very different from each other.
For this reason, many studies in the literature cluster the data before applying machine learning algorithms and only then begin the modelling stage. These approaches use hard clustering, which by its working principle assigns each observation to exactly one cluster. Consequently, some of the subsets to be modelled may not reach the training set size needed for high prediction accuracy. Since an observation can simultaneously carry the characteristics of more than one cluster, soft clustering has been used to eliminate this problem. Although fuzzy clustering, a form of soft clustering, has been studied extensively, there are few examples in the literature of its use as an intermediate step for improving the results of machine learning models. In this thesis, which comprises three published articles, fuzzy clustering is first applied to the set of observations, and the most successful models for each cluster are then combined into an ensemble according to their error rates in order to improve model performance. To test the validity of this approach, separate studies addressing both regression and classification problems were carried out with datasets from different sectors. In the first study, clicks and sales were predicted using the digital advertising performance data and the reservation data from metasearch engines of an online travel agency operating in Turkey. These predictions are crucial for the company's short-, medium- and long-term financial goals. In this study, traditional regression methods were combined with the proposed fuzzy clustering approach, and the results were compared with those of the traditional methods alone.
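The difference between hard and soft assignment, and why soft assignment can enlarge the per-cluster training sets, can be sketched with a minimal NumPy implementation of Fuzzy C-Means (the fuzzy clustering algorithm used in the second study). This is an illustrative sketch, not the thesis code: the function names `fuzzy_c_means` and `training_subsets`, the convergence settings, and especially the membership threshold that decides which clusters an observation joins are hypothetical assumptions.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means sketch (hypothetical helper, not the thesis code).

    X: (n_samples, n_features) array; m > 1 is the fuzziness parameter.
    Returns (centers, U), where U[i, k] is the membership of sample i in
    cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so that each row sums to 1
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centres are membership-weighted means of the samples
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Sample-to-centre distances; the epsilon avoids division by zero
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # FCM update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def training_subsets(U, threshold=0.3):
    """Hard: each sample goes only to its highest-membership cluster.
    Soft (hypothetical rule): a sample joins every cluster in which its
    membership is at least `threshold`."""
    hard = [np.flatnonzero(U.argmax(axis=1) == k) for k in range(U.shape[1])]
    soft = [np.flatnonzero(U[:, k] >= threshold) for k in range(U.shape[1])]
    return hard, soft


# Toy data: two overlapping Gaussian blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.5, 1.0, (50, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2, m=2.0)
hard, soft = training_subsets(U, threshold=0.3)
print("hard subset sizes:", [len(h) for h in hard])
print("soft subset sizes:", [len(s) for s in soft])
```

With two clusters, every observation's highest membership is at least 0.5, so any threshold at or below 0.5 guarantees that each soft subset contains its hard counterpart; observations lying between the blobs are added to both clusters' training sets.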
First, machine learning algorithms were applied directly to the dataset after conventional data preprocessing and feature engineering. The modelling was then repeated after applying hard clustering and soft clustering to the dataset. Although adding a clustering stage before modelling increased the computational load, it produced more effective results than applying machine learning algorithms directly to the dataset. The results obtained after hard clustering and after fuzzy clustering were also compared, and the predictions made after fuzzy clustering were more successful. In the second study, the approach proposed in the first article was tested on a different problem using data from a different sector. This study aimed to predict customer lifetime value using the game data and session information of users of a mobile crossword puzzle game published in more than thirty languages and more than thirty countries. Ensemble learning algorithms, which were not used in the first article, were examined in more depth, with a focus on algorithms that could achieve higher prediction accuracy when used together with fuzzy clustering. After clustering with the Fuzzy C-Means algorithm, different hyperparameter combinations of CatBoost, Extreme Gradient Boosting and Light Gradient Boosting, which the literature reports to be generally more successful than traditional machine learning algorithms, were tested separately for each cluster. The predictions of the three most successful combinations were weighted in inverse proportion to their error rates, and the errors of the resulting predictions were compared with those of the other model-parameter combinations.
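Weighting the three most successful combinations in inverse proportion to their error rates amounts to the normalised weights w_i = (1/e_i) / Σ_j (1/e_j), where e_i is the error of model i and the sum runs over the selected models. The sketch below assumes that each model's predictions for one cluster and a positive validation error (RMSE is an assumed choice of metric) are already available; the function name `inverse_error_ensemble` and the toy numbers are hypothetical.

```python
import numpy as np


def inverse_error_ensemble(predictions, errors, top_k=3):
    """Combine the top_k lowest-error models with weights proportional to 1 / error.

    predictions: (n_models, n_samples) regression predictions for one cluster.
    errors: (n_models,) validation errors (e.g. RMSE, an assumed metric); must be > 0.
    Returns (combined predictions, indices of the selected models, their weights).
    """
    predictions = np.asarray(predictions, dtype=float)
    errors = np.asarray(errors, dtype=float)
    best = np.argsort(errors)[:top_k]   # indices of the lowest-error models
    inv = 1.0 / errors[best]            # inverse error rates
    weights = inv / inv.sum()           # normalise so the weights sum to 1
    return weights @ predictions[best], best, weights


# Four hypothetical model-parameter combinations evaluated on one cluster
preds = [[12.0, 18.0], [14.0, 16.0], [10.0, 20.0], [100.0, 100.0]]
errs = [2.0, 4.0, 1.0, 8.0]
combined, best, weights = inverse_error_ensemble(preds, errs)
print(best, weights.round(3), combined.round(3))
```

In this toy case the worst combination (error 8.0) is dropped, and the model with error 1.0 receives twice the weight of the model with error 2.0 and four times that of the model with error 4.0.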
The model built with the proposed method had a lower error rate than the other models and therefore produced more effective predictions. In the third study, the customer retention rate was predicted using a different dataset collected in the gaming industry. Unlike the first two studies, this study applied the proposed method to a classification problem and also tested different initial cluster parameters and fuzziness parameters. The aim was to obtain a better clustering with Fuzzy C-Means, and clustering was performed with the most successful parameter combination. Because the task is a classification problem, the results of the algorithm-parameter combinations were combined by weighting them according to their accuracy rather than their error rates. This study showed that applying the method to the clusters obtained by fuzzy clustering produces more effective results than applying machine learning algorithms directly to the dataset. Overall, by combining ensemble learning, which plays an important role in the development of machine learning approaches, with fuzzy clustering, this thesis enables more successful predictions on datasets with different characteristics. It also allows observations that simultaneously carry the characteristics of more than one cluster to be identified and modelled within separate clusters, yielding more effective predictions. This constantly developing field offers many directions for future work. First, in the clustering stage, other fuzzy clustering approaches from the literature could be tried in place of Fuzzy C-Means.
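For the classification case, weighting by accuracy instead of inverse error can be sketched as a weighted soft vote. The abstract does not say whether class probabilities or hard labels were combined, so averaging probabilities is an assumption here, as is reusing the top-three selection from the second study; the function name `accuracy_weighted_vote` and all numbers are hypothetical.

```python
import numpy as np


def accuracy_weighted_vote(probas, accuracies, top_k=3):
    """Combine class probabilities from the top_k most accurate classifiers,
    weighting each model in proportion to its validation accuracy.

    probas: (n_models, n_samples, n_classes) predicted class probabilities.
    accuracies: (n_models,) validation accuracies in (0, 1].
    Returns (predicted class per sample, weights of the selected models).
    """
    probas = np.asarray(probas, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    best = np.argsort(accuracies)[::-1][:top_k]             # most accurate first
    weights = accuracies[best] / accuracies[best].sum()     # weights sum to 1
    combined = np.tensordot(weights, probas[best], axes=1)  # (n_samples, n_classes)
    return combined.argmax(axis=1), weights


# Hypothetical not-retained (class 0) vs retained (class 1) probabilities from four models
probas = [
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.45, 0.55]],
    [[0.0, 1.0], [1.0, 0.0]],
]
accs = [0.9, 0.6, 0.8, 0.5]
labels, weights = accuracy_weighted_vote(probas, accs)
print(labels, weights.round(3))
```

A weighted majority vote over hard labels would be an equally plausible reading of the third study; the soft vote is shown only because it uses the accuracy weights directly on each model's confidence.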
A different fuzzy clustering algorithm could then be chosen according to the resulting performance. The weight coefficients used to combine the results of the most successful models could also be determined with different methods or functional forms. Beyond this, the method is expected to produce even more effective results when used together with new machine learning algorithms introduced in the future.