Estimation of daily and hourly precipitation data with a new inverse distance weighted model
Date
2025
Authors
Başkesen, Kevser Merkür
Publisher
İTÜ Lisansüstü Eğitim Enstitüsü
Abstract
Obtaining complete and reliable meteorological data is of great importance, particularly for climate change studies and for managing natural disasters such as floods and droughts. In addition to the insufficient number of existing meteorological stations, gaps can occur in the precipitation data obtained from observation stations because of instrument failures, data transmission problems, and environmental effects. In this study, different modelling methods for completing missing precipitation data were evaluated comparatively, and a new approach was proposed. The Sarıyer/İTÜ Maslak meteorological station in Istanbul was selected as the target station, and the gaps in its hourly and daily precipitation data for 2014–2017 were addressed. Five meteorological stations near the target station, namely Sarıyer, Eyüp, Beykoz, Şişli, and Üsküdar, were used as reference stations for the estimates. Both traditional and modern methods were examined together, so that the effect of different spatial and temporal structures on estimation performance could be analysed. The methods applied include the classical distance-based models Inverse Distance Weighted (IDW), Modified IDW (MIDW1 and MIDW2), Inverse Exponential Weighted Method (IEWM), and Modified Normal Ratio with Square Root Distance (MNR-T), together with the artificial-intelligence-based Adaptive Neuro-Fuzzy Inference System (ANFIS). In addition, a New Inverse Distance Model (New IDW) was proposed in this study, and its performance was compared with that of the other methods. The new model aims to compute spatial weights more effectively by taking into account the distance between the target station and the neighbouring stations. All methods were tested with different training/test data ratios (60/40, 70/30, 80/20) and different numbers of input stations (2, 3, 4, 5), and the models' error rates were compared.
After the hourly and daily data were evaluated with 7 methods and 3 training/test ratios, the missing data were completed using the 3 methods that gave the lowest error: New IDW, IDW, and MNR-T. This thesis evaluated different methods for completing missing precipitation data and demonstrated the applicability of the newly developed model as an alternative solution. The results show that more accurate and reliable estimates can be made for meteorological stations with missing data, and that data continuity can therefore be ensured.
The reliability and continuity of meteorological data are essential for hydrological studies, urban planning, agricultural management, and climate-related risk assessments. In many areas, complete and reliable weather data are difficult to obtain because meteorological stations are few. In addition, precipitation records from existing stations often contain missing values caused by equipment failures, data transmission errors, or environmental disruptions. Estimating these missing values correctly keeps the data reliable and supports better decisions in climate-sensitive areas. In this context, this thesis focuses on estimating missing hourly and daily precipitation data for the Sariyer/ITU Maslak meteorological station in Istanbul, using several spatial interpolation methods and one machine learning method. The primary objective is to compare the performance of several existing methods with a newly developed distance-based model and to investigate the applicability and effectiveness of these approaches in different scenarios. The thesis used precipitation data from the Sariyer/ITU Maslak station and five surrounding stations (Sariyer, Eyup, Beykoz, Sisli, and Uskudar), which were selected based on their distance. The precipitation data were obtained from the Turkish State Meteorological Service (MGM). The dataset includes hourly and daily precipitation observations recorded between 2014 and 2024 for the five surrounding stations, and between 2017 and 2024 for the Sariyer/ITU Maslak station. The missing data for the Sariyer/ITU Maslak station from 2014 to 2017 were completed using the selected methods. For hourly data prediction at the Sariyer station, the estimation began on March 24, 2014, at 3:00 PM, the time from which data were also available for the other four stations. Missing data for the Sariyer/ITU Maslak station were completed up to December 18, 2017, at 6:00 AM.
For daily data prediction at the Sariyer station, the estimation started on March 24, 2014, and the missing daily data for the Sariyer/ITU Maslak station were filled up to December 18, 2017. To evaluate the accuracy of the models, three training/test data ratios (60/40, 70/30, 80/20) were used in the analysis, and the number of input stations (2, 3, 4, and 5) was also varied. The study uses 6 spatial interpolation methods and 1 machine learning technique to estimate precipitation. Inverse Distance Weighted (IDW) is a commonly used method in which closer stations have a greater effect on the estimate. Modified Inverse Distance Weighted (MIDW) is an enhanced version of IDW that adjusts the weights based on distance and elevation. There are two MIDW formulas, one developed by Lo (MIDW1) and the other by Chang (MIDW2). The Inverse Exponential Weighted Method (IEWM) uses weights that decrease exponentially as distance increases. Modified Normal Ratio with Square Root Distance (MNR-T) balances precipitation ratios from neighboring stations using square-root distance weighting. The New Inverse Distance Weighted Model (New IDW) is a new spatial model developed in this thesis to improve estimation accuracy by considering the distance differences between stations and assigning different weights to them. The New IDW uses a dynamic formula to calculate weights based on the total distances between stations. The machine learning technique is the Adaptive Neuro-Fuzzy Inference System (ANFIS), a hybrid model that combines neural networks and fuzzy logic to model nonlinear relationships and station interactions. All models were applied using Excel, RStudio, and MATLAB. Each model was evaluated using two standard statistical metrics, Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).
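As an illustration of the classic IDW idea and of the two error metrics named above, here is a minimal Python sketch. It is not the thesis's implementation (the models were run in Excel, RStudio, and MATLAB). The exponent `power=2.0`, the function names, and the sample numbers are assumptions made for this example, and the New IDW weighting formula is not reproduced because the abstract does not state it.

```python
from math import fsum


def idw_estimate(values, distances, power=2.0):
    """Classic inverse distance weighting with weights w_i = 1 / d_i**power.

    `power=2.0` is a common default and an assumption here, not a value
    taken from the thesis. Distances must be strictly positive.
    """
    if not values or len(values) != len(distances):
        raise ValueError("values and distances must be non-empty and equal in length")
    weights = [1.0 / d ** power for d in distances]
    return fsum(w * v for w, v in zip(weights, values)) / fsum(weights)


def mae(observed, predicted):
    """Mean Absolute Error, in the same unit as the data (mm)."""
    return fsum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean Absolute Percentage Error in percent.

    Undefined when an observed value is 0, so such records must be
    excluded before calling this function.
    """
    return 100.0 * fsum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Made-up readings (mm) at two neighbours and made-up distances (km).
print(idw_estimate([2.0, 4.0], [1.0, 2.0]))
print(mae([1.0, 2.0], [1.5, 1.0]), mape([1.0, 2.0], [1.5, 1.0]))
```

Because MAPE divides by the observed value, it depends on the magnitude of the observations in the evaluated sample, whereas MAE does not. That is one reason the two metrics can rank models differently.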
MAPE values are reported as percentages. For the hourly data, the New IDW, IDW, and MNR-T methods had similar error values and generally gave more reliable and accurate predictions, with lower errors, than the other models. Although MAE and MAPE were expected to follow a similar trend, in some cases a model had a low MAE while its MAPE was higher than that of other models. The main reason for this inconsistency is that records in which 0 (zero) mm of precipitation was observed at the Sariyer/ITU Maslak station in the same time period as at the input stations were removed from the analysis because they carry no information. As a result, the average precipitation of the Sariyer/ITU Maslak station differs for each combination of input stations. With a 60/40 training/test ratio, as the number of input stations increases from 2 to 5, the MAPE values are 41.5, 42.0, 40.6, and 42.2, respectively. For a 70/30 ratio, they are 41.2, 41.6, 40.9, and 41.8, and for an 80/20 ratio, they are 40.4, 41.3, 39.7, and 40.6. These results indicate that, for hourly data, the error is lowest with 4 input stations, and that the lowest errors are generally obtained with the 80/20 training/test ratio. In addition, the MAE and MAPE values of the IDW and New IDW methods are identical when there are only 2 input stations. The two methods therefore reduce to the same model with 2 input stations, and the New IDW method should be used with 3 or more input stations to make a meaningful difference. For the daily data, model performance was likewise evaluated using different input combinations and training/test ratios.
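The filtering step described above, together with a training/test split, can be sketched as follows. This is one possible reading of the abstract, not the thesis's code. It assumes that a record is dropped only when the target and all selected input stations report 0 mm, and that the split is chronological, which the abstract does not state. All names and numbers are hypothetical.

```python
def drop_all_zero(records):
    """Drop records where the target and every input station report 0 mm.

    `records` is a list of (target_mm, [input_mm, ...]) tuples in time order.
    """
    return [(t, xs) for t, xs in records if t != 0 or any(x != 0 for x in xs)]


def chronological_split(records, train_percent):
    """Split time-ordered records into (train, test), e.g. train_percent=80.

    A chronological split is an assumption of this sketch.
    """
    cut = len(records) * train_percent // 100
    return records[:cut], records[cut:]


def target_mean(records):
    """Average target precipitation over the retained records."""
    return sum(t for t, _ in records) / len(records)


# Made-up hourly records: (target, [station A, station B, station C]).
rows = [
    (0.0, [0.0, 0.0, 0.0]),
    (1.2, [0.8, 1.0, 1.1]),
    (0.0, [0.0, 0.0, 0.6]),
    (2.0, [1.5, 2.5, 1.9]),
]

# Using only stations A and B removes one more record than using A, B and C,
# so the retained sample, and with it the target mean, changes.
two_inputs = drop_all_zero([(t, xs[:2]) for t, xs in rows])
three_inputs = drop_all_zero(rows)
print(len(two_inputs), target_mean(two_inputs))
print(len(three_inputs), target_mean(three_inputs))
```

Because the retained sample depends on which input stations are selected, percentage-based errors are computed against a different baseline in each case, which is consistent with the MAE/MAPE divergence noted in the text.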
As with the hourly data, the New IDW, IDW, and MNR-T methods had very close error values, and all three gave more reliable and accurate predictions than the other methods. In the daily data, as in the hourly data, MAE and MAPE values are inconsistent because the average precipitation of the Sariyer/ITU Maslak station changes with every change of input stations. The MAE and MAPE values of the IDW and New IDW methods are again identical when there are only 2 input stations, so the New IDW method is not used in that case. For the daily dataset with a 60/40 training/test ratio, as the number of input stations increases from 2 to 5, the MAPE values are 25.1, 24.8, 23.3, and 25.5, respectively; the lowest error, 23.3, was observed with 4 input stations. For a 70/30 ratio, the MAPE values are 23.5, 24.2, 23.9, and 24.4; the lowest error, 23.5, was observed with 2 input stations. For the 80/20 ratio, the MAPE values are 23.1, 24.4, 24.0, and 23.6; the lowest error, 23.1, was observed with 2 input stations. After the hourly and daily data were evaluated using 7 methods, 3 training/test ratios, and different sets of input stations, the 3 methods with the lowest errors were selected: New IDW, IDW, and MNR-T. For hourly precipitation data, the best-performing methods were New IDW with a MAPE of 39.8, IDW with 39.7, and MNR-T with 40.4, all using Sariyer, Beykoz, Sisli, and Uskudar as input stations and the 80/20 training/test ratio. For daily precipitation data, New IDW performed best with a MAPE of 23.7, using Sariyer, Eyup, Sisli, Beykoz, and Uskudar and the 80/20 training/test ratio, while IDW (23.3) and MNR-T (23.4) performed best using Sariyer, Beykoz, Sisli, and Uskudar as input stations and the 60/40 training/test ratio.
These methods showed competitive performance across both hourly and daily datasets, especially in scenarios where station distances played a significant role, and were used to fill in the missing data for the Sariyer/ITU Maslak station between 2014 and 2017. During imputation, if any of the five input stations had missing data, the remaining four stations were used to fill in the missing values. Similarly, when the lowest error was obtained with four input stations (Sariyer, Beykoz, Sisli, Uskudar), predictions were made using data from these four stations whenever the data were available. In cases where any of these four stations had missing data, and if data from the Sariyer station was available, Sariyer was included in the imputation process to fill the missing values. This procedure was applied down to the scenario with two input stations. This approach ensured that the missing data for the target station were predicted using information from the closest neighboring stations. For the hourly dataset, the New IDW, IDW, and MNR-T methods gave similar and low error values. These three methods also performed well for the daily dataset, showing better accuracy than the other methods, and they are easier to use and require simpler calculations. Although the ANFIS method can model complex relationships, it gave higher errors in this study; it may improve if more variables (such as temperature or wind speed) and larger datasets are used. MIDW1, MIDW2, and IEWM also showed higher errors in some cases, especially when fewer stations were used. The results of this study emphasize the importance of filling missing precipitation data with reliable methods to improve the accuracy of meteorological data.
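The fallback idea described above (use the preferred input stations, and fall back to fewer stations when some are missing, down to two) can be sketched as follows. This is a simplified illustration, not the thesis's exact selection procedure. It assumes classic IDW with exponent 2 as the estimator, and the distances, readings, and function names are placeholders invented for this example.

```python
def available_inputs(readings, preferred, minimum=2):
    """Return the preferred stations that have a reading for this time step.

    Returns an empty list if fewer than `minimum` stations are available,
    mirroring the two-station lower bound mentioned in the text.
    """
    usable = [s for s in preferred if readings.get(s) is not None]
    return usable if len(usable) >= minimum else []


def impute(readings, distances_km, preferred, power=2.0):
    """Estimate the target value from whichever preferred stations have data.

    Uses classic IDW as a stand-in estimator; `power=2.0` is an assumption.
    Returns None when too few stations are available.
    """
    stations = available_inputs(readings, preferred)
    if not stations:
        return None
    weights = [1.0 / distances_km[s] ** power for s in stations]
    weighted_sum = sum(w * readings[s] for w, s in zip(weights, stations))
    return weighted_sum / sum(weights)


# Made-up placeholder distances (km); these are NOT measured station distances.
PLACEHOLDER_DISTANCES_KM = {"Sariyer": 6.0, "Beykoz": 8.0, "Sisli": 7.0, "Uskudar": 12.0}
PREFERRED = ["Sariyer", "Beykoz", "Sisli", "Uskudar"]

# One made-up hourly time step in which Beykoz has no reading.
step = {"Sariyer": 1.4, "Beykoz": None, "Sisli": 0.9, "Uskudar": 0.0}
print(available_inputs(step, PREFERRED))
print(impute(step, PLACEHOLDER_DISTANCES_KM, PREFERRED))
```

Recomputing the weights over only the available stations keeps them normalized, so a missing neighbour does not bias the estimate toward zero.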
To achieve accurate data imputation, factors such as the station's geographic location, the distance between surrounding stations, elevation differences, regional precipitation characteristics, and the performance metrics of the models must be carefully considered. In particular, the choice of methods, input stations and training/test ratios played a key role in the success of the imputation process. Using multiple input stations allowed the methods to capture precipitation patterns more effectively, which enhanced the accuracy of both hourly and daily predictions. Furthermore, using different training/test ratios helped evaluate the methods in various situations, ensuring their reliability. This is especially important in fields like water management, climate planning, agriculture, and weather prediction, where accurate data is essential for making informed decisions and preparing for climate challenges. Future studies could explore the application of these methods in various geographic regions and climate conditions to assess their effectiveness and reliability in different environmental contexts and precipitation patterns.
Description
Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025
Keywords
meteorology,
engineering sciences,
hydroclimatology,
hydrometeorology