Short-term wind power generation forecasting by coupling numerical weather prediction models and machine learning algorithms

Özen, Cem
Graduate School
Renewable energy plays a crucial role in securing the energy supply and achieving energy independence for countries. The transition to renewable energy, an eco-friendly alternative to conventional fossil-fuel power generation, also has a very influential role in mitigating global climate change. In addition to these motives, wind energy has become a preferred energy source for countries and investors because in recent years it has had one of the lowest levelized costs of energy among both renewable and other energy sources. Power generation in wind power plants depends directly on the wind, an atmospheric variable that is difficult to predict because of its dynamic structure and chaotic nature. Moreover, as the share of wind in total energy generation grows, forecasting this intermittent source becomes very important for the stability and reliability of electricity grids. To ensure energy supply security and keep the electricity grid in balance, wind power plant owners, like all other power plant owners, are required to submit their power generation forecasts to the institutions responsible for the energy markets and/or electricity transmission in their countries. Any deviation between the observation and the forecast results in an energy imbalance, which incurs imbalance penalties for the power plant owners. Therefore, increasing the accuracy of power generation forecasts not only prevents large financial penalties but also contributes to energy supply security by facilitating control of the electricity grid. This thesis covers short-term power generation forecasting for wind power plants in detail, and three articles on the topic have been published in the international peer-reviewed journal Wind Engineering.
In the first article, a novel hybrid day-ahead wind power forecasting model that couples a numerical weather prediction (NWP) model with gradient boosting machines is proposed. The Weather Research and Forecasting (WRF) model is used as the NWP model, and two different WRF configurations were run in the study. The first was run at low spatial resolution, and its outputs were used directly to train the machine learning model. Global Data Assimilation System (GDAS) data with a 0.25-degree spatial resolution were used as the initial and boundary condition data for the low-resolution model. WRF outputs were used instead of GDAS data directly in order to increase the temporal resolution to 10 minutes with a dynamical model rather than with statistical methods. The outputs extracted from the four surrounding grid points were used to train the model, and a high-resolution WRF model with a spatial resolution of 333 meters was also run so that the proposed model could be compared with a well-configured WRF model. Since the study focuses on day-ahead wind power forecasting, day-ahead forecasts from Global Forecast System (GFS) data were used to test the proposed model and as the initial and boundary condition data for the WRF model. The proposed model outperformed the WRF model on the statistical performance metrics, with improvements of 28.86%, 28.47%, and 14.8% in mean absolute error, root mean squared error, and Pearson correlation, respectively. In addition, after a training process that is performed only once, the proposed model produces its forecasts in just 28.75 seconds, whereas the WRF model requires 2.9 hours. The proposed model therefore also outperforms the WRF model in computational time at the operational stage.
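The comparison above rests on error metrics and their relative improvement over the WRF baseline. The following is a minimal sketch of how such figures might be computed, not the thesis's own evaluation code. The function names, the sample values, and the definition of improvement as a percentage reduction of the baseline error are all assumptions made for illustration.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def relative_improvement(baseline_error, model_error):
    """Percentage reduction of an error metric relative to a baseline.

    Assumed definition; the abstract does not state the exact formula.
    """
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical power values (MW), invented purely for illustration.
observed = [10.0, 12.0, 8.0, 15.0]
wrf_forecast = [14.0, 9.0, 11.0, 11.0]
hybrid_forecast = [12.0, 11.0, 9.0, 13.0]

mae_gain = relative_improvement(
    mae(observed, wrf_forecast), mae(observed, hybrid_forecast)
)
rmse_gain = relative_improvement(
    rmse(observed, wrf_forecast), rmse(observed, hybrid_forecast)
)
print(f"MAE improvement: {mae_gain:.2f}%, RMSE improvement: {rmse_gain:.2f}%")
```

For a metric where higher is better, such as Pearson correlation, the sign convention would have to be reversed; how the thesis handles that case is not stated in the abstract.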
In the second article, a country-scale wind power generation (WPG) forecast model is proposed using the CatBoost model with atmospheric variables at the surface level and at the 700 hPa, 500 hPa, and 300 hPa pressure levels, extracted from ERA5 data with 1-hour temporal and 2.5-degree spatial resolution. Of the thirty-six grid points needed to cover the entire country at 2.5-degree resolution, twenty-six were selected considering the spatial distribution of wind power plants in Turkey. Besides the atmospheric variables, virtual wind turbines (VWT) were sited at each grid point based on the wind class, and the power generation output of each VWT was calculated and used in training. Day-ahead forecasts from the High Resolution (HRES) data of the European Centre for Medium-Range Weather Forecasts (ECMWF) were used as the test subset, since ERA5 and HRES are both produced with the same model, the Integrated Forecasting System (IFS) of ECMWF. This also leads to a better understanding of the accuracy of the proposed forecast model. Because Turkey's installed wind power increases continuously, Turkey's hourly wind energy production was not used directly as the model target; instead, hourly production divided by total installed power was used. As in the first study, a decision tree-based machine learning algorithm, CatBoost, was used, so the importance of each variable is also presented. Feature selection (FS) methods were also included in the study, and the effect of each method on the model was examined. After collinearity detection was applied to all data, Lasso, two different principal component analysis (PCA) approaches, recursive feature elimination (RFE), generalized orthogonal matching pursuit (gOMP), and forward-backward selection with early dropping (FBED) were used.
These methods reduce the complexity of the model by reducing the number of variables, and they were also used to increase accuracy. In addition, five different hybrid FS methods are proposed using the results of these FS methods. The first uses the variables of the grid point selected most often by the FS methods; the remaining four use the variables selected by at least three, four, five, and six of the six FS methods mentioned above, respectively. The fourth hybrid method outperformed all other methods in the study, with a normalized root mean square error of 7.6% and an R² of 0.8989. In addition, the energy production of the VWTs was identified as the most important variable, followed by wind speed and direction.
In the third article, a short-term wind speed forecasting model is proposed that predicts the wind speeds of the six wind turbines of a wind farm located in the western part of Turkey from 10 minutes to 1 hour ahead. Since this study, unlike the first and second, does not focus on day-ahead forecasts, GFS and HRES data were not used; instead, forecasting was done with the CatBoost model using Supervisory Control and Data Acquisition (SCADA) data from the wind turbines together with the outputs of two different WRF models. The first WRF model is configured with a single domain and uses National Centers for Environmental Prediction/Final (NCEP/FNL) data with a 0.25-degree spatial resolution as initial and boundary condition data; its outputs also have a 0.25-degree spatial resolution and a 10-minute temporal resolution. The second WRF model was run to obtain the weather patterns affecting the wind farm. The VWT algorithm used in the first and second studies was not used in the third article, since its aim is to forecast wind speed.
Since SCADA data contain outliers and missing values, data preprocessing techniques, namely outlier detection, data treatment, and missing data imputation, were applied to the SCADA data before they were fed into the model. First, a method combining k-means and isolation tree applications was used to detect outliers in the data. Statistical models were then used to treat the detected outliers. The CatBoost model was used to learn the relationship between the WRF model outputs and the SCADA data, and this model was subsequently used to impute the missing data. The effects on the wind speed forecast model of the three data sources (SCADA, weather pattern, and WRF model outputs) and of the three preprocessing techniques applied to the SCADA data (outlier detection, data treatment, and missing data imputation) were examined separately. Since the aim is to forecast the wind speed of each of the six wind turbines at 10-minute intervals from 10 minutes to 1 hour, there were 36 variables in total to be predicted. Based on the statistical performance metrics, the best model was the one in which all data preprocessing steps were performed and all data types were used, and every proposed model outperformed the simple persistence model, which uses the value at the previous time step as the forecast for the next time step. The weather pattern that most affects Urla was found to be purely advective, with a relative frequency of 50.76%, and the best mean absolute percentage error, 14.534%, was obtained in this weather pattern. According to R², the highest performance, 0.9161, was seen in hybrid weather patterns; the lowest root mean square error and mean absolute error were observed in the pure anticyclonic weather pattern, which is usually associated with low wind speeds.
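The persistence baseline and the count of 36 prediction targets mentioned above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the wind speed values are invented, the function names are hypothetical, and holding the last observed value for every horizon from 10 to 60 minutes is one common reading of "persistence" rather than a detail confirmed by the abstract.

```python
N_TURBINES = 6                 # six turbines, as stated in the abstract
HORIZONS = range(1, 7)         # 1..6 steps of 10 minutes = 10 min .. 1 hour
n_targets = N_TURBINES * len(HORIZONS)  # one target per turbine and horizon


def persistence_forecast(series, horizon_steps):
    """Forecast each value as the observation made horizon_steps earlier.

    The returned list is aligned with series[horizon_steps:].
    """
    return series[:-horizon_steps]


def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical 10-minute wind speeds (m/s) for a single turbine.
wind_speed = [5.0, 5.5, 6.5, 6.0, 7.0, 8.0, 7.5, 7.0]

# Baseline error per horizon; a learned model should beat these values.
persistence_mae = {
    h: mae(wind_speed[h:], persistence_forecast(wind_speed, h))
    for h in HORIZONS
}

for h, err in persistence_mae.items():
    print(f"{h * 10:>2} min ahead: persistence MAE = {err:.3f} m/s")
```

A baseline of this kind needs no training, which is why it is a common reference point for very short forecast horizons.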
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2022
Keywords
machine learning, micrometeorology, wind energy, renewable energy