Publication:
XAU/USD price prediction using deep learning: hyperparameter optimization with bayesian, grey-wolf and genetic algorithms

Publisher

ITU Graduate School

Abstract

Gold has consistently played a significant role in the global economy, not only because of its chemical and physical properties but also because of its enduring status as a safe haven and investment instrument. For centuries, gold has been seen as a reliable hedge against inflation and currency fluctuations, especially in times of financial uncertainty, geopolitical conflict, or severe economic crisis. The continued demand for gold is reflected in its liquidity and in its use as a hedge by investors, governments, and central banks. The XAU/USD ounce price, in particular, serves as a key benchmark for pricing financial instruments such as options, futures, swaps, and forwards, and is closely monitored for insights into broader market dynamics, which makes forecasting it increasingly important. However, the gold market is sensitive to a multitude of macroeconomic and financial indicators, such as exchange rates, stock indices, oil prices, and interest rates. The complex, non-linear, and often non-stationary relationships between these factors make accurate gold price prediction a formidable challenge. While machine learning and deep learning models offer enhanced flexibility and capacity to capture complex relationships, their effectiveness depends largely on hyperparameter selection. Traditionally, hyperparameter selection has relied on grid search, random search, or manual trial and error. Systematic and more comprehensive hyperparameter optimization has not been sufficiently investigated in the context of gold price prediction, which leaves a significant research gap.
This thesis aims to address this gap by systematically investigating how deep learning models, namely recurrent neural networks (RNN), long short-term memory (LSTM), gated recurrent units (GRU), convolutional neural networks (CNN), and temporal convolutional networks (TCN), combined with Bayesian, Grey-Wolf, and Genetic hyperparameter optimization algorithms, affect the forecast performance for the daily closing price of XAU/USD. The primary objectives are fourfold: (1) to determine the most appropriate combination of deep learning model and hyperparameter optimization algorithm for XAU/USD ounce price prediction; (2) to determine the most informative window size for time series prediction; (3) to evaluate the impact and effectiveness of hyperparameter optimization methods on forecast performance; and (4) to explain the comparative strengths and weaknesses of each deep learning architecture in this context. The study used the XAU/USD daily ounce price from 01.01.2018 to 10.01.2025 together with the XAG/USD futures price, BTC/USDT, GBP/USD, USD/JPY, S&P500, Brent oil, WTI oil, DJIA, VIX, and 10-year US Treasury yields, which are frequently used as features in the literature on XAU/USD prediction. Skewness, kurtosis, and Jarque-Bera normality tests showed non-normal, asymmetric, and complex distributional properties in most variables in the data set. This further emphasizes the inadequacy of linear models and strengthens the motivation for a deep learning-based approach. The correlation analysis also reveals that XAG/USD, S&P500, DJIA, BTC/USDT, and USD/JPY show the strongest relationships with XAU/USD, while GBP/USD shows a negative correlation. The study consists of two phases.
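As a rough illustration of the distributional checks mentioned above, the sketch below computes sample skewness, kurtosis, and the Jarque-Bera statistic with NumPy only. The function name `jarque_bera` and the synthetic samples are assumptions made for this example; they are not the thesis data or code.

```python
import numpy as np

def jarque_bera(x):
    """Return (skewness, kurtosis, JB statistic, p-value) for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)          # second central moment
    m3 = np.mean(centered ** 3)          # third central moment
    m4 = np.mean(centered ** 4)          # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = float(np.exp(-jb / 2.0))
    return float(skew), float(kurt), float(jb), p_value

# Synthetic samples (assumptions for illustration only).
rng = np.random.default_rng(0)
roughly_normal = rng.standard_normal(2000)
right_skewed = np.exp(rng.standard_normal(2000))   # log-normal sample

print("normal-like :", jarque_bera(roughly_normal))
print("right-skewed:", jarque_bera(right_skewed))
```

A small p-value for a variable is the kind of evidence of non-normality that the abstract refers to.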
In the first phase, deep learning models were trained using default hyperparameters with window sizes of 8, 16, 32, and 64. In the second phase, the above-mentioned algorithms were used to optimize model hyperparameters such as hidden size, learning rate, number of filters, and kernel size. Missing values in the dataset were filled with the forward-fill method, and the dataset was normalized with MinMaxScaler. The dataset was then divided into 90% training and 10% test sets, and the last 10% of the training set was separated as the validation set using time-based splitting. Sequences were then created with a sliding window for window sizes of 8, 16, 32, and 64. The models were implemented in Python 3.8 with libraries such as NumPy, Pandas, Matplotlib, Scikit-learn, and TensorFlow/Keras, and were run on a system with the Windows 10 operating system, 16 GB RAM, and an Intel Core i7 processor. The model outputs were transformed back to the original price scale, and the performance of the models in Phase 1 and Phase 2 was evaluated on the test set using the mean absolute percentage error (MAPE), mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²). Five-fold cross-validation was used to examine whether the models gave reliable and generalizable results and whether they overfit. The empirical results reveal several important insights. Models incorporating hyperparameter optimization consistently outperformed those using default parameters across all window sizes and deep learning models. Bayesian optimization emerges as the most effective and computationally efficient method, providing the best balance between prediction accuracy and resource utilization. In contrast, the Genetic algorithm generally shows the lowest performance, especially as the window size increases, suggesting limitations in search efficiency for this application.
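The preprocessing and evaluation steps described above can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the thesis implementation (which used Pandas and Scikit-learn's MinMaxScaler): the data are a synthetic random walk, all function names are hypothetical, and a naive "repeat the last value" forecast stands in for a trained network. The sketch also fits the scaling range on the training portion only, a common precaution against look-ahead leakage that may differ from the order of steps described in the text.

```python
import numpy as np

def forward_fill(values):
    """Replace each NaN with the latest earlier observation in its column."""
    filled = np.array(values, dtype=float)            # work on a copy
    for t in range(1, filled.shape[0]):
        gap = np.isnan(filled[t])
        filled[t, gap] = filled[t - 1, gap]
    return filled

def time_split(data, test_frac=0.10, val_frac=0.10):
    """Chronological split: last 10% is test, last 10% of the rest is validation."""
    n_fit = int(round(len(data) * (1.0 - test_frac)))
    n_train = int(round(n_fit * (1.0 - val_frac)))
    return data[:n_train], data[n_train:n_fit], data[n_fit:]

def make_windows(scaled, window, target_col=0):
    """Pair each block of `window` rows with the next row's target value."""
    X = np.stack([scaled[i:i + window] for i in range(len(scaled) - window)])
    return X, scaled[window:, target_col]

def regression_metrics(y_true, y_pred):
    err = y_true - y_pred
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),
        "R2": float(1.0 - np.sum(err ** 2) / ss_tot),
    }

# Synthetic stand-in for the daily data: column 0 plays the role of XAU/USD.
rng = np.random.default_rng(42)
steps = rng.normal(0.0, 1.0, size=(2000, 3)) * np.array([5.0, 0.1, 0.5])
raw = np.array([1800.0, 25.0, 70.0]) + np.cumsum(steps, axis=0)
raw[10:12, 0] = np.nan                                # simulated missing days
raw[200, 2] = np.nan

data = forward_fill(raw)
train, val, test = time_split(data)

# Min-max scaling with the range taken from the training rows only.
lo = train.min(axis=0)
span = train.max(axis=0) - lo

def scale(block):
    return (block - lo) / span

results = {}
for window in (8, 16, 32, 64):
    X_test, y_test = make_windows(scale(test), window)
    y_pred = X_test[:, -1, 0]                         # naive persistence forecast
    # Undo the scaling so the metrics are in price units.
    results[window] = regression_metrics(y_test * span[0] + lo[0],
                                         y_pred * span[0] + lo[0])
    print(window, X_test.shape, results[window])
```

In a real experiment the persistence forecast would be replaced by the predictions of the trained model, and the validation windows would drive hyperparameter selection.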
The window size played a critical role in model performance, with the most accurate predictions generally obtained at window sizes of 8 and 16. This finding is consistent across both phases and most combinations of deep learning model and hyperparameter optimization algorithm. Among the deep learning models, GRU with Bayesian optimization in particular outperformed the other models both on the test data and in the 5-fold cross-validation results. GRU offered a favorable trade-off between complexity, training speed, and forecasting power. Although foundational, the RNN is hampered by the vanishing gradient problem, is limited in capturing long-term dependencies, and was found to be more prone to overfitting. LSTM was moderately sensitive to changes in window size. Despite its strength in extracting local patterns, CNN was less effective at modeling temporal dependencies and more sensitive to the window size. TCN modeled long-term dependencies well through dilated convolutions but was also highly sensitive to the window size. Five-fold cross-validation was used to check that the Phase 1 and Phase 2 models yield reliable and generalizable results, supporting the validity of the empirical findings. The findings have important implications for practitioners and policy makers. The superiority of Bayesian optimization in deep learning-based gold price prediction offers a clear path for practitioners who want to develop efficient, high-accuracy prediction systems. The identification of optimal window sizes and the comparative analysis of model architectures provide practical guidance for model selection and configuration in similar financial prediction studies.
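For readers unfamiliar with the Grey-Wolf optimizer compared in this study, the following is a minimal sketch of its standard update rule applied to a toy two-parameter search. The objective `toy_validation_loss`, the search bounds, and all function names are hypothetical stand-ins: in practice the objective would train a network and return its validation error, and this sketch makes no claim about the search space or settings used in the thesis.

```python
import numpy as np

def grey_wolf_optimize(objective, lower, upper, n_wolves=20, n_iter=60, seed=0):
    """Minimize `objective` inside box bounds with a basic Grey-Wolf optimizer."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    # Alpha, beta, delta: the three best positions found so far.
    leader_pos = np.zeros((3, dim))
    leader_score = np.full(3, np.inf)
    history = []
    for it in range(n_iter):
        scores = np.array([objective(w) for w in wolves])
        cand_pos = np.vstack([leader_pos, wolves])
        cand_score = np.concatenate([leader_score, scores])
        top = np.argsort(cand_score)[:3]
        leader_pos, leader_score = cand_pos[top], cand_score[top]
        history.append(float(leader_score[0]))
        # `a` shrinks linearly from 2 towards 0: exploration, then exploitation.
        a = 2.0 * (1.0 - it / n_iter)
        r1 = rng.random((3, n_wolves, dim))
        r2 = rng.random((3, n_wolves, dim))
        A = 2.0 * a * r1 - a
        C = 2.0 * r2
        D = np.abs(C * leader_pos[:, None, :] - wolves[None, :, :])
        pulled = leader_pos[:, None, :] - A * D       # one pull per leader
        wolves = np.clip(pulled.mean(axis=0), lower, upper)
    return leader_pos[0], float(leader_score[0]), history

def toy_validation_loss(params):
    """Hypothetical smooth loss with its minimum at lr = 1e-3, hidden size = 64."""
    log10_lr, hidden = params
    return (log10_lr + 3.0) ** 2 + ((hidden - 64.0) / 64.0) ** 2

best_params, best_score, history = grey_wolf_optimize(
    toy_validation_loss, lower=[-5.0, 8.0], upper=[-1.0, 256.0])
# Integer hyperparameters such as the hidden size would be rounded before use.
print("log10(lr) =", best_params[0], "hidden =", round(best_params[1]))
print("best toy loss =", best_score)
```

Each objective call corresponds to one full model training run in a real setting, so the population size and iteration count directly determine the computational cost of the search.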
While this study provides a comprehensive benchmark and addresses key research gaps, there are several avenues for further research. Future studies can extend the approach to other precious metals and commodities, test whether advanced feature selection and dimensionality reduction techniques further improve model performance, and try alternative model validation strategies beyond 5-fold cross-validation. Additionally, investigating the impact of regime changes on model stability and forecast accuracy could provide deeper insights. In conclusion, this thesis contributes to the gold price forecasting literature by systematically evaluating and comparing the forecasting performance of multiple deep learning models and hyperparameter optimization algorithms. In particular, by demonstrating the critical role of automatic, intelligent hyperparameter tuning via Bayesian optimization, the study not only fills an important methodological gap but also lays a foundation for future research and practical applications in financial time series forecasting. The insights gained here extend beyond gold to provide methodological guidance for modeling complex, nonlinear, and volatile time series in a variety of financial and economic contexts.

Description

Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025

Subject

computer engineering, computer science and control, industrial and industrial engineering, engineering sciences
