Orthogonality based feature selection for ai applications
dc.contributor.advisor | Üstündağ, Burak Berk | |
dc.contributor.author | Şentop, Mehmet Selahaddin | |
dc.contributor.authorID | 504221523 | |
dc.contributor.department | Computer Engineering | |
dc.date.accessioned | 2025-02-24T07:03:55Z | |
dc.date.available | 2025-02-24T07:03:55Z | |
dc.date.issued | 2024-08-19 | |
dc.description | Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2024 | |
dc.description.abstract | Feature selection is a significant aspect of AI models, directly influencing their accuracy and efficiency. A common problem in this process is redundancy among features, where multiple features provide overlapping information. Besides being inefficient, this redundancy can cause overfitting, where a model becomes too tailored to the specific data it was trained on and fails to generalize to new data. To tackle these challenges, this thesis introduces an orthogonality-based approach to feature selection. By ensuring that the selected features are independent and non-redundant, this approach improves the model's performance across various tasks. Two example applications, data imputation and short-term forecasting, are explored to demonstrate its effectiveness. Missing, distorted, or inaccurate data is a serious problem in many fields, including agriculture, healthcare, and environmental monitoring. Such gaps make it hard to trust any analysis or decision based on the data. Sensor breakdowns, transmission errors, or incomplete data collection can make entire datasets unreliable, leading to biased conclusions and poor decisions. The issue is especially serious where decisions must be made quickly and accurately, as in real-time systems. For example, missing data in an agricultural monitoring system could lead to wrong irrigation decisions that harm crop yields. To address this problem, this study introduces the Predictive Error Compensated Neural Network (PECNET), a model built on a new orthogonality-based feature selection method. PECNET selects data features that are independent of each other and corrects the errors of earlier predictions, improving the accuracy of both missing data imputation and short-term forecasting. The study is based on two main hypotheses. 
First, it proposes that advanced machine learning models like PECNET can identify and exploit patterns in complex data better than traditional methods. Second, it hypothesizes that by ensuring the features it uses are independent, PECNET can avoid overfitting. PECNET's approach to selecting which data to focus on is key to its success. The model begins by examining how different data features relate to each other and to the prediction target. It first picks the feature that has the biggest impact on the target. Then, instead of simply adding more similar features, PECNET uses additional features to predict and correct the errors of earlier predictions. In this way it captures patterns in the data that were not considered before, avoids redundancy, and generalizes better to new data. The study tested PECNET using data from the Agricultural and Environmental Informatics Research and Application Center (TARBIL), a system that collects agricultural and environmental information from across Türkiye. For missing data imputation, PECNET was tested in two types of experiments: one using data from a single station, and another combining data from several nearby stations. In both, PECNET, especially when combined with the Discrete Wavelet Transform (DWT), was more accurate than traditional methods. Numerically, PECNET + DWT achieved more than 50% lower Root Mean Squared Error (RMSE) in single-station experiments and up to 80% lower RMSE in multi-station experiments. The model's ability to use data from multiple stations led to large improvements for challenging variables such as wind speed and humidity. Besides imputing missing data, PECNET was also tested on short-term rainfall prediction, which is very important for farming. 
Accurate rainfall predictions help farmers make better decisions about irrigation, land management, and yield estimation. In these tests, PECNET outperformed traditional models such as Long Short-Term Memory (LSTM) and Prophet, achieving 50% lower Mean Absolute Percentage Error (MAPE) and a threefold reduction in both RMSE and Mean Absolute Error (MAE). Its ability to combine different types of independent data made its short-term rainfall forecasts more accurate and reliable. In summary, the orthogonality-based feature selection method, whose impact is demonstrated through PECNET, offers a new and effective way to address missing data and short-term forecasting. By selecting independent data features, the method improves accuracy while avoiding common pitfalls such as overfitting. The results support the initial hypotheses, showing that orthogonality-based feature selection can overcome the limitations of traditional methods. Its successful application to the TARBIL dataset suggests that it could be a valuable tool in many fields where accurate data and forecasts are crucial. This research is an important step forward in improving how data is analyzed and decisions are made. | |
dc.description.degree | M.Sc. | |
dc.identifier.uri | http://hdl.handle.net/11527/26510 | |
dc.language.iso | en_US | |
dc.publisher | Graduate School | |
dc.sdg.type | Goal 2: Zero Hunger | |
dc.sdg.type | Goal 3: Good Health and Well-being | |
dc.sdg.type | Goal 17: Partnerships to achieve the Goal | |
dc.subject | deep learning | |
dc.subject | derin öğrenme | |
dc.subject | real time forecasting | |
dc.subject | gerçek zaman tahmini | |
dc.subject | short-term forecasting | |
dc.subject | kısa dönemli öngörü | |
dc.subject | machine learning | |
dc.subject | makine öğrenmesi | |
dc.subject | artificial neural networks | |
dc.subject | yapay sinir ağları | |
dc.title | Orthogonality based feature selection for ai applications | |
dc.title.alternative | Yapay zeka uygulamaları için ortogonalite tabanlı öznitelik seçimi | |
dc.type | Master Thesis |