Orthogonality based feature selection for AI applications

dc.contributor.advisor Üstündağ, Burak Berk
dc.contributor.author Şentop, Mehmet Selahaddin
dc.contributor.authorID 504221523
dc.contributor.department Computer Engineering
dc.date.accessioned 2025-02-24T07:03:55Z
dc.date.available 2025-02-24T07:03:55Z
dc.date.issued 2024-08-19
dc.description Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2024
dc.description.abstract Feature selection is a key step in building AI models because it directly affects their accuracy and efficiency. A common problem in this step is redundancy among features, where several features carry overlapping information. Besides being inefficient, such redundancy can cause overfitting: the model becomes too closely tailored to the data it was trained on and fails to generalize to new data. To address these challenges, this thesis introduces an orthogonality-based approach to feature selection. By ensuring that the selected features are independent and non-redundant, the approach improves model performance across a range of tasks. Two example applications, data imputation and short-term forecasting, are explored to demonstrate its effectiveness. Missing, distorted, or inaccurate data is a serious problem in many fields, including agriculture, healthcare, and environmental monitoring. Such gaps make it hard to trust any analysis or decision based on the data. Sensor breakdowns, transmission errors, or incomplete data collection can make entire datasets unreliable, leading to biased conclusions and poor decisions. The issue is especially serious where decisions must be made quickly and accurately, as in real-time systems. For example, missing data in an agricultural monitoring system could lead to wrong irrigation decisions that harm crop yields. To solve this problem, this study introduces an orthogonality-based feature selection method, realized in the Predictive Error Compensated Neural Network (PECNET) model. PECNET selects features that are independent of each other and compensates for prediction errors, improving the accuracy of both missing-data imputation and short-term forecasting. The study rests on two main hypotheses.
First, advanced machine learning models such as PECNET can outperform traditional methods at finding and exploiting patterns in complex data. Second, by ensuring that the features it uses are independent, PECNET can avoid overfitting. PECNET's strategy for choosing which data to focus on is central to its success. The model begins by examining how the candidate features relate to one another and to the prediction target, and it first selects the feature with the greatest impact on the target. Then, instead of adding more similar features, PECNET predicts and corrects the errors of earlier predictions. In this way it captures patterns that earlier predictions missed, avoids redundancy, and generalizes better to new data. The study evaluated PECNET on data from the Agricultural and Environmental Informatics Research and Application Center (TARBIL), which collects agricultural and environmental data from across Türkiye. For missing-data imputation, PECNET was tested in two settings: using data from a single station, and combining data from several nearby stations. In both settings, PECNET, especially when combined with the Discrete Wavelet Transform (DWT), was more accurate than traditional methods. Relative to these methods, PECNET + DWT reduced the Root Mean Squared Error (RMSE) by more than 50% in the single-station experiments and by up to 80% in the multi-station experiments. Drawing on data from multiple stations brought large improvements for hard-to-predict variables such as wind speed and humidity. Beyond imputation, PECNET was also tested on short-term rainfall forecasting, which is very important for agriculture.
Accurate rainfall forecasts help farmers decide when to irrigate, how to manage land, and how to estimate yields. In these tests, PECNET outperformed widely used models such as Long Short-Term Memory (LSTM) and Prophet, achieving 50% lower Mean Absolute Percentage Error (MAPE) and a threefold reduction in RMSE and Mean Absolute Error (MAE). PECNET's ability to combine different types of independent data led to more accurate and reliable short-term rainfall forecasts. In summary, the orthogonality-based feature selection method, demonstrated through PECNET, offers a new and effective way to address missing data and short-term forecasting. By selecting independent features, the method improves accuracy while avoiding common pitfalls such as overfitting. The results support the initial hypotheses, showing that orthogonality-based feature selection can overcome the limitations of traditional methods. Its successful application to the TARBIL dataset suggests that it could be a valuable tool in many fields where accurate data and forecasts are crucial, and it marks an important step toward better data analysis and decision-making.
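The abstract describes the selection idea in words: take the feature with the greatest impact on the target first, then, rather than adding similar features, model the error left by earlier predictions. The following is a minimal, hypothetical sketch of that residual-driven selection under stated assumptions: each stage is an ordinary least-squares line standing in for one of PECNET's neural networks, and "impact" is measured as absolute Pearson correlation with what is still unexplained. The names `select_features`, `pearson`, `fit_line`, `max_features`, and `min_corr` are illustrative only and are not taken from the thesis.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def select_features(features, target, max_features=3, min_corr=0.1):
    """Hypothetical greedy, residual-driven feature selection.

    Stage 1 models the target; every later stage models the error left by
    the stages before it. A linear fit stands in for each neural network.
    features: dict of feature name -> list of floats; target: list of floats.
    """
    residual = list(target)
    selected = []
    for _ in range(max_features):
        candidates = [name for name in features if name not in selected]
        if not candidates:
            break
        # The unused feature most correlated with what is still unexplained.
        best = max(candidates, key=lambda n: abs(pearson(features[n], residual)))
        if abs(pearson(features[best], residual)) < min_corr:
            break  # no remaining feature carries new information about the error
        slope, intercept = fit_line(features[best], residual)
        # Compensate: subtract this stage's prediction of the previous error.
        residual = [r - (slope * x + intercept)
                    for x, r in zip(features[best], residual)]
        selected.append(best)
    return selected


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [1.1, 2.1, 2.9, 3.9, 4.9, 5.9, 7.1, 8.1]  # near-duplicate of a
    c = [1, -1, -1, 1, 1, -1, -1, 1]              # uncorrelated with a
    target = [2 * ai + ci for ai, ci in zip(a, c)]
    print(select_features({"a": a, "b": b, "c": c}, target))  # ['a', 'c']
```

In the usage example, `b` is almost a copy of `a`, so once `a` has explained its share of the target, `b` has essentially no correlation with the remaining error and is skipped, while the independent feature `c` is picked up. That is the redundancy-avoiding behaviour the abstract attributes to orthogonality-based selection, shown here with linear stages only.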
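The abstract reports that PECNET performs best when combined with the Discrete Wavelet Transform (DWT), but it does not say which wavelet family or decomposition depth is used. As an assumption for illustration only, the sketch below uses the orthonormal Haar wavelet, the simplest DWT: each level splits a series into a half-resolution approximation (trend) and a detail (fluctuation) band, and the split is exactly invertible. One plausible way to combine this with a model is to feed the resulting bands as separate inputs; that pairing, and the names `haar_dwt`, `haar_idwt`, and `haar_wavedec`, are hypothetical.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail): the smoothed series at half resolution
    and the fluctuations removed by the smoothing.
    """
    if len(signal) % 2:
        raise ValueError("Haar DWT sketch expects an even-length signal")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(x + y) * INV_SQRT2 for x, y in pairs]
    detail = [(x - y) * INV_SQRT2 for x, y in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * INV_SQRT2, (a - d) * INV_SQRT2])
    return out


def haar_wavedec(signal, levels):
    """Multi-level decomposition ordered as [approx_L, detail_L, ..., detail_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.insert(0, detail)  # coarsest detail band ends up first
    return [approx] + details


if __name__ == "__main__":
    series = [4.0, 2.0, 5.0, 5.0]  # illustrative values, not TARBIL data
    approx, detail = haar_dwt(series)
    print(approx, detail)
    print(haar_idwt(approx, detail))  # reconstructs the input up to rounding
```

Because the Haar transform is orthonormal, the squared values of the coefficients add up to those of the original series, so no information is lost or duplicated across bands; a library implementation with longer wavelets would be the natural choice in practice.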
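The abstract states its results as relative error reductions: more than 50% and up to 80% lower RMSE for imputation, and 50% lower MAPE with "three times less" RMSE and MAE for rainfall forecasting. The sketch below gives the standard definitions of these metrics and how a relative reduction is computed; the function names are hypothetical and the numbers in the usage lines are illustrative, not results from the thesis. Reading "three times less" as one third of the baseline error corresponds to a reduction of about 66.7%.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when an observed value is zero; this sketch raises instead of
    choosing a convention for zero observations.
    """
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    print(relative_reduction(4.0, 2.0))  # 50.0: half the baseline error
    print(relative_reduction(3.0, 1.0))  # about 66.7: one third of the baseline error
```

RMSE penalizes large errors more heavily than MAE, which is why both are usually reported together; MAPE is scale-free but needs care with series that contain zeros, as daily rainfall often does.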
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/26510
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 2: Zero Hunger
dc.sdg.type Goal 3: Good Health and Well-being
dc.sdg.type Goal 17: Partnerships to achieve the Goal
dc.subject deep learning
dc.subject derin öğrenme
dc.subject real time forecasting
dc.subject gerçek zaman tahmini
dc.subject short-term forecasting
dc.subject kısa dönemli öngörü
dc.subject machine learning
dc.subject makine öğrenmesi
dc.subject artificial neural networks
dc.subject yapay sinir ağları
dc.title Orthogonality based feature selection for AI applications
dc.title.alternative Yapay zeka uygulamaları için ortogonalite tabanlı öznitelik seçimi
dc.type Master Thesis
Files
Original bundle
Name: 504221523.pdf
Size: 3.34 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.58 KB
Format: Item-specific license agreed upon to submission