Semi-supervised learning strategy for improved flash point prediction

Sülük, Mert

dc.contributor.advisor	Öğüdücü, Şule
dc.contributor.author	Sülük, Mert
dc.contributor.authorID	504201527
dc.contributor.department	Computer Engineering
dc.date.accessioned	2025-06-17T06:24:43Z
dc.date.available	2025-06-17T06:24:43Z
dc.date.issued	2024-08-20
dc.description	Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2024
dc.description.abstract	This thesis explores the application of semi-supervised learning techniques to enhance the prediction of flash points in the oil industry, which are critical for ensuring the safety of transporting and storing petroleum products. The flash point is the lowest temperature at which a substance's vapors ignite in air, a crucial parameter that traditional methods determine through costly and time-consuming laboratory tests. This study proposes a data-driven approach to make these processes more efficient and effective. Semi-supervised learning, which leverages both labeled and unlabeled data, provides a robust framework that is especially valuable in scenarios where data labeling is prohibitively expensive or logistically challenging. This research integrates sensor data such as pressure, temperature, and flow rates with sparse flash point measurements to develop a predictive model. The aim is to reduce dependency on extensive laboratory testing while enhancing operational efficiency and safety protocols. The central research questions addressed are: How can flash points be accurately predicted in the oil industry when only a limited number of labeled data points are available? Given this constraint, could semi-supervised learning methods be an effective solution? What are the specific advantages and limitations of these techniques within the oil industry context? The study validates the effectiveness of the semi-supervised learning method and develops a model that improves upon traditional approaches. To address the research questions, particularly in the context of improving flash point predictions with limited labeled data, the study employs data preprocessing techniques and modeling processes that are essential for optimizing model performance. The methodology employs two principal data preprocessing techniques: Winsorization and Min-Max Scaling.
Winsorization mitigates the effects of outliers by limiting extreme data points to a designated percentile range, ensuring the model is not skewed by anomalies. Min-Max Scaling normalizes the data, allowing for equitable evaluation of all features and preventing any single feature from dominating the model's output. The modeling process involves the Gaussian Process Regressor and the Random Forest model. The Gaussian Process Regressor, suitable for continuous data, provides uncertainty estimates to gauge the reliability of predictions. The Random Forest model enhances stability and accuracy by aggregating predictions from multiple decision trees. Initially trained on labeled data, the Gaussian Process Regressor subsequently predicts labels for unlabeled data, and those predictions that fall within a specified confidence interval are incorporated into the training set. This expanded dataset is then used to train the Random Forest model, applying an expanding window approach to incrementally improve prediction capabilities. Performance metrics such as Mean Absolute Error and Root Mean Squared Error assess model efficacy. The baseline model initially yielded a Mean Absolute Error of 1.1 degrees in flash point predictions. With the application of the semi-supervised learning model, Mean Absolute Error improved to 1.01 and Root Mean Squared Error decreased to 1.63, demonstrating significant enhancements in accuracy through the inclusion of unlabeled data. In conclusion, this thesis illustrates the potential of semi-supervised learning to bridge the gap caused by a scarcity of labeled data, particularly in critical industrial applications like oil processing. The findings suggest that semi-supervised learning not only reduces the financial and temporal expenditures associated with traditional testing methods but also offers a scalable, efficient alternative poised to transform industry practices.
The methodologies developed here have broader implications, suggesting that semi-supervised learning could be similarly beneficial in other sectors where data labeling is a significant constraint and even small performance improvements are critical due to the importance of the parameters being predicted.
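The preprocessing steps, the pseudo-label filter, and the error metrics named in the abstract can be illustrated with a minimal standard-library Python sketch. It is not the thesis implementation. The function names, the percentile cut-offs, and the `max_std` threshold are all hypothetical, and reading "within a specified confidence interval" as "predictive standard deviation below a threshold" is an assumption. The `means` and `stds` inputs stand in for the predictive mean and standard deviation that a Gaussian process regressor would supply for unlabeled samples.

```python
import math


def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Clamp each value into the [lower_pct, upper_pct] percentile range."""
    ordered = sorted(values)

    def percentile(p):
        # Linear interpolation between the two nearest ranks.
        k = (len(ordered) - 1) * p / 100.0
        lo, hi = math.floor(k), math.ceil(k)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant feature maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def select_confident(means, stds, max_std):
    """Return (index, pseudo_label) pairs whose predictive std is <= max_std.

    Assumption: a small predictive std is used as the acceptance rule.
    """
    return [(i, m) for i, (m, s) in enumerate(zip(means, stds)) if s <= max_std]


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy values only, not data from the thesis.
pressure = [1.0, 2.0, 3.0, 4.0, 100.0]
clipped = winsorize(pressure, lower_pct=0.0, upper_pct=75.0)  # outlier 100.0 is clamped
scaled = min_max_scale(clipped)
accepted = select_confident(means=[60.2, 58.9, 61.5], stds=[0.1, 0.9, 0.3], max_std=0.5)
```

In a full pipeline, the accepted pseudo-labels would be appended to the labeled set before the second-stage model is trained, and `mae` and `rmse` would be computed on held-out laboratory measurements.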
dc.description.degree	M.Sc.
dc.identifier.uri	http://hdl.handle.net/11527/27319
dc.language.iso	en_US
dc.publisher	Graduate School
dc.sdg.type	Goal 9: Industry, Innovation and Infrastructure
dc.subject	Prediction
dc.subject	Machine learning
dc.subject	Petroleum
dc.subject	Random forests
dc.subject	Sensors
dc.title	Semi-supervised learning strategy for improved flash point prediction
dc.title.alternative	Parlama noktası tahminini iyileştirmek için yarı denetimli öğrenme stratejisi
dc.type	Master Thesis

Files

Original bundle

Name: 504201527.pdf
Size: 523.05 KB
Format: Adobe Portable Document Format

Collections

LEE - Computer Engineering - Master's