ISTANBUL TECHNICAL UNIVERSITY, GRADUATE SCHOOL

DATA-DRIVEN PREDICTION OF LIGHTING ENERGY CONSUMPTION IN TERTIARY BUILDINGS USING MACHINE LEARNING

M.Sc. THESIS

Agshin RZAZADE (504231002)

Department of Electrical Engineering
Electrical Engineering Programme

Thesis Advisor: Asst. Prof. Dr. Lale ERDEM ATILGAN

AUGUST 2025

Agshin Rzazade, an M.Sc. student of the ITU Graduate School (student ID 504231002), successfully defended the thesis entitled “DATA-DRIVEN PREDICTION OF LIGHTING ENERGY CONSUMPTION IN TERTIARY BUILDINGS USING MACHINE LEARNING”, which he prepared after fulfilling the requirements specified in the associated legislation, before the jury listed below.

Thesis Advisor : Asst. Prof. Dr. Lale ERDEM ATILGAN, Istanbul Technical University
Jury Members : Asst. Prof. Dr. Mustafa Berker YURTSEVEN, Istanbul Technical University
Asst. Prof. Dr. Mustafa Alparslan ZEHİR, Marmara University

Date of Submission : 30 May 2025
Date of Defense : 1 August 2025

To my family,

FOREWORD

First and foremost, I owe my deepest gratitude to my family for always supporting me throughout every stage of this journey. Your encouragement, patience, and love have been the foundation upon which I have built both my academic pursuits and personal growth.
Without your support and understanding, the completion of this work would not have been possible.

I would like to extend my sincere thanks to my academic advisor, Asst. Prof. Dr. Lale ERDEM ATILGAN. Your guidance, insightful feedback, and steadfast commitment to excellence have shaped this thesis in countless ways. Your expertise challenged me to refine my ideas, your constructive criticism strengthened my methodology, and your unwavering encouragement kept me motivated even when the path seemed most difficult.

I am also grateful to Istanbul Technical University, whose rich academic environment and resources provided the ideal setting for conducting this research. The faculty members across various departments have contributed significantly to my development as a scholar. In particular, I wish to thank every professor who shared their knowledge, offered valuable advice, and inspired me through their dedication to teaching and research. Each interaction has broadened my perspective and deepened my understanding of the field.

In closing, I hope that this work not only reflects the support and mentorship I have received but also contributes meaningfully to the engineering field. To all who have in any way guided or encouraged me, thank you for helping to bring this thesis to fruition.

May 2025
Agshin RZAZADE (Electrical Engineer)

TABLE OF CONTENTS

FOREWORD
TABLE OF CONTENTS
ABBREVIATIONS
SYMBOLS
LIST OF TABLES
LIST OF FIGURES
SUMMARY
ÖZET (summary in Turkish)
1. INTRODUCTION
1.1 Purpose of Thesis
1.2 Introduction to Energy Efficiency in Lighting Systems
1.3 Overview of Global Energy Consumption Trends
1.4 Lighting’s Share in Building Energy Use
1.5 Evolution of Lighting Strategies
1.6 Comparative Evaluation of Machine Learning Algorithms
1.7 Utilized Machine Learning Approaches
1.7.1 Linear models
1.7.2 Decision Tree
1.7.3 Ensemble models
1.8 Empirical Insights into Residential Lighting Behavior and Energy Efficiency
1.9 Hypothesis
2. METHODOLOGY
2.1 Building Description and Zonal Monitoring Framework
2.2 Data Selection and Rationale
2.3 Tools, Technologies, and Environment
2.4 Model Evaluation Metrics
2.5 Script-Based Implementation and Refinements
3. RESULTS
3.1 Exploratory Data Analysis (EDA) of Indoor Predictors
3.1.1 Floor 1
3.1.2 Floor 2
3.1.3 Floor 3
3.1.4 Floor 4
3.1.5 Floor 5
3.1.6 Floor 6
3.1.7 Floor 7
3.1.8 Floors combined
3.2 Power Demand Pattern Analysis
3.2.1 Floor 1
3.2.2 Floor 2
3.2.3 Floor 3
3.2.4 Floor 4
3.2.5 Floor 5
3.2.6 Floor 6
3.2.7 Floor 7
3.2.8 Floor 5 – Monday, Saturday, Sunday
3.3 Analysis of Linear Regression Models
3.3.1 Floor 1
3.3.2 Floor 2
3.3.3 Floor 3
3.3.4 Floor 4
3.3.5 Floor 5
3.3.6 Floor 6
3.3.7 Floor 7
3.3.8 Summary
3.4 Decision Tree and Ensemble Models
3.4.1 Floor 1
3.4.2 Floor 2
3.4.3 Floor 3
3.4.4 Floor 4
3.4.5 Floor 5
3.4.6 Floor 6
3.4.7 Floor 7
3.4.8 Comparison of all 7 floors
3.5 Detailed Bagged Tree Model Analysis
3.5.1 Floor 1
3.5.2 Floor 2
3.5.3 Floor 3
3.5.4 Floor 4
3.5.5 Floor 5
3.5.6 Floor 6
3.5.7 Floor 7
3.5.8 Cross floor trends
3.5.9 Total vs. zonal average
4. CONCLUSIONS AND RECOMMENDATIONS
REFERENCES
CURRICULUM VITAE

ABBREVIATIONS

AC : Air Conditioning
AIC : Akaike Information Criterion
ANN : Artificial Neural Network
API : Application Programming Interface
AVG : Average
BACnet : Building Automation and Control Networks
BEMS : Building Energy Management System
CBECS : Commercial Buildings Energy Consumption Survey
CNN : Convolutional Neural Network
CU-BEMS : Chulalongkorn University Building Energy Management System
DL : Deep Learning
DNN : Deep Neural Network
EDA : Exploratory Data Analysis
EISA : Energy Independence and Security Act
EPAct : Energy Policy Act
HDR : High Dynamic Range
HVAC : Heating, Ventilation, and Air Conditioning
IEA : International Energy Agency
IoT : Internet of Things
LAN : Local Area Network
LED : Light Emitting Diode
LSTM : Long Short-Term Memory
M2M : Machine-to-Machine
MAE : Mean Absolute Error
MIMO : Multi-Input Multi-Output
MLP : Multilayer Perceptron
MQTT : Message Queuing Telemetry Transport
MSE : Mean Squared Error
OLS : Ordinary Least Squares
PIR : Passive Infrared
RL : Reinforcement Learning
RMSE : Root Mean Squared Error
SVM : Support Vector Machine
TOT : Total
UNEP : United Nations Environment Programme
VAE : Variational Autoencoder
Wi-Fi : Wireless Fidelity

SYMBOLS

% : Percentage
$\mathbf{1}(X \in R_m)$ : Indicator function that returns 1 when the feature vector $X$ falls into region $R_m$, and 0 otherwise
AvgLux : Mean of all illuminance readings from cealing-mounted sensors averaged over each floor or zone AvgRH : Mean of all relative-humidity readings averaged over each floor or zone AvgTemp : Mean of all temperature sensor readings averaged over each floor or zone B : Number of trees in the ensemble (bagging: number of bootstrap replicates; boosting: number of sequential learners) 𝜷𝟎 : Constant term in the linear regression model 𝜷𝒎 : m-th coefficient in the linear regression model 𝒄𝒎 : Constant predicted value within the m-th terminal region of a decision tree 𝜺𝒊 : i-th noise term for arbitrary errors in the linear regression model 𝒇 : Scalar value function of independent variable 𝑋() in the linear regression model 𝒇((𝒙) : Boosting Predictor 𝒇(∗𝒃(𝒙) : Prediction from the 𝑏-th bootstrap-fitted tree 𝒇(𝒃(𝒙) : The tree fitted at iteration 𝑏 in boosted tree equation 𝒇(𝒃𝒂𝒈(𝒙) : Bagged predictor 𝒇(𝑿) : Decision Tree Regression model 𝒇𝒎(∙) : Basis (feature) function 𝑚 in a (generalized) linear regression model 𝒊 : Index of a training example, running from 1 up to 𝑛, where 𝑛 is the total number of observations 𝝀 : Shrinkage rate in Boosted Tree models 𝑴 : Number of terminal regions (leaves) in decision tree regression model xvi 𝒎 : Index over model components, the m-th predictor term in a linear model or the m-th region (leaf) in a decision tree regression model 𝒏 : Total number of observations OutdoorFf, 𝑭𝒇 : The mean wind speed recorded at a height of 10–12 meters over a 10-minute average OutdoorP, 𝑷 : Atmospheric pressure adjusted to mean sea level OutdoorP0, 𝑷𝟎 : Atmospheric pressure at the weather station level OutdoorT, 𝑻 : Air temperature at 2 meters above the surface OutdoorTd, 𝑻𝒅 : Dew-point temperature at 2 meters OutdoorU, 𝑼 : Relative humidity measured at 2 meters 𝑹𝟐 : Coefficient of Determination 𝑹𝒎 : m-th region (partition) of feature space in a decision tree regression model 𝚺 : Summation operatör 𝑺𝑺𝒓𝒆𝒔 : Sum of squares of residuals 𝑺𝑺𝒕𝒐𝒕 : Total sum of 
squares TotalAC : Sum of all air-conditioning power consumption across all zones on the floor or zone TotalLighting : Sum of all lighting power consumption across all zones on the floor or zone TotalPlug : Sum of all plug-load power consumption across all zones on the floor or zone 𝑿 : A random p-dimensional feature vector, i.e. the set of all possible inputs to the model (feature space) in decision regression model 𝒙 : Generic feature‐vector at which a model is evaluated 𝑿𝒊 : i-th observed feature vector in the training data in linear regression model 𝑿𝒊𝒋 : i-th observations on the j-th predictor variable (𝑗 = 1,2, … , 𝑝) in linear regression model 𝒚F𝒊 : The predicted value in MSE, RMSE, MAE equations 𝒚𝒊 : Observed target (response) for the i-th training case in linear regression model or the actual value in MSE, RMSE, MAE equations z : Zone identifier z1 : Zone 1 (first zone on each floor) z2 : Zone 2 (second zone on each floor) z3 : Zone 3 (third zone on each floor) xvii z4 : Zone 4 (fourth zone on each floor) z5 : Zone 5 (fifth zone on floors 3-7) xviii xix LIST OF TABLES Page Indoor predictors. .................................................................................... 35 Outdoor predictors. ................................................................................. 37 Table 3.1 : Floor 1 Linear Regression models evaluation metrics. ........................... 83 Table 3.2 : Floor 2 Linear Regression models evaluation metrics. ........................... 84 Table 3.3 : Floor 3 Linear Regression models evaluation metrics. ........................... 86 Table 3.4 : Floor 4 Linear Regression models evaluation metrics. ........................... 88 Table 3.5 : Floor 5 Linear Regression models evaluation metrics. ........................... 90 Table 3.6 : Floor 6 Linear Regression models evaluation metrics. ........................... 92 Table 3.7 : Floor 7 Linear Regression models evaluation metrics. ........................... 
94 Table 3.8 : Evaluation metrics of Linear Regression models. .................................. 97 Table 3.9 : Floor 1 Decision Tree and Ensemle models evaluation metrics. ............ 98 Table 3.10 : Floor 2 Decision Tree and Ensemle models evaluation metrics. ........ 100 Table 3.11 : Floor 3 Decision Tree and Ensemle models evaluation metrics. ........ 102 Table 3.12 : Floor 4 Decision Tree and Ensemle models evaluation metrics. ........ 104 Table 3.13 : Floor 5 Decision Tree and Ensemle models evaluation metrics. ........ 106 Table 3.14 : Floor 6 Decision Tree and Ensemle models evaluation metrics. ........ 108 Table 3.15 : Floor 7 Decision Tree and Ensemle models evaluation metrics. ........ 110 Table 3.16 : Evaluation metrics of Decision Tree and Ensemble models. .............. 113 Table 3.17 : Floor 1 total average Bagged Tree model evaluation metrics. ........... 114 Table 3.18 : Floor 1 zonal average Bagged Tree model evaluation metrics. .......... 115 Table 3.19 : Floor 2 total average Bagged Tree model evaluation metrics. ........... 117 Table 3.20 : Floor 2 zonal average Bagged Tree model evaluation metrics. .......... 117 Table 3.21 : Floor 3 total average Bagged Tree model evaluation metrics. ........... 119 Table 3.22 : Floor 3 zonal average Bagged Tree model evaluation metrics. .......... 119 Table 3.23 : Floor 4 total average Bagged Tree model evaluation metrics. ........... 121 Table 3.24 : Floor 4 zonal average Bagged Tree model evaluation metrics. .......... 121 Table 3.25 : Floor 5 total average Bagged Tree model evaluation metrics. ........... 123 Table 3.26 : Floor 5 zonal average Bagged Tree model evaluation metrics. .......... 124 Table 3.27 : Floor 6 total average Bagged Tree model evaluation metrics. ........... 126 Table 3.28 : Floor 6 zonal average Bagged Tree model evaluation metrics. .......... 126 Table 3.29 : Floor 7 total average Bagged Tree model evaluation metrics. ........... 
128 Table 3.30 : Floor 7 zonal average Bagged Tree model evaluation metrics. .......... 129 Table 3.31 : Bagged Tree model - RMSE and 𝑅2 values for total average and zonal average methods for all floors and zones. ........................................................ 132 xx xxi LIST OF FIGURES Page Figure 1.1 : Taxonomy of machine‑learning methods highlighting supervised learning, unsupervised learning, and deep learning and their representative algorithm families. Note: Reprinted from Tien et al. (2022, Fig. 1a). ............... 18 3D visualization of the seven-story academic office building (Pipattanasomporn et al., 2020a). ............................................................................... 28 Floor plans on Floors 1–2 (left) and Floors 3–7 (right) (Pipattanasomporn et al., 2020a). ....................................................................... 29 Flowchart for Linear-Regression and Decision-Tree models. .............. 39 Flowchart for evaluating Bagged-Tree ensemble model at floor and zone levels. ......................................................................................................... 40 Figure 3.1 : a) Floor 1 Total AC over 2019, b) Floor 1 TotalLighting over 2019, c) Floor 1 TotalPlug over 2019. ................................................................. 44 Figure 3.2 : Floor 2 correlation matrix. ..................................................................... 45 Figure 3.3 : a) Floor 2 TotalAC over 2019, b) Floor 2 AvgLux over 2019, c) Floor 2 TotalLighting over 2019, d) Floor 2 AvgRH over 2019, e) Floor 2 TotalPlug over 2019, f) Floor 2 AvgTemp over 2019. ....................................................... 46 Figure 3.4 : Floor 3 correlation matrix. ..................................................................... 
47 Figure 3.5 : a) Floor 3 TotalAC over 2019, b) Floor 3 AvgLux over 2019, c) Floor 3 TotalLighting over 2019, d) Floor 3 AvgRH over 2019, e) Floor 3 TotalPlug over 2019, f) Floor 3 AvgTemp over 2019. ....................................................... 48 Figure 3.6 : Floor 4 correlation matrix. ..................................................................... 49 Figure 3.7 : a) Floor 4 TotalAC over 2019, b) Floor 4 AvgLux over 2019, c) Floor 4 TotalLighting over 2019, d) Floor 4 AvgRH over 2019, e) Floor 4 TotalPlug over 2019, f) Floor 4 AvgTemp over 2019. ....................................................... 50 Figure 3.8 : Floor 5 correlation matrix. ..................................................................... 51 Figure 3.9 : a) Floor 5 TotalAC over 2019, b) Floor 5 AvgLux over 2019, c) Floor 5 TotalLighting over 2019, d) Floor 5 AvgRH over 2019, e) Floor 5 TotalPlug over 2019, f) Floor 5 AvgTemp over 2019. ....................................................... 52 Figure 3.10 : Floor 6 correlation matrix. ................................................................... 53 Figure 3.11 : a) Floor 6 TotalAC over 2019, b) Floor 6 AvgLux over 2019, c) Floor 6 TotalLighting over 2019, d) Floor 6 AvgRH over 2019, e) Floor 6 TotalPlug over 2019, f) Floor 6 AvgTemp over 2019. ....................................................... 54 Figure 3.12 : Floor 7 correlation matrix. ................................................................... 55 Figure 3.13 : a) Floor 7 TotalAC over 2019, b) Floor 7 AvgLux over 2019, c) Floor 7 TotalLighting over 2019, d) Floor 7 AvgRH over 2019, e) Floor 7 TotalPlug over 2019, f) Floor 7 AvgTemp over 2019. ....................................................... 56 Figure 3.14 : Floors combined correlation matrix. ................................................... 
57 Figure 3.15 : a) Floors combined TotalAC over 2019, b) Floors combined AvgLux over 2019, c) Floors combined TotalLighting over 2019, d) Floors combined AvgRH over 2019, e) Floors combined TotalPlug over 2019, f) Floors combined AvgTemp over 2019. ......................................................................... 58 Figure 3.16 : Floor 1 – 15 January and 15 April (AC Power Consumption). ........... 59 xxii Figure 3.17 : Floor 1 – 15 January and 15 April (Lighting, Plug, and Total Power Consumption). .................................................................................................... 60 Figure 3.18 : Floor 1 – 15 July and 15 October (AC, Lighting, Plug, and Total Power Consumption). ......................................................................................... 61 Figure 3.19 : Floor 2 – 15 January and 15 April (AC and Lighting Power Consumption). .................................................................................................... 62 Figure 3.20 : Floor 2 – 15 January and 15 April (Plug and Total Power Consumption). .................................................................................................... 63 Figure 3.21 : Floor 2 – 15 July and 15 October (AC Power Consumption). ............ 63 Figure 3.22 : Floor 2 – 15 July and 15 October (Lighting, Plug, and Total Power Consumption). .................................................................................................... 64 Figure 3.23 : Floor 3 – 15 January and 15 April (AC Power Consumption). ........... 65 Figure 3.24 : Floor 3 – 15 January and 15 April (Lighting, Plug, and Total Power Consumption). .................................................................................................... 66 Figure 3.25 : Floor 3 – 15 July and 15 October (AC, Lighting, Plug Power Consumption). .................................................................................................... 
67 Figure 3.26 : Floor 3 – 15 July and 15 October (Total Power Consumption). ......... 68 Figure 3.27 : Floor 4 – 15 January and 15 April (AC, Lighting, Plug Power Consumption). .................................................................................................... 69 Figure 3.28 : Floor 4 – 15 January and 15 April (Total Power Consumption). ........ 70 Figure 3.29 : Floor 4 – 15 July and 15 October (AC and Lighting Power Consumption). .................................................................................................... 70 Figure 3.30 : Floor 4 – 15 July and 15 October (Plug and Total Power Consumption). .................................................................................................... 71 Figure 3.31 : Floor 5 – 15 January and 15 April (AC, Lighting, Plug Power Consumption). .................................................................................................... 72 Figure 3.32 : Floor 5 – 15 January and 15 April (Total Power Consumption). ........ 73 Figure 3.33 : Floor 5 – 15 July and 15 October (AC, Lighting Power Consumption). ............................................................................................................................ 73 Figure 3.34 : Floor 5 – 15 July and 15 October (Plug and Total Power Consumption). .................................................................................................... 74 Figure 3.35 : Floor 6 – 15 January and 15 April (AC, Lighting, Plug Power Consumption). .................................................................................................... 75 Figure 3.36 : Floor 6 – 15 January and 15 April (Total Powe Consumptions). ........ 76 Figure 3.37 : Floor 6 – 15 July and 15 October (AC, Lighting Power Consumption). ............................................................................................................................ 76 Figure 3.38 : Floor 6 – 15 July and 15 October (Plug and Total Power Consumption). 
.................................................................................................... 77 Figure 3.39 : Floor 7 – 15 January and 15 April (AC, Lighting, Plug Power Consumption). .................................................................................................... 78 Figure 3.40 : Floor 7 – 15 January and 15 April (Total Power Consumption). ........ 79 Figure 3.41 : Floor 7 – 15 July and 15 October (AC, Lighting Power Consumption). ............................................................................................................................ 79 Figure 3.42 : Floor 7 – 15 July and 15 October (Plug and Total Power Consumption). .................................................................................................... 80 Figure 3.43 : Floor 5 – 18, 23, 24 March 2019 Total Power Consumption. ............. 81 Figure 3.44 : Floor 1 Interations Model Actual vs. Predicted graph. ........................ 83 Figure 3.45 : Floor 1 Interations Model Residuals. .................................................. 83 Figure 3.46 : Floor 2 Interations Model Actual vs. Predicted graph. ........................ 85 Figure 3.47 : Floor 2 Interations Model Residuals. .................................................. 85 xxiii Figure 3.48 : Floor 3 Interations Model Actual vs. Predicted graph. ....................... 87 Figure 3.49 : Floor 3 Interations Model Residuals. .................................................. 87 Figure 3.50 : Floor 4 Interations Model Actual vs. Predicted graph. ....................... 89 Figure 3.51 : Floor 4 Interations Model Residuals. .................................................. 89 Figure 3.52 : Floor 5 Interations Model Actual vs. Predicted graph. ....................... 91 Figure 3.53 : Floor 5 Interations Model Residuals. .................................................. 91 Figure 3.54 : Floor 6 Interations Model Actual vs. Predicted graph. ....................... 
93 Figure 3.55 : Floor 6 Interations Model Residuals. .................................................. 93 Figure 3.56 : Floor 7 Interations Model Actual vs. Predicted graph. ....................... 95 Figure 3.57 : Floor 7 Interations Model Residuals. .................................................. 95 Figure 3.58 : Floor 1 Bagged Tree Actual vs. Predicted graph. ............................... 99 Figure 3.59 : Floor 1 Bagged Tree Residuals. .......................................................... 99 Figure 3.60 : Floor 2 Bagged Tree Actual vs. Predicted graph. ............................. 101 Figure 3.61 : Floor 2 Bagged Tree Residuals. ........................................................ 101 Figure 3.62 : Floor 3 Fine Tree Actual vs. Predicted graph. ................................... 103 Figure 3.63 : Floor 3 Fine Tree Residuals. ............................................................. 103 Figure 3.64 : Floor 4 Bagged Tree Actual vs. Predicted graph. ............................. 105 Figure 3.65 : Floor 4 Bagged Tree Residuals. ........................................................ 105 Figure 3.66 : Floor 5 Fine Tree Actual vs. Predicted graph. ................................... 107 Figure 3.67 : Floor 5 Fine Tree Residuals. ............................................................. 107 Figure 3.68 : Floor 6 Bagged Tree Actual vs. Predicted graph. ............................. 109 Figure 3.69 : Floor 6 Bagged Tree Residuals. ........................................................ 109 Figure 3.70 : Floor 7 Fine Tree Actual vs. Predicted graph. ................................... 111 Figure 3.71 : Floor 7 Fine Tree Residuals. ............................................................. 111 Figure 3.72 : Floor 1 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals, c) Zone 2 Actual vs. Predicted, d) Zone 2 Residuals. 
......................................................................................................... 115 Figure 3.73 : Floor 1 Bagged Tree total average graphs: a) Zone 3 Actual vs. Predicted, b) Zone 3 Residuals, c) Zone 4 Actual vs. Predicted, d) Zone 4 Residuals. ......................................................................................................... 116 Figure 3.74 : Floor 2 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals, c) Zone 2 Actual vs. Predicted, d) Zone 2 Residuals. ......................................................................................................... 117 Figure 3.75 : Floor 2 Bagged Tree total average graphs: a) Zone 3 Actual vs. Predicted, b) Zone 3 Residuals, c) Zone 4 Actual vs. Predicted, d) Zone 4 Residuals. ......................................................................................................... 118 Figure 3.76 : Floor 3 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals, c) Zone 2 Actual vs. Predicted, d) Zone 2 Residuals. ......................................................................................................... 119 Figure 3.77 : Floor 3 Bagged Tree total average graphs: a) Zone 3 Actual vs. Predicted, b) Zone 3 Residuals, c) Zone 4 Actual vs. Predicted, d) Zone 4 Residuals, e) Zone 5 Actual vs. Predicted, f) Zone 5 Residuals. ..................... 120 Figure 3.78 : Floor 4 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals. ........................................................................ 121 Figure 3.79 : Floor 4 Bagged Tree total average graphs: a) Zone 2 Actual vs. Predicted, b) Zone 2 Residuals, c) Zone 3 Actual vs. Predicted, d) Zone 3 Residuals, e) Zone 4 Actual vs. Predicted, f) Zone 4 Residuals. ..................... 122 Figure 3.80 : Floor 4 Bagged Tree total average graphs: a) Zone 5 Actual vs. Predicted, b) Zone 5 Residuals. 
Figure 3.81 : Floor 5 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals, c) Zone 2 Actual vs. Predicted, d) Zone 2 Residuals.
Figure 3.82 : Floor 5 Bagged Tree total average graphs: a) Zone 3 Actual vs. Predicted, b) Zone 3 Residuals, c) Zone 4 Actual vs. Predicted, d) Zone 4 Residuals, e) Zone 5 Actual vs. Predicted, f) Zone 5 Residuals.
Figure 3.83 : Floor 6 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals.
Figure 3.84 : Floor 6 Bagged Tree total average graphs: a) Zone 2 Actual vs. Predicted, b) Zone 2 Residuals, c) Zone 3 Actual vs. Predicted, d) Zone 3 Residuals, e) Zone 4 Actual vs. Predicted, f) Zone 4 Residuals.
Figure 3.85 : Floor 6 Bagged Tree total average graphs: a) Zone 5 Actual vs. Predicted, b) Zone 5 Residuals.
Figure 3.86 : Floor 7 Bagged Tree total average graphs: a) Zone 1 Actual vs. Predicted, b) Zone 1 Residuals, c) Zone 2 Actual vs. Predicted, d) Zone 2 Residuals, e) Zone 3 Actual vs. Predicted, f) Zone 3 Residuals.
Figure 3.87 : Floor 7 Bagged Tree total average graphs: a) Zone 4 Actual vs. Predicted, b) Zone 4 Residuals, c) Zone 5 Actual vs. Predicted, d) Zone 5 Residuals.

DATA-DRIVEN PREDICTION OF LIGHTING ENERGY CONSUMPTION IN TERTIARY BUILDINGS USING MACHINE LEARNING

SUMMARY

This study examines the application of advanced machine learning (ML) methods to predict energy consumption within tertiary buildings.
The work is motivated by the growing need to reduce maintenance and operating costs and environmental impact without compromising indoor environmental conditions. More specifically, the research focuses on lighting energy prediction in office environments, where current control mechanisms mostly depend on fixed schedules or basic occupancy sensors and therefore waste energy. Accurate prediction of lighting demand also enables adaptive dimming and occupancy-aware scheduling that trim operating costs, while model-predictive control can pre-emptively curtail or shift lighting loads during utility peak periods, giving tertiary buildings a cost-effective path to join demand-response programs and bolster grid flexibility.

The research integrates high-resolution indoor measurements of lighting, air-conditioning (AC), and plug loads with indoor environmental measurements and outdoor meteorological data. The primary focus is the complete 2019 dataset from the Chulalongkorn University Building Energy Management System (CU-BEMS). This dataset was recorded at one-minute intervals in a seven-story academic office building. It is open-source: it was publicly released by the original researchers, was not collected by the author of this thesis, and is used as-is for analysis. The dataset captures comprehensive, minute-level changes in electricity consumption, including air conditioning, lighting, and plug loads, as well as indoor environmental parameters such as temperature, relative humidity, and ambient light. A carefully designed data preprocessing routine reduces sensor errors and fills missing values, which secures reliable input features for model development. Different machine learning methods, such as linear regression, decision trees, and ensemble learning models, are evaluated using MATLAB’s Regression Learner tools.
Special focus is placed on the Bagged Trees algorithm, as it displays remarkable accuracy in modeling the complex, nonlinear relationships that determine lighting energy usage patterns.

The main findings reveal that seasonality has a noticeable effect on overall consumption, with cooling demand increasing substantially during warmer months and lighting demand remaining fairly consistent throughout the year. Occupant-driven factors also shape consumption, as indicated by spikes in lighting and plug loads during peak usage hours. Among the machine learning models applied, ranging from traditional Ordinary Least Squares (OLS) regression and interaction-based linear methods to tree-based models such as Fine, Medium, and Coarse Trees, as well as Boosted Trees, Bagged Trees emerged as the most powerful and precise approach. While linear regression models offered interpretability and low computational cost, they fall short in identifying the nonlinear dynamics of energy use. Standalone decision tree models provided preliminary insight but, depending on their complexity, were susceptible to overfitting or underfitting. Boosted Trees improved accuracy by iteratively correcting errors but required careful tuning to avoid instability. The Bagged Trees approach, by contrast, balanced prediction accuracy and stability by aggregating results from multiple randomized decision trees, effectively capturing temporal and spatial nuances in the data. It consistently outperformed the other methods in predictive accuracy and generalization, at both the whole-building and zone-specific levels.

To sum up, the thesis illustrates that ML-based approaches can accurately predict lighting energy consumption in tertiary buildings.
These results support the initial hypothesis that machine learning methods, particularly ensemble-tree models, may have a significant effect on reducing energy consumption while maintaining indoor environmental conditions at desired levels. From a practical standpoint, the study provides compelling evidence for deploying ML algorithms in smart lighting systems. Future extensions may include longer-term field deployments, integration with other building subsystems such as HVAC, and the exploration of advanced sensor networks to further improve the efficiency and adaptability of energy management solutions that adopt ML models.

HİZMET SEKTÖRÜ BİNALARINDA AYDINLATMA ENERJİ TÜKETİMİNİN MAKİNE ÖĞRENMESİ İLE VERİ ODAKLI TAHMİNİ

ÖZET (SUMMARY)

This thesis examines the application of advanced machine learning techniques aimed at improving the energy efficiency of lighting systems in tertiary buildings. The main motivation of the research is to quantify the temporal and spatial variability of lighting loads and to contribute to a better understanding of these dynamics. The central hypothesis is that machine learning models built by integrating high-resolution indoor data (minute-level measurements) with regional meteorological data can predict lighting power consumption with much lower error rates than traditional estimation approaches based on fixed assumptions. In this context, the study examines seasonal variations and environmental factors in detail and integrates them into the consumption prediction model. The topic is important both academically and industrially, because highly accurate forecasts reduce uncertainty in energy-management decision making and thereby increase the reliability of resource planning.

Within the thesis, the research process began with an analysis of the 2019 CU-BEMS dataset.
This open-access dataset, recorded at one-minute intervals in a seven-story academic office building of Chulalongkorn University in Bangkok, Thailand, was not collected by the author and was used as published by the original researchers. The dataset covers the electricity consumption of air-conditioning, lighting, and plug loads from 33 different zones in the building, together with minute-level changes in indoor environmental variables such as ambient temperature, relative humidity, and illuminance. In addition, meteorological data (temperature, atmospheric pressure, relative humidity, etc.) recorded at 30-minute intervals at Bangkok Suvarnabhumi Airport are used as outdoor weather data. Integrating these two datasets is important because it broadens the scope of the energy prediction model by including both indoor and outdoor environmental conditions.

In the data preprocessing stage, gaps, erroneous readings, and empty values in the sensor data were handled with linear interpolation, moving-median filtering, and assignment of the global zone median during sensor maintenance periods. In this way, the data were made suitable for modeling while preserving their physical realism.

In the analytical process, various regression models were developed using MATLAB R2024b Update 5. In the first stage, linear regression, support vector machines, artificial neural networks, Gaussian process regression, and decision trees were evaluated with the Regression Learner tool. Linear regression approaches (Ordinary Least Squares, Interactions Regression, Stepwise Regression, Robust Regression, Efficient Least Squares) were computationally inexpensive and interpretable, but remained limited for nonlinear relationships. Decision-tree-based approaches (Coarse, Medium, and Fine Trees) captured patterns well but at times generalized poorly.
The Boosted Trees model achieved good results through iterative error correction; in the overall evaluation, however, Bagged Trees proved superior in both accuracy and computational efficiency. Modeling was carried out separately at the whole-building level and at the floor/zone level, carefully capturing the distinct consumption trends of each zone.

The novel aspects of the research include the creation of a high-resolution unified dataset by synchronizing indoor and outdoor weather data, the global median strategy applied during data cleaning, and the development of separate models for each floor. This methodological approach highlights the adaptability of machine learning models to different usage scenarios. The Bagged Trees approach also stands out as a detailed application rarely encountered in the literature.

The main findings obtained from applying the developed models reveal the daily and seasonal variations of lighting power consumption and show that it is markedly influenced by outdoor environmental factors. Air-conditioning loads were observed to rise to as much as double in summer months compared with winter months, whereas lighting consumption remained more stable throughout the year, with slight increases in winter months when daylight decreases. The Bagged Trees algorithm was able to model the temporal fluctuations and zonal differences of the building's power consumption data with high accuracy. Floor-level analyses showed sharp increases in lighting and plug loads during busy working hours and a marked drop around midday. This underlines the critical role of careful preprocessing of sensor data and of appropriate algorithm selection in capturing complex, nonlinear relationships. The effects of outdoor weather data (temperature, humidity, etc.) on indoor power consumption were also shown to be significant.
When the outdoor temperature rose, air-conditioner use inside the building increased markedly, and total power consumption rose in parallel. This finding shows that environmental inputs must be taken into account in consumption prediction studies.

This thesis has demonstrated that lighting energy consumption prediction models for tertiary buildings can be built with high accuracy by integrating high-resolution sensor data with regional meteorological data. Ensemble learning methods such as Bagged Trees were found to offer lower prediction error and greater flexibility than linear models. The thesis has shown that correcting missing or erroneous sensor readings with appropriate data cleaning and preprocessing techniques optimizes model performance. In particular, compensating for data losses caused by sensor maintenance in the first months of the year through global median assignment substantially reduced the prediction error.

The ability to predict lighting power consumption accurately may open up various applications in energy management in the future. Accurate predictions can lay the groundwork for dynamic control strategies such as adaptive dimming and scheduling according to usage habits. Energy efficiency can thereby be increased, and operating costs can be optimized through data-driven resource planning. Moreover, integrating such prediction models with predictive control systems in the future could enable buildings to participate more effectively in demand-response programs, so that grid flexibility can be increased by managing loads ahead of peak hours.

In practice, the resulting high-resolution consumption profiles offer building managers quantitative insight into matters such as electricity demand planning, budget allocation, and maintenance scheduling.
For example, the peak time slots revealed by the load profiles can help shift maintenance programs to off-peak hours or improve voltage-balancing strategies. Including outdoor weather data, in turn, makes it easier to anticipate seasonal demand variability.

The study nevertheless has some limitations. First, the outdoor weather data were obtained from Bangkok Suvarnabhumi Airport and may not fully reflect the microclimatic conditions at the building's location. Occasional calibration problems and data outages in the sensor infrastructure, especially in the initial period, partly affected the accuracy of the predictions. In addition, all illuminance (lux) measurements came from ceiling-mounted sensors; because measurements taken at other positions or heights would reflect different light fields, this could also change the model coefficients. Because the validity of the study is specific to the design, usage habits, and climatic context of the analyzed building, the generalizability of the findings remains limited. Finally, only five-fold cross-validation was applied in the initial model screening with MATLAB Regression Learner, and no independent validation set was used in the subsequent custom script work.

In conclusion, this thesis has shown that machine-learning-based statistical models can predict lighting power consumption patterns in tertiary buildings with high accuracy. Its scientific and practical contributions are valuable in that the consumption profile is modeled at temporal and spatial resolution, the development of data-driven decision-support tools is enabled, and a sound methodological basis is provided for future research. The findings of this thesis constitute an important foundation in the field of smart building analytics.
In the future, long-term field applications building on this study and carried out with larger datasets could be valuable for testing how well the developed models generalize to different building types. Long-term field deployments, integration with other building subsystems such as HVAC, and the exploration of advanced sensor networks could open new directions for research in this area by improving the efficiency and adaptability of energy management solutions supported by machine learning models.

1 INTRODUCTION

The growing adoption of machine-learning techniques has placed data-driven prediction of lighting energy consumption at the forefront of research on tertiary buildings. Within this paradigm, the thesis investigates lighting-related forecasts, as outlined in the subsequent purpose statement.

Purpose of Thesis

The primary objective of this thesis is to explore advanced Machine Learning (ML) algorithms for optimizing lighting energy usage in tertiary buildings through energy consumption prediction. The research is motivated by both practical and academic imperatives. Practically, accurate lighting energy consumption prediction can inform future strategies for reducing operating costs and minimizing environmental impact, thereby supporting broader sustainability objectives in building operations.

Academically, this thesis contributes to the field by applying a range of machine learning approaches, such as linear regression, decision tree, and ensemble learning models, to high-resolution building datasets. The focus is on analyzing the predictive capabilities of these models and examining how well they capture patterns in energy consumption associated with occupancy, environmental conditions, and temporal variations. This implementation-driven approach offers insight into the suitability of different machine learning methods for modeling lighting energy demand in smart building environments.
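To make the ensemble idea mentioned above more concrete, the following is a minimal sketch of bagging (bootstrap aggregation) applied to small regression trees. It is not the MATLAB Regression Learner implementation used in this thesis; the single input feature (hour of day), the synthetic load profile, the tree depth, the number of trees, and all function names are assumptions chosen only for illustration.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Fit a small 1-D regression tree: (threshold, left, right) or a leaf mean."""
    leaf = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 2 * min_leaf:
        return leaf
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return leaf
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]], depth - 1, min_leaf)
    right = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]], depth - 1, min_leaf)
    return (threshold, left, right)

def predict_tree(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict_bagged(trees, x):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)

# Hypothetical lighting load (kW): low at night, high in office hours, plus noise.
noise = random.Random(42)
hours = [h + q / 4 for h in range(24) for q in range(4)]
load = [(2.0 if 8 <= h < 18 else 0.2) + noise.gauss(0, 0.05) for h in hours]

model = fit_bagged(hours, load)
night = predict_bagged(model, 3.0)
midday = predict_bagged(model, 12.0)
```

Averaging over bootstrap-trained trees is what gives the ensemble its stability compared with a single tree; a linear fit on the same hour-of-day feature could not reproduce the step-like daily profile.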
Accurate prediction of lighting energy consumption in tertiary buildings confers multiple practical advantages for energy management. Consumption forecasts enable dynamic control strategies, such as adaptive dimming and scheduling based on expected occupancy patterns, that enhance energy efficiency and reduce operational costs through data-driven resource allocation (Norouziasl & Jafari, 2023; Liu et al., 2023). Furthermore, integrating these forecasts into model predictive control frameworks facilitates participation in demand response programs by pre-emptively curtailing or shifting lighting loads during peak periods, thereby supporting grid flexibility and minimizing peak demand charges (Kathirgamanathan et al., 2021).

The thesis specifically aims to create prediction models employing cutting-edge machine learning techniques, such as ensemble methods, decision trees, regression models, and reinforcement learning, and to test these methods using an extensive real-world dataset. This was made possible by combining two different sources into one unified dataset. The first is the Chulalongkorn University Building Energy Management System (CU-BEMS) dataset, which provides fine-grained, minute-level measurements of lighting, AC, plug loads and indoor environmental conditions (ambient light, temperature, and humidity) in a seven-story university office building (Pipattanasomporn et al., 2020a). The second is an outdoor weather dataset for Bangkok Suvarnabhumi (airport), which records meteorological parameters at 30-minute intervals (Raspisaniye Pogodi Ltd., 2025). From this dataset, the variables air temperature (T), atmospheric pressure at weather station level (P0), atmospheric pressure reduced to mean sea level (P), relative humidity (U), and dewpoint temperature (Td) are used to capture the weather conditions that affect indoor conditions and, indirectly, total energy use.
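As a rough illustration of how a one-minute building series and a 30-minute weather series could be brought onto a common time base, the sketch below attaches the most recent weather record to each building record (a backward-looking "as-of" join). The field layout, the sample values, and the choice to carry the last observation forward for at most 30 minutes are assumptions made for illustration; the merging procedure actually used for the thesis dataset is not specified in this passage.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def align_weather(building_rows, weather_rows, max_age=timedelta(minutes=30)):
    """Attach the latest weather value at or before each building timestamp.

    building_rows: list of (timestamp, lighting_kw), sorted by timestamp
    weather_rows:  list of (timestamp, temperature_c), sorted by timestamp
    Rows with no weather record within max_age get None as the weather value.
    """
    weather_times = [t for t, _ in weather_rows]
    merged = []
    for ts, lighting_kw in building_rows:
        idx = bisect_right(weather_times, ts) - 1  # last record with time <= ts
        temp = None
        if idx >= 0 and ts - weather_times[idx] <= max_age:
            temp = weather_rows[idx][1]
        merged.append((ts, lighting_kw, temp))
    return merged

# Hypothetical sample: building readings every 15 minutes, weather every 30.
start = datetime(2019, 1, 1, 8, 0)
building = [(start + timedelta(minutes=m), 1.0 + 0.01 * m) for m in range(0, 61, 15)]
weather = [(start, 27.0), (start + timedelta(minutes=30), 28.5)]
merged = align_weather(building, weather)
```

Carrying the last observation forward avoids using weather values from the future of each building sample; interpolating between the two neighbouring weather records would be an alternative design choice.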
This combined-dataset approach allows a robust investigation of how indoor and outdoor conditions influence lighting requirements and energy consumption.

Introduction to Energy Efficiency in Lighting Systems

From both economic and ecological perspectives, energy efficiency in the lighting systems of tertiary buildings has emerged as a crucial topic. In tertiary infrastructure, lighting accounts for a considerable share of electricity usage, making it a key aspect of building energy use. Research in the United States has shown that lighting accounts for a substantial portion of commercial building energy demand: one analysis found it to be about 17% of all electricity used in U.S. commercial buildings (Norouziasl & Jafari, 2022). In Europe, lighting represents roughly 14% of total electricity consumption, with the tertiary sector absorbing the largest share of that load (de Almeida et al., 2014).

In Türkiye, the building sector (residential, commercial, and public) accounted for about one-third of the country’s total final energy consumption in 2015, with electricity and gas together representing two-thirds of that share (SHURA Energy Transition Center, 2019). Onaygil et al. (2005) set out to characterize electrical energy use patterns in Turkish commercial buildings, with a special focus on the proportion attributable to lighting systems. They conducted a statistical inventory of energy use by analyzing monthly electricity bills from ten commercial buildings in Istanbul (categorized as offices, hotels, and performance venues) over the 2002–2004 period. They determined that lighting represented roughly 7% of total electricity consumption in hotels and about 19% in office buildings. National data within the same study indicate that Türkiye’s total electricity consumption in 2004 was 116 561 GWh, of which commercial buildings accounted for 17%.
According to the 2000 “Building Census,” commercial buildings comprised only 7% of the building stock but consumed 14.3% of the sector’s electricity, whereas residences, making up 86% of buildings, used just 23% of total building electricity.

Lighting is estimated to consume roughly one-fifth of all electricity globally. The United Nations Environment Programme (UNEP) reported that nearly 20% of the world’s electricity and about 6% of global greenhouse gas emissions were attributable to lighting as of the early 2010s (United Nations Environment Programme, 2013). This large energy footprint means that lighting efficiency improvements can yield considerable economic savings and emissions reductions. Owners and operators of commercial buildings face high utility costs from lighting, and inefficient systems translate into direct financial burdens. A landmark study by Mills (2002) highlighted the scale of this burden by estimating a $230 billion annual global lighting energy bill in the early 2000s, underlining the immense economic stakes in this sector. Moreover, inefficient lighting contributes unnecessarily to carbon emissions at a time when businesses are increasingly committed to sustainability goals. Reducing energy waste in lighting not only lowers operating costs but also supports environmental objectives such as carbon footprint reduction and compliance with green building standards.

Equally important is the evolving role of data-driven approaches in managing and optimizing lighting energy use. Traditional lighting control methods, such as fixed schedules or manual switching, often lead to energy waste because lights may remain on during unoccupied periods or at higher levels than needed. In contrast, modern data-driven energy management leverages sensors, analytics, and machine learning to adapt lighting usage to actual needs in real time.
Intelligent lighting systems can significantly curtail waste while maintaining comfort by collecting data on occupancy, daylight availability, and user preferences. One study indicated that occupancy-driven lighting controls (such as motion sensors linked to lighting) can reduce lighting energy consumption by up to 30% compared to constant operation, without compromising occupant needs (Garg & Bansal, 2000). More advanced studies combining occupancy sensors with predictive algorithms and IoT data streams demonstrate even greater savings. Elkabalawy et al. (2024) report that using Wireless Fidelity (Wi-Fi)-based occupancy detection and smart algorithms can save 80–90% of lighting energy compared to conventional fixed schedules. These approaches ensure that lights are used only when and where needed, optimizing energy consumption in a way that static control strategies cannot.

Overview of Global Energy Consumption Trends

Lighting plays a substantial role in global energy consumption, and understanding its trends is crucial for contextualizing efficiency efforts. Historically, the share of electricity used for lighting worldwide has been very high: lighting was estimated to account for approximately 20–30% of global electricity consumption (Ganandran et al., 2014). A UNEP report in 2013 similarly noted that about one-fifth of all electricity generated globally was used for lighting, which is an enormous fraction for a single end-use (United Nations Environment Programme, 2013). In absolute terms, this translates to thousands of terawatt-hours of electricity per year. Such high consumption has also meant that lighting was responsible for roughly 6% of global carbon dioxide emissions (United Nations Environment Programme, 2013). These patterns have made lighting an attractive target for energy-saving initiatives worldwide.

In recent years, especially post-2020, there have been countervailing trends.
According to the most recent International Energy Agency (IEA) research, electricity consumption for lighting grew in 2022 following years of improvement, as increased utilization of lighting (especially in commercial and industrial applications as economies recovered from the pandemic) exceeded efficiency gains from newer technologies (IEA, 2023). The IEA reported a slight rise in carbon dioxide emissions from lighting in 2022, suggesting that in some regions and sectors, demand growth is overtaking efficiency progress (IEA, 2023). Major developing economies have played a key role in this increase. As wealth and urbanization grow, so does the demand for well-illuminated buildings, streets, and services. Without corresponding efficiency improvements, overall energy use will continue to climb. This trend emphasizes that although lighting technology is progressing toward reduced energy intensity, the vast number of new lighting installations worldwide can still lead to higher consumption. It underscores the importance of ongoing advancements in efficiency and the widespread adoption of best practices in lighting design and management.

Furthermore, there are differences between regions. Developed countries, particularly in North America and Europe, have typically experienced a stabilization or decline in lighting energy consumption due to the widespread adoption of efficient bulbs and strict efficiency regulations. In these areas, incandescent and even fluorescent lights have mostly been replaced with Light Emitting Diodes (LEDs), with advanced control systems becoming increasingly prevalent. Conversely, developing regions are undergoing a rapid expansion of lighting access, which includes bringing electricity to areas that previously had none and building new commercial spaces. In some instances, older, less efficient lighting technologies continue to be used or introduced because of their lower initial costs or the slower implementation of policies.
Initiatives such as the UNEP (2014) en.lighten program have aimed to speed up the global shift towards efficient lighting, highlighting that the worldwide phase-out of incandescent bulbs could result in substantial energy savings.

Overall, global trends in lighting consumption show historically high levels of energy use, which are now being moderated by advances in efficiency. The interplay of various factors, such as technology, economic development, and policy, influences whether a specific region experiences an increase or a decrease in energy consumption for lighting. The emphasis on data-driven optimization, which is the core focus of this thesis, can significantly contribute to achieving further improvements even as the number of lighting fixtures continues to rise worldwide.

Lighting’s Share in Building Energy Use

Lighting energy trends are important because of their impact on overall building energy usage. Commercial buildings use energy for a variety of purposes (heating, cooling, equipment, and so on), with lighting traditionally being one of the most significant electricity end-uses in these buildings. Prior to the broad adoption of efficient lighting, lighting was the single largest consumer of electrical energy in commercial buildings in the United States, accounting for 38% of total commercial electricity usage (EIA, 2017). This means nearly two-fifths of all electricity in that sector went just to keeping lights on. By 2012, this share had dropped dramatically to about 17%, as a result of aggressive efficiency improvements and lighting upgrades over the preceding decade (EIA, 2017). The sharp reduction, with lighting's share more than halved, demonstrates how significant the switch to more efficient lighting has been. Many federal agencies and electric utilities have prioritized boosting lighting efficiency in the United States.
At the national level, two landmark laws were enacted between the 2003 and 2012 rounds of the Commercial Buildings Energy Consumption Survey (CBECS): the Energy Policy Act (EPAct) of 2005 and the Energy Independence and Security Act (EISA) of 2007 (EIA, 2017).

In the European Union, non-residential buildings consumed over 160 TWh of electricity for lighting in 2011, roughly 40% of that sector’s total electricity demand, before the full effects of Ecodesign regulations took hold (Institute for Energy and Transport (Joint Research Centre), Bertoldi, & Cuniberti, 2011; Commission Regulation (EC) No. 244/2009, 2009). Beginning in September 2009, Commission Regulation 244/2009 phased out non-directional incandescent lamps across the EU by 2012, catalyzing rapid uptake of higher-efficacy lighting and substantial electricity savings (Commission Regulation (EC) No. 244/2009, 2009; Institute for Energy and Transport (Joint Research Centre), Bertoldi, & Cuniberti, 2011).

The implementation of stricter lighting efficiency regulations and the quick adoption of fluorescent and LED technology dramatically lowered the amount of energy required for lighting. Despite this progress, lighting remains a major energy end-use in buildings. The exact share varies by building type, region, and operational characteristics. According to the U.S. General Services Administration, in typical buildings today 10% to 25% of a building’s electricity is consumed by lighting systems (U.S. General Services Administration [GSA], 2024). Newer or more efficient buildings might be at the lower end of that range, whereas older buildings with outdated lighting or those with very extensive lighting needs (such as retail stores or warehouses operating 24/7) might be at the higher end.
Certain commercial building categories have particularly high lighting shares: for example, in big-box retail stores and supermarkets, lighting can be a major portion of total energy use (often 20–30%), and it also adds to cooling loads by emitting heat.

Shares also differ around the world. In places with high air-conditioning demand, lighting may account for a smaller percentage of total building energy than cooling, but it is still considerable. In milder regions or buildings that do not use a lot of HVAC energy, lighting can account for the majority of electricity use.

Evolution of Lighting Strategies

In commercial buildings, electric lighting traditionally accounts for a significant share of energy use, often about 20% of total consumption (Colaco et al., 2023). Early energy management strategies for lighting were relatively simple, relying on manual controls or fixed schedules. For example, manual on/off switching by occupants can theoretically yield large energy savings if lights are always turned off when spaces are vacant. In practice, however, manual control is limited by human behavior: occupants often forget or neglect to turn lights off when leaving a room (Wagiman et al., 2019). To mitigate this, fixed scheduling became a common strategy, in which lights are programmed to turn on and off at set times (e.g. during working hours). While scheduling prevents lights from staying on overnight or during known unoccupied periods, it cannot adjust to irregular occupancy patterns, so lights are either on in empty spaces or turn off while people are still present.

To improve upon fixed schedules, occupancy sensors have been widely adopted since the late 20th century. These sensors (commonly passive infrared or ultrasonic motion detectors) automatically switch lights on when occupancy is detected and off (or dimmed) after a room has been vacant for some delay time.
Occupancy-based control directly reduces wasted lighting runtime and has been shown to cut lighting energy use significantly, with savings of 20–30% or more reported in many applications (Garg & Bansal, 2000). For instance, one field study found that adding occupancy sensors in office lighting yielded roughly 20% energy savings compared to no sensor (Roisin et al., 2008). Another analysis concluded that such sensors can save up to about one-third of lighting electricity in typical settings (Garg & Bansal, 2000). The actual savings depend on space usage patterns: spaces with infrequent occupancy, or where occupants routinely forget to turn off lights, see the greatest benefit. Basic sensors use binary detection (occupied vs. unoccupied) and can neither distinguish between one and several occupants nor adapt lighting levels beyond on/off.

Another traditional strategy is daylight harvesting, which uses light sensors (photocells) to dim electric lighting when sufficient daylight is available. Daylight-responsive controls reduce artificial lighting in perimeter zones or areas with skylights by measuring illuminance and maintaining a target light level. This approach has a long history in building design and can yield substantial savings. Studies have reported lighting energy reductions of approximately 20% or more in offices with automated daylight dimming (Roisin et al., 2008). In practice, mixed strategies that combine occupancy sensing with daylight dimming achieve the highest savings. A review by Wagiman et al. (2019) noted that integrated occupancy and daylight controls can save up to 95% of lighting energy in ideal conditions compared to no controls, far outperforming either strategy alone.

In recent years, the development and rapid adoption of Internet of Things (IoT) devices have transformed data acquisition for building energy management.
Traditional standalone occupancy or light sensors are evolving into integrated sensor networks that provide high-resolution, real-time data on environmental conditions. Modern commercial buildings often deploy dozens or even hundreds of networked sensors for lighting control; for example, the Edge building in Amsterdam famously contains over 28,000 IoT sensors monitoring occupancy, lighting levels, temperature, humidity, and more (Jalia, Bakker, & Ramage, n.d.; Chaudhari et al., 2024). These devices communicate via wired or wireless networks to a central Building Energy Management System (BEMS). Compared to earlier single-purpose sensors, IoT-enabled sensors offer greater coverage and granularity. High-resolution occupancy sensors now go beyond simple motion detection: some use imaging, infrared arrays, or even Wi-Fi signal analysis to estimate not just presence but occupant counts and locations within a space (Chaudhari et al., 2024). Likewise, ambient light sensors are more sophisticated, with calibrated photodiodes or High Dynamic Range (HDR) imaging that can distinguish daylight contributions and even the direction of light. Combined with environmental sensors (temperature, carbon dioxide, etc.), these networks enable a holistic view of building occupancy patterns and comfort conditions. The integration of multiple sensor types allows lighting systems to make finer adjustments, for instance dimming lights based on occupant density or predicted activity and, when needed, exchanging data with other building subsystems, while keeping lighting power as the prime optimisation target. IoT sensor platforms also facilitate continuous monitoring and remote management. Data from lighting sensors can be streamed to cloud analytics for optimization algorithms to analyze usage trends. This marks a shift from static control based on a single sensor’s input to dynamic control based on a fusion of data from many sources.
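The combined occupancy and daylight logic described in this section can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the `ZoneReading` type, the `dimming_level` function, the 500 lx target, and the 15-minute vacancy delay are all assumptions chosen for illustration, and the luminaires are assumed to deliver the full target illuminance at 100% output.

```python
from dataclasses import dataclass


@dataclass
class ZoneReading:
    """One snapshot of a lighting zone (all fields are illustrative)."""
    seconds_since_motion: float  # time since the occupancy sensor last fired
    daylight_lux: float          # daylight contribution measured at the work plane


def dimming_level(reading: ZoneReading,
                  target_lux: float = 500.0,       # assumed illuminance setpoint
                  full_output_lux: float = 500.0,  # assumed electric light at 100 %
                  vacancy_delay_s: float = 900.0   # assumed 15-minute switch-off delay
                  ) -> float:
    """Return the electric-light output as a fraction between 0 and 1."""
    # Occupancy rule: switch off once the zone has been vacant longer than the delay.
    if reading.seconds_since_motion > vacancy_delay_s:
        return 0.0
    # Daylight rule: electric light only supplies the shortfall to the setpoint.
    shortfall_lux = max(0.0, target_lux - reading.daylight_lux)
    return min(1.0, shortfall_lux / full_output_lux)
```

Under these assumptions, `dimming_level(ZoneReading(60, 200))` evaluates to 0.6, because daylight already supplies 200 lx of the 500 lx setpoint, while a zone that has been vacant for 20 minutes returns 0.0 regardless of daylight. A data-driven controller would replace the fixed setpoint and delay with values learned from the fused sensor data.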
With the increase in sensor quantity and types, ensuring data quality has become paramount. Sensor accuracy and calibration directly impact the effectiveness of data-driven lighting control. For example, a light sensor that drifts or is mis-calibrated may cause improper light levels (either wasting energy or compromising illumination). Modern systems address this through initial and periodic calibration procedures that adjust sensor readings to match actual measured values (e.g., using a lux meter to calibrate a photocell) and offset any biases. Occupancy sensors, especially advanced ones like camera-based counters, also require calibration and validation against ground-truth occupant counts (Chaudhari et al., 2024). Consistency across sensors is a further challenge: multiple sensors of the same type must be calibrated and synchronized to ensure reliable occupancy detection and illuminance measurement (Chaudhari et al., 2024). Without calibration, errors propagate into control decisions: inadequate sensor calibration or faulty data processing can lead to false positives or negatives in occupancy status, causing lights to turn on or off incorrectly (Chaudhari et al., 2024). To combat this, data cleaning and validation algorithms are deployed at the BEMS level to filter out anomalies (e.g. a sudden spike in a brightness reading) and to flag sensor malfunctions. Data fusion techniques are also used to improve reliability: by cross-checking multiple sensors (for instance, combining a Passive Infrared (PIR) motion sensor with a camera or a carbon dioxide sensor), the system can infer occupancy more accurately than with any single sensor alone (Chaudhari et al., 2024).

A notable advancement in data acquisition is the integration of indoor sensor data with outdoor environmental and meteorological data.
Traditional lighting controls were reactive to indoor conditions only, but modern systems increasingly incorporate external data to enable predictive and adaptive control. For example, many smart buildings now use roof-mounted ambient light sensors or even meteorological feeds (from local weather stations or online services) to gauge outdoor daylight availability. By knowing the exterior daylight level or solar irradiance, a lighting control system can anticipate how much daylight will penetrate a space and pre-emptively dim or brighten lights. This feed-forward control can smooth transitions (avoiding sudden changes when a cloud passes) and maximize daylight usage. Similarly, weather forecasts and related outdoor data (e.g. an approaching storm or sunset time) can be used to schedule lighting transitions more intelligently. In advanced deployments, indoor lighting IoT platforms are linked with outdoor climate sensors measuring parameters such as sky luminance, sun position, or even exterior ground reflectance, all of which affect interior daylight levels. Combining these datasets allows the creation of more robust models. For instance, a machine learning model can take both indoor illuminance and outdoor weather into account to predict the needed dimming level for the next hour. Including outdoor temperature or irradiance forecasts in the feature set can improve next-hour lighting-load predictions by capturing daylight availability and occupant comfort effects that influence electric-light demand. Technologically, integrating indoor and outdoor data requires interfacing different systems; often the lighting control system communicates with a building’s weather station or pulls data from an Application Programming Interface (API). Modern BEMS using protocols like BACnet or Message Queuing Telemetry Transport (MQTT) can share data between lighting controllers and other subsystems including weather monitors (Kastner et al., 2005).
The result is a richer data environment for optimization. For example, a case study by Pandharipande and Caicedo (2011) demonstrated a daylight-adaptive LED lighting system that used enhanced presence sensing together with daylight measurements, effectively linking indoor occupancy data with outdoor light availability to maintain target illumination with reduced energy use. As IoT ecosystems mature, lighting sensors increasingly act in concert with broader smart building data streams, an essential foundation for data-driven prediction.

Despite these advancements, challenges remain in sensor deployment. One issue is sensor calibration drift over time due to dust, aging, or environmental factors; regular recalibration and self-calibrating algorithms are an active research area (Chaudhari et al., 2024). Another challenge is interoperability within a heterogeneous sensor network: IoT lighting systems might incorporate devices from different vendors (occupancy sensors, photosensors, smart switches, etc.), each with proprietary data formats. Ensuring that these devices intercommunicate and that their data can be seamlessly integrated is non-trivial (Kastner et al., 2005). Furthermore, high-density sensor networks generate vast amounts of data, raising issues of data storage, bandwidth, and privacy. The push toward high-resolution occupancy sensing (e.g. video-based people counting) must be balanced with occupant privacy concerns and data security (Chaudhari et al., 2024). Nonetheless, the trend in commercial buildings is clear: more sensors, more data, and tighter integration of diverse data sources are enabling a new generation of intelligent lighting controls that respond to real-world conditions with unprecedented precision.

Comparative Evaluation of Machine Learning Algorithms

Accurate prediction of building energy consumption is seen as a key strategy for optimizing energy management and supporting sustainability initiatives.
Data-driven methods have gained prominence in this domain, as machine learning (ML) techniques are well-suited to uncover complex relationships in energy data and often outperform simplistic engineering estimates. A wide range of ML and even deep learning (DL) models have been explored for energy consumption prediction, from classical regression and decision tree models to state-of-the-art neural networks (Tsanas & Xifara, 2012; Morcillo-Jimenez et al., 2024). A critical first consideration in these studies is the nature of the data used for model development. The input features for energy prediction can vary greatly, encompassing everything from static building characteristics to dynamic sensor readings. Some researchers have focused on using building design attributes and physical parameters to predict energy needs, which is especially useful at early design stages (Tsanas & Xifara, 2012; Olu-Ajayi et al., 2022). In contrast to such static-input studies, many works rely on time-series data collected from sensors and smart meters in actual buildings to capture operational patterns. Morcillo-Jimenez et al. (2024) exemplify this approach by gathering multi-source sensor data (e.g. electricity use, occupancy, environmental readings) from an office building under normal use, and using it to train and evaluate time-series forecasting models. Shapi, Ramli, and Awalin (2021) also focused on real-world data, developing a cloud-based prediction system for a Malaysian smart building using historical electricity consumption records from multiple tenants (Shapi et al., 2021). By pre-processing these datasets to handle noise and variability (e.g. filling missing values, filtering outliers), such studies ensure the model inputs reflect the true consumption drivers in buildings. 
The diversity of data sources across the literature – from simulated building performance data to real sensor 12 streams – highlights the importance of aligning modeling techniques with the available information and the targeted prediction scope (design vs. operation, short-term vs. long-term, etc.). Given these varied data inputs, researchers have explored an array of modeling approaches, often comparing conventional statistical models with more advanced ML algorithms. Shapi et al. (2021), for example, built predictive models with three different techniques – support vector machines (SVM), artificial neural networks, and K-nearest neighbors – to forecast the energy use of two commercial building tenants. Their results indicated that the SVM achieved the highest accuracy (i.e. lowest error) among the three, providing the most reliable predictions for the smart building’s energy management system (Shapi et al., 2021). Such comparative evaluations underscore that model choice can markedly affect prediction performance. Techniques like decision trees and SVM often perform well due to their ability to capture nonlinear effects, but their relative success may depend on the specifics of the dataset (feature relationships, amount of data, etc.). Overall, the consensus from these early studies is that advanced ML models (ensembles, evolutionary algorithms, etc.) tend to outperform linear models, setting the stage for the adoption of more complex learners including deep neural networks. Following the success of traditional ML approaches, recent research has increasingly turned to deep learning methods to further improve building energy predictions. Deep learning models can automatically learn intricate patterns through layered representations, which is especially beneficial for capturing temporal dependencies in time-series energy data or high-dimensional feature interactions. Morcillo-Jimenez et al. 
(2024) provide a clear demonstration of this by applying various neural network architectures to hourly office consumption data. In their experiments, “feed-forward” networks (a basic multilayer perceptron, MLP) were outperformed by “memory-based architectures” like recurrent neural networks, specifically an LSTM model. The LSTM (Long Short-Term Memory) network, which can retain information about prior time steps, yielded better accuracy in predicting future energy usage than the stateless MLP. They also experimented with convolutional neural networks (CNNs) for feature extraction, but still found the LSTM superior for sequence modeling. However, the authors noted an important caveat: the “lack of ample usable data” in their case study limited the gains from even more complex architectures like sequence-to-sequence models, which might otherwise further improve performance. This reflects a general theme that deep learning can excel, but often requires large datasets for training. In a study targeting the design phase, Olu-Ajayi et al. (2022) also observed the power of deep models. They tested numerous algorithms (including linear regression, decision trees, random forests, SVM, gradient boosting, and a basic neural network) against a deep neural network (DNN) with multiple hidden layers for predicting annual energy consumption of residential designs. According to their findings, “DNN [was] the most efficient predictive model for energy use at the early design phase,” outperforming all other methods in terms of predictive accuracy. This result is significant because it demonstrates that even for non-time-series data (annual totals based on design features), the greater representational capacity of deep learning can capture subtle nonlinear relationships better than shallow models.
Collectively, these studies illustrate that while classical ML models provide a strong baseline, carefully configured deep learning models often achieve superior accuracy for building energy prediction, especially when temporal dynamics or complex feature interactions are involved.

Despite the promising results achieved by various models, researchers have identified several persistent challenges and limitations in energy consumption prediction for buildings. One major challenge concerns data availability and quality. High-performing ML models typically require abundant and representative data, yet in practice many buildings have limited historical data or undergo changes that render past data less useful. For instance, Morcillo-Jimenez et al. (2024) explicitly note that the “lack of ample usable data” constrained their ability to exploit the full potential of advanced sequence-based neural networks. Bourhnane et al. (2020) encountered a similar issue in a smart building testbed: they deployed an ANN model (with a genetic algorithm for scheduling optimization) on real IoT data but reported only “modest prediction accuracy” due to the small size of the dataset. In Tsanas and Xifara’s (2012) study, the authors caution that their ML model is effective “as long as the requested query data bears resemblance” to the training examples, essentially warning that predictions may become unreliable if a building’s characteristics fall outside the range of the learned data. This sensitivity underscores the importance of gathering diverse data that cover various building types, operational conditions, and occupant behaviors to improve model generalizability. A related challenge is model selection and complexity. As Tien et al.
(2022) observe in their comprehensive review, the explosion of available algorithms (hundreds of ML/DL variants) and the wide range of evaluation metrics used in different studies make it difficult to identify a single “best” model or to directly compare results across the literature. They found that studies range widely in scope (from individual HVAC equipment to entire urban districts) and temporal granularity (minutes to years), and each scenario may favor a different modeling approach (Tien et al., 2022). This diversity complicates the development of standardized guidelines for model development and assessment.

Allal et al. (2024) propose a data-driven framework for forecasting lighting energy consumption by exploiting high-frequency measurements from a tertiary building’s energy management system. Like this thesis, they use the Chulalongkorn University Building Energy Management System (CU-BEMS) dataset, which provides one-minute interval records of lighting loads alongside other power consumption and environmental variables across seven floors. After synchronizing and normalizing over 190 raw features, a variational autoencoder (VAE) compresses this multidimensional data into a 15-dimensional latent space that captures the most salient patterns influencing lighting demand. These latent embeddings are then merged with zone-specific lighting measurements and structured into sliding temporal windows, yielding inputs for an ensemble of tree-based regressors (Extra Trees, Random Forest, LightGBM, XGBoost, and CatBoost) that perform multi-output forecasting of lighting loads (Allal et al., 2024). In one-hour-ahead predictions, the Extra Trees regressor achieved an 𝑅² of 0.946 and a mean absolute error (MAE) of 0.306 for lighting energy consumption, substantially outperforming simple historical-average and “previous time-step” benchmarks.
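To make the two benchmarks and the two error metrics quoted above concrete, the following Python sketch computes MAE and R² for a “previous time-step” (persistence) forecast and a historical-average forecast. The load values are made up for illustration and are not CU-BEMS data. The function names are hypothetical, and the historical average is assumed here to be the running mean of all earlier samples, which may differ from the benchmark definition used by Allal et al. (2024).

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def persistence_forecast(series):
    """'Previous time-step' baseline: predict each value with the one before it."""
    return series[:-1]


def historical_average_forecast(series):
    """Predict each value with the mean of all values observed before it."""
    return [mean(series[:t]) for t in range(1, len(series))]


# Made-up lighting loads, for illustration only.
load = [2.0, 4.0, 6.0, 4.0, 2.0]
actual = load[1:]  # targets start at the second sample

for name, forecast in [("persistence", persistence_forecast(load)),
                       ("historical average", historical_average_forecast(load))]:
    print(name, mae(actual, forecast), r_squared(actual, forecast))
```

On this toy series both baselines yield a negative R², meaning that they do worse than simply predicting the mean of the targets. A learned model is informative only to the extent that it beats such baselines on the same targets.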
For shorter horizons, LightGBM delivered even higher precision, recording an MAE of 0.218 with 𝑅² = 0.988 for 10-minute forecasts and an MAE of 0.317 with 𝑅² = 0.974 for 20-minute forecasts (Allal et al., 2024). These results demonstrate that combining VAE-derived latent features with gradient-boosting techniques effectively captures both rapid fluctuations and underlying trends in lighting use within tertiary environments.

Although both Allal et al. (2024) and this thesis draw upon the CU-BEMS dataset, their scope differs. Allal et al. restrict their analysis to a single subspace, combining VAE-derived latent features with data from Zone 2 on Floor 6 before feeding them into tree-based regressors, whereas this thesis integrates minute-level CU-BEMS measurements from all zones across every floor. These measurements cover lighting, AC, plug loads, and indoor environmental conditions. The thesis also incorporates half-hourly outdoor weather variables from Suvarnabhumi Airport, including air temperature, station and sea-level pressure, relative humidity, and dew point (Pipattanasomporn et al., 2020a; Raspisaniye Pogodi Ltd., 2025). Machine learning techniques, including ensemble methods, decision trees, and regression models, are then used to predict how indoor–outdoor interactions drive overall energy consumption. By leveraging this unified dataset, the present work aims to bridge the gap between theoretical machine learning models and practical energy management solutions through comprehensive, data-driven insights.

Energy management in electrical engineering involves forecasting, controlling, and optimizing the production, distribution, and consumption of electrical energy.
Modern power systems and smart grids generate vast amounts of data and face complex, nonlinear behaviors due to renewable integration, demand variability, and dynamic controls. Machine learning (ML) methods have emerged as powerful tools to tackle these challenges by learning patterns from historical data and improving the accuracy of load forecasts, generation predictions, and system optimizations (Doğan et al., 2025; Magalhães et al., 2024).

The rich sensor data now available in modern buildings has paved the way for applying ML techniques to lighting energy management. Supervised learning methods, in particular, are widely used for modeling and predicting building lighting needs. Wu et al. (2024) compared multiple algorithms for building energy baseline modeling and found that Extreme Gradient Boosting (XGBoost), an ensemble tree method, yielded the lowest Root Mean Squared Error (RMSE) across all major building types, outperforming single neural network and regression models, particularly in predicting lighting energy consumption. Their findings also highlighted that time-related factors, such as the week of the year and the day of the week, played a dominant role in influencing energy consumption patterns, emphasizing the importance of temporal features in baseline modeling. Shapi et al. (2021) implemented and compared Support Vector Machines, feed-forward ANNs and k-Nearest Neighbors on Azure ML Studio, finding that ANNs achieved the lowest RMSE and MAPE for tenant-level electricity predictions (which include lighting) after rigorous preprocessing and feature engineering. Elhabyb et al. (2024) investigated educational building datasets and concluded that both LSTM networks and gradient boosting regressors must be tailored to a building’s occupancy and environmental characteristics to minimize RMSE in lighting and plug-load forecasts. Ensemble tree methods remain strong contenders. Pham et al.
(2020) applied Random Forests to hourly data from multiple buildings and achieved MAE reductions of nearly 50% compared to single decision trees for one-step-ahead lighting predictions, maintaining robust performance even at 12- and 24-step forecasting horizons. In general, ensemble methods reduce overfitting and improve generalization by averaging the predictions of many trees, which is advantageous given the noisy data in buildings (occupancy patterns, sensor noise, etc.).

Beyond pure prediction, Reinforcement Learning (RL) offers a promising way to convert lighting-load forecasts into real-time control actions that minimise energy while safeguarding visual comfort. Although the earliest building-scale RL studies targeted HVAC, attention has shifted toward adaptive lighting over the last decade. A notable advantage of RL is that it does not require an explicit predictive model of occupancy or daylight; instead, it can learn behavior patterns implicitly. However, training an RL model in a real building is challenging (due to exploration risks and long learning times), so researchers have used simulations or imitation learning to bootstrap these controllers (Vázquez-Canteli & Nagy, 2019). Deep reinforcement learning (using deep neural networks as function approximators) has also been explored to handle high-dimensional sensor inputs, though it requires large amounts of training data.

Data-driven optimization often means combining diverse data sources to increase prediction accuracy. This includes merging indoor sensor data with external data such as weather and daylight availability. It can also involve the integration of historical and contextual data. Advanced algorithms perform data fusion, for example combining Passive Infrared (PIR) motion sensor data with camera-based occupancy counts and even badge access logs to obtain a more robust estimate of space usage (Chaudhari et al., 2024).
This fused occupancy information can then drive lighting control more reliably than any single sensor input.

A critical component of human-centric lighting energy prediction is understanding and incorporating occupant behavior. Occupant behavior (how people use space, how they interact with lighting through manual switches or task lamps, and what their comfort preferences are) has a profound impact on lighting energy outcomes (Rusek et al., 2022; Uddin et al., 2021). Traditional systems treated occupants as passive triggers (motion sensed = lights on), but data-driven approaches strive to model behavior patterns and even anticipate or influence occupant actions. For example, occupancy sensors by themselves do not capture cases where occupants might prefer lower light or might switch off lights when daylight is sufficient. Data-driven models can learn that certain occupants tend to manually turn lights off at certain times (or conversely, always leave them on), and adjust the automation accordingly (e.g., provide gentle dimming that encourages manual switch-off, or override a habitual non-optimal behavior).

In summary, data-driven optimization in lighting increasingly merges technical efficiency goals with human-centric factors. This approach recognizes that the ultimate purpose of building lighting is to serve people, so the “optimal” solution is one that balances human factors with energy minimization rather than pursuing energy savings at all costs. Studies consistently find that when occupant behavior is properly accounted for, the resulting controls achieve high satisfaction and robust energy reductions (Paone & Bacher, 2018).
Utilized Machine Learning Approaches

Figure 1.1 provides a consolidated taxonomy of contemporary machine-learning paradigms, categorising them into supervised learning (sub-divided into classification and regression tasks), unsupervised learning, represented chiefly by clustering, and deep learning, which encompasses deep neural-network architectures. Representative algorithm families are indicated for each branch; examples include linear regression and decision-tree methods under supervised learning, k-means under clustering, and multilayer as well as convolutional neural networks within deep learning. Although the diagram positions the reader within the broader methodological landscape, the analytical framework adopted in this thesis is restricted to supervised regression techniques tailored for lighting-energy prediction using CU-BEMS data. The evaluation set therefore comprises classical linear models (Ordinary Least Squares, Interactions, Robust, Stepwise, and Efficient Linear Least Squares) together with tree-based learners at varying complexity levels (Fine, Medium, and Coarse Trees) and their ensemble variants, namely Bagged and Boosted Trees, implemented via MATLAB’s Regression Learner. Unsupervised clustering methods and deep neural architectures are acknowledged for completeness but remain outside the scope of the present investigation.

Figure 1.1 : Taxonomy of machine-learning methods highlighting supervised learning, unsupervised learning, and deep learning and their representative algorithm families. Note: Reprinted from Tien et al. (2022, Fig. 1a).

1.7.1 Linear models

The linear regression model predicts a dependent variable (target) from one or more independent variables (predictors). Its general form is given in Equation 1.1:

$$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k f_k(X_{i1}, X_{i2}, \ldots, X_{ip}) + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{1.1}$$

Here, $y_i$ is the dependent variable (target) for the $i$-th observation, $\beta_k$ represents the regression coefficients, $\beta_0$ is the intercept (constant term), $X_{ij}$ represents the independent variables for the $i$-th observation and $j$-th predictor ($j = 1, \ldots, p$), and $\varepsilon_i$ is the error term (residual) associated with the $i$-th observation. The term $f_k$ represents a scalar function of the predictors, which can be a linear or polynomial function, and the sum runs over the $K$ terms of the model (Sattar et al., 2022).

The Ordinary Least Squares (OLS) approach models the relationship between one or more independent variables and a dependent variable that is continuous or at least measured on an interval scale. The primary objective is to minimize the sum of squared differences between the observed values and the model’s predicted values. These differences, known as errors, represent how far predictions deviate from actual outcomes. Linear regression, whether involving a single predictor or multiple predictors, is the most widely used method that applies the OLS technique (Zdaniuk, 2014).

The interactions variant is a linear regression model extended to include interaction terms (products of features) or nonlinear transformations of features. By accounting for interactions, the model can represent how the effect of one predictor on the target may depend on the value of another predictor. Including interaction terms can improve predictive power if such combined effects are present in the energy data (Morgan & Obringer, n.d.).

Stepwise regression is a statistical approach that automates the selection of predictor variables in a regression model. At each stage of the procedure, variables from a pool of potential explanatory factors are considered for inclusion or removal based on predefined statistical criteria, typically involving a sequence of t-tests or F-tests. This iterative testing process helps identify the most relevant set of variables for use in predictive modeling.
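The selection procedure described above can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the implementation used by MATLAB’s Regression Learner: it performs only the forward (inclusion) step, fits each candidate model by OLS through the normal equations, and admits a predictor when its partial F-statistic exceeds a fixed, hypothetical threshold `f_to_enter` that stands in for a significance test. The function names, the threshold, and the data are all invented for the example, and the predictors are assumed not to be collinear.

```python
def fit_rss(columns, y):
    """Fit OLS with an intercept on the given predictor columns and
    return the residual sum of squares (assumes non-collinear predictors)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X^T X | X^T y)
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
               for r in range(n))


def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Greedy forward selection driven by the partial F-statistic."""
    selected = []
    rss_current = fit_rss([], y)  # intercept-only model
    while True:
        best_name, best_f, best_rss = None, 0.0, None
        for name, column in candidates.items():
            if name in selected:
                continue
            dof = len(y) - (len(selected) + 2)  # residual degrees of freedom
            if dof <= 0:
                continue
            rss_new = fit_rss([candidates[s] for s in selected] + [column], y)
            if rss_new <= 1e-12:
                f_stat = float("inf")  # perfect fit
            else:
                f_stat = (rss_current - rss_new) / (rss_new / dof)
            if f_stat > best_f:
                best_name, best_f, best_rss = name, f_stat, rss_new
        if best_name is None or best_f < f_to_enter:
            return selected
        selected.append(best_name)
        rss_current = best_rss


# Invented data: y depends on x_relevant only, plus a small perturbation.
x_relevant = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x_irrelevant = [1.0, 1.0, 1.0, 0.0, 5.0, 2.0]
noise = [0.1, -0.2, 0.1, 0.0, 0.0, 0.0]
y = [2.0 + 3.0 * x + e for x, e in zip(x_relevant, noise)]

candidates = {"x_relevant": x_relevant, "x_irrelevant": x_irrelevant}
selected = forward_stepwise(candidates, y)
print(selected)
```

A full stepwise procedure would also test whether previously admitted predictors should be removed after each inclusion, and would normally compare the F-statistic against a p-value rather than a fixed cutoff.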
Stepwise regression is commonly utilized in disciplines such as education and psychology, where it assists in both selecting meaningful variable subsets and determining the relative importance of those variables (Miller, Panneerselvam, & Liu, 2022; Thompson, 1995).

Ordinary least squares (OLS) regression provides optimal estimates when all underlying assumptions are met. However, if these assumptions are violated, OLS performance can deteriorate. While residual diagnostics can identify assumption breaches, they may be time-consuming and challenging for those without specialized training. Robust regression methods offer an alternative by relaxing some of these assumptions. These techniques aim to lessen the impact of outliers, thereby achieving a more accurate fit for the majority of the data (Pennsylvania State University, n.d.).

Efficiently trained linear regression models are designed to expedite the training process by employing techniques that reduce computational time, albeit with a potential trade-off in accuracy. These models, including linear least-squares and linear support vector machines (SVMs), are particularly advantageous when working with datasets that have a large number of predictors or observations. In such scenarios, opting for efficiently trained models over traditional linear or linear SVM presets can lead to significant time savings during training (MathWorks, Inc., n.d.).

1.7.2 Decision Tree

Decision trees organize decisions in a hierarchical flowchart: each internal node tests a specific feature of the input, and each outgoing branch corresponds to one