Aiding agricultural practices with the exploration of earth observation data via machine learning
Date
2023-08-04
Authors
Çelik, Mehmet Furkan
Publisher
Graduate School
Abstract
The rapid growth of the global population, coupled with the decline in available agricultural land, the effects of climate change, and soil degradation, poses a significant threat to food security. As the population continues to rise, the demand for food and agricultural products increases, putting pressure on the optimal use of limited resources. The human-made global climate crisis, primarily driven by fossil fuel emissions, worsens the issue, causing extreme weather events and displacing communities. Minimizing environmental damage and maximizing agricultural efficiency are crucial for ensuring a sustainable supply of essential food resources and the well-being of humanity. Obtaining accurate information about agriculture is vital for decision-makers, but traditional in-situ measurements are time-consuming and insufficient to represent entire fields. Remote sensing satellite images provide a solution by offering comprehensive and reliable data, overcoming the limitations of traditional methods, and enabling effective monitoring of agricultural fields at regional and larger scales.
Remote sensing satellite imaging technologies, including synthetic aperture radar (SAR) and multi-spectral imaging (MSI) satellites, provide valuable information for Earth Observation (EO) studies. SAR satellites can operate in any weather condition, day or night, and penetrate cloud cover, making them highly effective for monitoring Earth's surface. Despite their reliance on clear skies and solar illumination, MSI satellites play a crucial role in agricultural monitoring because of their wide range of spectral bands. Both satellite systems have a significant role in observing agricultural fields: SAR satellites are sensitive to morphological changes in crops, while MSI satellites can monitor chemical changes in vegetation.
Satellite images offer insights into crop health, growth stages, and potential yield through parameters derived from MSI and SAR images. Machine learning (ML) algorithms can capture the nonlinear relationships between electromagnetic radiation and vegetation, so applying them to remote sensing data has opened up a wide range of possibilities for agricultural research. Agricultural planning authorities and researchers can use these advanced computing approaches to obtain critical insights into many aspects of agriculture and make informed decisions. To address the critical challenges in monitoring agricultural fields and understanding the interrelation between environmental factors and agricultural activities, a three-stage research program that applies state-of-the-art ML and deep learning (DL) methods to remote sensing images was conducted within the scope of this thesis. These challenges span various aspects of agricultural analysis and can be tackled effectively with ML and DL algorithms that explain the models' behavior in an easy-to-understand format.
In the first study, regression analysis was used to estimate biophysical parameters using only SAR remote sensing satellite data. Among regression methods, polynomial chaos expansion (PCE) is a reliable and interesting choice because of its close relationship with uncertainty quantification. One advantage of PCE is that global sensitivity analysis (GSA) with Sobol's method can be computed analytically from the polynomial coefficients if the input variables are statistically independent. However, most phenomena involve features that are dependent, either statistically or physically. Therefore, an independent and uncorrelated input space must be created before the regression analysis.
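The analytical link between PCE coefficients and Sobol indices can be illustrated with a minimal sketch. It assumes an orthonormal polynomial basis and independent inputs; the function name `sobol_from_pce` and the example coefficients are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# A multi-index holds the polynomial degree used for each input variable.
MultiIndex = Tuple[int, ...]


def sobol_from_pce(coeffs: Dict[MultiIndex, float]) -> Tuple[List[float], List[float]]:
    """Return (first-order, total) Sobol indices from orthonormal PCE coefficients."""
    n_inputs = len(next(iter(coeffs)))
    # With an orthonormal basis, the output variance is the sum of the
    # squared coefficients of every non-constant term.
    variance = sum(c * c for alpha, c in coeffs.items() if any(alpha))
    if variance == 0.0:
        raise ValueError("the expansion has zero variance")
    first = [0.0] * n_inputs
    total = [0.0] * n_inputs
    for alpha, c in coeffs.items():
        active = [i for i, degree in enumerate(alpha) if degree > 0]
        share = c * c / variance
        # Every term involving input i counts toward its total index.
        for i in active:
            total[i] += share
        # Terms involving only input i count toward its first-order index.
        if len(active) == 1:
            first[active[0]] += share
    return first, total


# Hypothetical coefficients for an expansion in two inputs (illustration only).
example_coeffs = {
    (0, 0): 3.0,  # constant term: sets the mean, adds no variance
    (1, 0): 1.0,
    (2, 0): 1.0,
    (0, 1): 1.0,
    (1, 1): 1.0,  # interaction term between the two inputs
}
first_order, total_order = sobol_from_pce(example_coeffs)
```

The gap between an input's total and first-order index is the share of variance it contributes through interactions, which is the quantity of interest when interactions between features are examined.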
In this study, we performed PCE-based regression analysis to estimate the biophysical parameters of crops. The study was conducted in the experimental field pea, barley, canola, and oat fields of the AgriSAR2009 campaign. The input parameters of the regression model were polarimetric features derived from RADARSAT-2 imagery. The estimated biophysical parameters were based on discrete in-situ measurements of leaf area index (LAI) and normalized difference vegetation index (NDVI), scattered semi-randomly in each crop field. We implemented neighborhood component analysis (NCA) to create an independent and uncorrelated input space by eliminating correlations. Once the model was created, we investigated the importance of the features that drive the PCE-based regression models by applying GSA with Sobol's method. Besides the individual effects of each feature, their interactions were found to be significant.
In the second study, time series analysis was conducted to predict short-term soil moisture (SM) at field scale by integrating satellite imaging, climate, and auxiliary data. Recent advancements in different types of satellite imagery, coupled with deep learning-based frameworks, have paved the way for large-scale SM estimation. This research combined high spatial resolution Sentinel-1 (S1) backscatter data and high temporal resolution Soil Moisture Active Passive (SMAP) SM data to create short-term SM predictions that can support agricultural activities. We created a deep learning model to forecast daily SM values using time series of climate and radar satellite data, soil type, and topographic data. The model was trained with static and dynamic features that influence SM retrieval. While the topography and soil texture data were treated as static, the SMAP SM data, the S1 backscatter coefficients and their ratios, and the climate data were fed to the model as dynamic features.
As target data to train the model, we used in-situ measurements acquired from the International Soil Moisture Network (ISMN). We employed a deep learning framework based on the Long Short-Term Memory (LSTM) architecture, with two hidden layers of 32 units each and a fully connected layer. The model's performance was also evaluated with respect to above-ground biomass, land cover classes, soil texture variations, and climate classes. The model's predictive ability was lower in areas with high NDVI values. Moreover, the model predicts better in dry climate areas, such as arid and semi-arid climates, where precipitation is relatively low. Daily prediction of SM values from microwave remote sensing data and geophysical features was successfully achieved using an LSTM framework, which can assist studies in fields such as hydrology and agriculture.
In the third study, the importance of the input features during the cotton phenological cycle was investigated in order to predict yield using explainable artificial intelligence. The potential cotton yield can be predicted by integrating climatic factors, soil parameters, and biophysical parameters observed by high temporal and spatial resolution remote sensing satellites. This study used a multisource dataset to create an explainable and accurate predictive model of cotton yield over the continental US (CONUS). A recently proposed glass-box method called the Explainable Boosting Machine (EBM), which provides transparency, reliability, and ease of interpretation, was implemented. Its accuracy was compared with that of well-known ML methods for predicting cotton yields. The EBM showed higher accuracy than other glass-box methods and competitive results with black-box models. With the help of the EBM, the importance of individual features and their pairwise interactions was revealed without applying any post-hoc methods.
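For orientation, the additive form usually given for an EBM (a generalized additive model with pairwise interaction terms) is sketched below. The notation is an assumption made here for illustration and is not taken from the thesis: $g$ is a link function (the identity for regression), $f_j$ is the learned shape function of feature $x_j$, and $f_{ij}$ is a learned pairwise interaction term.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
```

Because the prediction is a sum of terms that each depend on one or two features, the contribution of every feature and feature pair can be read directly from the fitted terms.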
The findings showed that precipitation (P), enhanced vegetation index (EVI), and leaf area index (LAI) are the three most important dynamic features. The dynamic features drive the model, accounting for 78% of the overall feature importance, followed by the pairwise feature interactions at 16%. Static features contribute the remaining 6%. The study highlights the importance of using multisource data, accounting for interactions among the input features, and providing an interpretable model to understand the inner dynamics of cotton yield predictions.
Description
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2023
Keywords
satellite data,
agricultural practices,
machine learning