LEE- Hesaplamalı Bilim ve Mühendislik-Yüksek Lisans
-
Augmented superpixel based anomaly detection in hyperspectral imagery (Graduate School, 2024-07-01)
Gökdemir, Ezgi ; Tuna, Süha ; 702211005 ; Computational Science and Engineering

The detection of anomalies in hyperspectral images depends on several factors: the spatial proximity of anomalies and clutter in the background image can create a bottleneck for anomaly detection. Hyperspectral images are tensor data in which each pixel contains both spatial and spectral information. These complex data structures pose significant challenges for traditional anomaly detection methods, which often struggle to account for the intricate relationships between the spectral bands. In this thesis, a method called "Augmented Superpixel (Hyperpixel) Based Anomaly Detection in Hyperspectral Imagery" is proposed. The method aims to enhance anomaly detection by leveraging advanced dimensionality reduction and segmentation techniques. Our approach begins by reducing the three-dimensional hyperspectral image (HSI) data using methods such as High Dimensional Model Representation and Principal Component Analysis. This step simplifies the data while preserving critical spectral and spatial information. By capturing the most significant components of the data, these techniques help eliminate noise and irrelevant details, making the subsequent analysis more focused and effective. We then apply segmentation methods such as Simple Linear Iterative Clustering and Linear Spectral Clustering to divide the image into distinct regions known as superpixels. Each superpixel is augmented with its first-order neighbors to form hyperpixels, which provide richer context for anomaly detection. The augmentation process ensures that the local context is considered, enhancing the ability to detect subtle anomalies that may be missed when examining individual superpixels in isolation.
This neighborhood information is crucial for accurately identifying the boundaries of anomalies and distinguishing them from normal variations in the data. Finally, we apply the Local Outlier Factor algorithm to these hyperpixels to identify the outlier points that signify anomalies. The capability of the Local Outlier Factor to evaluate local density deviations enables it to accurately identify anomalies, even against densely populated or intricate backgrounds. The combination of these techniques ensures a comprehensive and precise analysis that can handle the diverse characteristics of hyperspectral datasets. The proposed algorithm was tested on various hyperspectral image datasets and demonstrated good performance in detecting anomalies. By integrating dimensionality reduction, segmentation, and anomaly detection techniques, the method effectively manages the complexity of hyperspectral data. This comprehensive approach allows accurate identification of anomalies even in challenging conditions where anomalies are closely packed or the background is complex. Through rigorous experimentation, the algorithm demonstrated robustness and reliability, making it a promising tool for hyperspectral image analysis. Its versatility and high accuracy across different datasets underline its potential for broad application in fields such as remote sensing, environmental monitoring, and urban planning. The ability to adapt to various anomaly characteristics and dataset structures makes this method a valuable addition to the hyperspectral image-analysis toolkit.
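The pipeline described in this abstract (dimensionality reduction, superpixel segmentation, first-order neighborhood augmentation, then Local Outlier Factor) can be sketched in a few lines. This is only a minimal illustration on a synthetic cube: square blocks stand in for SLIC/LSC superpixels and PCA stands in for HDMR, so none of it reproduces the thesis implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
H, W, B = 24, 24, 30                        # height, width, spectral bands
cube = rng.normal(0.0, 0.1, (H, W, B))      # synthetic background
cube[8:12, 8:12] += 3.0                     # implanted spectral anomaly

# 1) Spectral dimensionality reduction (PCA stands in for HDMR here).
reduced = PCA(n_components=5).fit_transform(cube.reshape(-1, B)).reshape(H, W, 5)

# 2) Square blocks stand in for SLIC/LSC superpixels.
S = 4                                        # block ("superpixel") size
gh, gw = H // S, W // S
feats = np.array([[reduced[i*S:(i+1)*S, j*S:(j+1)*S].mean(axis=(0, 1))
                   for j in range(gw)] for i in range(gh)])

# 3) Augment each block with its first-order neighbours -> "hyperpixel":
#    the feature is the block's own mean spectrum plus the neighbourhood mean.
hyper = np.zeros((gh, gw, 10))
for i in range(gh):
    for j in range(gw):
        nb = feats[max(i-1, 0):i+2, max(j-1, 0):j+2].mean(axis=(0, 1))
        hyper[i, j] = np.concatenate([feats[i, j], nb])

# 4) Local Outlier Factor flags hyperpixels whose local density deviates.
lof = LocalOutlierFactor(n_neighbors=8)
labels = lof.fit_predict(hyper.reshape(-1, 10))      # -1 marks outliers
strongest = int(np.argmin(lof.negative_outlier_factor_))
print(strongest)
```

On this toy cube the block covering the implanted anomaly is reported as the strongest outlier; in the real workflow the block step would be an actual superpixel segmentation.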
-
Design of short peptides targeting the interaction between SARS-CoV-2 and human ACE2 (Graduate School, 2023)
Usta, Numan Nusret ; Baday, Sefer ; 702171021 ; Computational Science and Engineering Programme

At the end of 2019, a novel coronavirus called SARS-CoV-2 appeared in Wuhan, China, and caused a pandemic by spreading deadly infections around the world. Compared to its ancestor, SARS-CoV, it is more infectious and binds much more tightly to host cells. One of the main units of this coronavirus is the spike (S) protein, which has two subunits, S1 and S2. The S1 subunit contains the receptor binding domain (RBD), which initiates entry into the cell by interacting with the human angiotensin-converting enzyme 2 (ACE2) receptor, while the S2 subunit mediates fusion between the host and viral cell membranes. Various drug development efforts were pursued during the pandemic, and several therapeutics such as antibodies and vaccines were developed. In contrast to traditional drug development approaches, peptide inhibitors are promising drug compounds due to their efficiency, lower immunogenicity, ease of removal from the body, and, owing to their small size, higher diffusivity through tissues and organs. Considering these advantages, several studies have been carried out by different research groups using known antiviral peptides. In the present study, we aimed to develop novel peptides for use against the spike RBD. Rather than relying on known peptides from the literature, we explore as many combinations of short peptides as possible. First, we created peptides with random sequences; we then docked them to the spike RBD protein using AutoDock CrankPep (ADCP). To select a suitable docking tool, we compared the free docking tools Vina and ADCP, both developed within the AutoDock software suite.
After the dockings were completed, the top-ranked candidates by binding energy were selected for the next step, Molecular Dynamics (MD) simulations. The simulation results were assessed based on their RMSD trajectories and binding energies. In the end, one sequence, "wfdwef", stood out as more promising than the others. This finding can serve as a potential coronavirus inhibitor, pending experimental studies.
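Pose stability in the MD step is judged by how far a trajectory drifts from a reference structure. A minimal RMSD check on mock coordinates (synthetic frames and an arbitrary stability threshold, not the thesis trajectories) might look like:

```python
import numpy as np

def rmsd(frame, ref):
    """Root-mean-square deviation between two N x 3 coordinate sets
    (assumed already superposed onto the reference)."""
    return float(np.sqrt(np.mean(np.sum((frame - ref) ** 2, axis=1))))

rng = np.random.default_rng(1)
ref = rng.normal(size=(50, 3))               # reference docked pose, 50 atoms
frames = [ref + rng.normal(0, 0.2, (50, 3))  # mock MD frames jittering
          for _ in range(100)]               # around the pose

series = np.array([rmsd(f, ref) for f in frames])
stable = bool(series.mean() < 1.0)           # crude stability criterion
print(round(float(series.mean()), 3), stable)
```

A real analysis would superpose each frame onto the reference before computing the deviation; here the frames are generated pre-aligned.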
-
Exploiting optimal supports in enhanced multivariance products representation for lossy compression of hyperspectral images (Graduate School, 2024-01-31)
Şen, Muhammed Enis ; Tuna, Süha ; 702211008 ; Computational Science and Engineering

Data serves as an irreplaceable foundation of modern society, as it is the core element of numerous fields such as technological innovation, scientific advancement, and economic decision-making. It enables insight into domains of knowledge and experience, assists decision-making tasks, and supports predictions of future outcomes. It has progressed from knowledge kept in physical formats, such as carvings on cave walls and events preserved as inscriptions, to mathematical structures that can be obtained from any interaction with current technological devices, from social media activity to observations acquired with advanced instruments. As data evolves into ever more detailed and complex structures, it poses efficiency challenges that call for computational methods capable of processing and handling such structures. Many methods have been proposed for these data-handling needs and remain in use. Each has advantages as well as drawbacks, whether particular limitations or computational complexity issues. Alternative workarounds have been suggested for such issues, for instance adopting an iterative approach rather than a direct solution, or customizing techniques to fit specific workflows. Moreover, some innovative approaches operate on representations of data that have undergone compression and transformation, rendering them into more easily processable structures. An important aspect of these practices is the preservation of the data's characteristic features.
Compression methods execute this procedure in distinct ways, for example by exploiting eigenvalues and eigenvectors or by utilizing singular values. These techniques not only streamline the processing of data but also contribute to the efficiency and accuracy of analyses by retaining characteristic features throughout the compression process. In the field of data processing, an understanding of these diverse methodologies proves valuable when selecting the most effective solution for the application under consideration. Hyperspectral imaging is an area that requires such computational techniques to process the collected data, due to its high-dimensional workflow. It outputs 3-dimensional mathematical structures in which the first two dimensions correspond to the spatial attributes of the captured area, while the third dimension captures spectral information according to the band capacity of the acquiring device. As a result, the fibers along the data's third dimension correspond to spectral signatures that enable the identification of objects and materials. The ability to analyze these spectral data opens doors to useful applications in numerous areas such as remote sensing, agriculture, medical imaging, archaeology, and urban planning. Recent studies in computational science for high-dimensional structures have adopted new methods that improve overall processing performance and make more in-depth analyses possible. Considering the relational structure along its third dimension, High Dimensional Model Representation (HDMR) is a technique from which hyperspectral imaging can benefit deeply, thanks to its decorrelation properties. The aim of HDMR is to represent multivariate functions in terms of lower-dimensional ones. Owing to the way it is defined, the technique is also applicable to tensors; hence, it can be used to decompose a given tensor into lower-dimensional entities, each of which captures the behavior of a certain combination of dimensions.
This ability of HDMR addresses the decorrelation of each dimension of the given data. The decorrelation procedure enables reducing noise and removing artifacts while preserving the high-frequency components. Hence, HDMR is a suitable compression technique for high-dimensional data with strong relations along individual axes, such as hyperspectral images. HDMR employs a set of weights and support vectors to represent data, which consequently must be calculated. These entities are either assigned fixed values or arranged using techniques such as Averaged Directional Supports (ADS), but the computation of optimal entities can also be carried out with iterative methods such as the Alternating Direction Method of Multipliers (ADMM), where the requirements of HDMR serve as the constraints of ADMM. A variant of HDMR called the Enhanced Multivariance Products Representation (EMPR) specializes in optimizing the representation by focusing on the support vectors. The weights are assumed to be constant scalars, and the support vectors are managed by the aforementioned calculation techniques. Because these methods employ the main data in calculating the support vectors, they make EMPR a more robust method than HDMR. Iterative approaches like ADMM can shape the properties of these support vectors, for example by enforcing sparsity for better representations and by improving denoising capabilities. This thesis explores the hyperspectral imaging area and proposes a new perspective on decomposition methods by bringing a tensor-based iterative approach to EMPR through the use of ADMM. The study compares the proposed method's performance and efficiency with other well-known tensor decomposition techniques, namely CANDECOMP/PARAFAC Alternating Least Squares (CP-ALS) and Tucker Decomposition (TD), while also comparing the results to EMPR's standard application via ADS.
Multiple tests are performed on hyperspectral datasets, which are 3-dimensional; accordingly, the proposed technique is arranged to be applicable to any 3-dimensional tensor, especially data that can benefit from the decorrelation properties of EMPR. As a result of EMPR, the relations in each dimension and in the combinations of these dimensions are acquired through the support vectors. Results across multiple metrics show that the proposed method performs similarly to the aforementioned tensor decomposition methods at the specified ranks, and that the decorrelated dimensions are successfully represented by the 1-dimensional EMPR components. The tests also employ the 2-dimensional components to reveal their effect on the final representations, with comparisons to CP-ALS and TD over multiple rank options. The key strength of the proposed technique lies in EMPR's superior decorrelation ability. Not only does it demonstrate the capability of reconstructing high-dimensional data with similar accuracy, but it also highlights the potential to reduce noise and artifacts in the process. These results are particularly promising for any lossy compression task in Cartesian geometry that utilizes tensor decomposition techniques, where accurate and efficient data processing is paramount. Furthermore, this performance advantage paves the way for advances in lossy compression techniques, enabling researchers and practitioners to gain more precise insights from data.
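One of the baselines named above, CP-ALS, can be illustrated compactly: each factor is updated in turn while the others are held fixed. The sketch below fits a rank-1 CP model to a synthetic rank-1 tensor; it is a toy stand-in for the comparison methods, not the EMPR/ADMM scheme of the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic rank-1 3-D tensor (a toy stand-in for a hyperspectral cube).
a, b, c = rng.normal(size=8), rng.normal(size=9), rng.normal(size=10)
T = np.einsum('i,j,k->ijk', a, b, c)

# Rank-1 CP via alternating least squares: fix two factors, solve the third.
u, v, w = rng.normal(size=8), rng.normal(size=9), rng.normal(size=10)
for _ in range(50):
    u = np.einsum('ijk,j,k->i', T, v, w) / ((v @ v) * (w @ w))
    v = np.einsum('ijk,i,k->j', T, u, w) / ((u @ u) * (w @ w))
    w = np.einsum('ijk,i,j->k', T, u, v) / ((u @ u) * (v @ v))

approx = np.einsum('i,j,k->ijk', u, v, w)
rel_err = np.linalg.norm(T - approx) / np.linalg.norm(T)
print(rel_err)
```

Because the input tensor is exactly rank 1, the alternating updates recover it to machine precision; on real HSI cubes one would use higher ranks and compare reconstruction error against the EMPR components, as the thesis does.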
-
Identification of novel inhibitors targeting putative dantrolene binding site for ryanodine receptor 2 (Graduate School, 2022-06-13)
Saylan, Cemil Can ; Baday, Sefer ; 702191003 ; Computational Science and Engineering

Ryanodine receptors (RyRs) are large (around 2.2 MDa) homotetrameric intracellular ion channels located in the membrane of the sarcoplasmic reticulum (SR). RyRs play a central role in excitation-contraction coupling by regulating Ca2+ release from the SR to the cytosol. The three isoforms, RyR1, RyR2, and RyR3, share 70% sequence similarity and are predominantly expressed in skeletal muscle, cardiac muscle, and neurons, respectively. In all isoforms, the transition between open and closed states occurs through rigid-body shifts of domains that produce the overall breathing motion of the cytoplasmic region. Dysregulation of RyRs leads to abnormal cellular activity, and more than 300 mutations have been associated with muscle and neuronal diseases. Several heart diseases caused by aberrant RyR2 activity have been identified to date, such as catecholaminergic polymorphic ventricular tachycardia, cardiomyopathies, and cardiac arrhythmias. This unwanted RyR2 activity can be modulated by drugs. Dantrolene is an approved muscle relaxant used for the treatment of malignant hyperthermia, which arises from dysregulation of RyR1. A dantrolene binding sequence was previously suggested, and this sequence is conserved across all RyRs. However, dantrolene has poor water solubility, and it exerts its effect on RyR2 only in the presence of regulators such as Mg2+ and calmodulin. In addition, although amino acids 590-609 in RyR1 (equivalent to 601-620 in RyR2) were identified as the dantrolene binding sequence, a dantrolene-bound complex structure has not yet been elucidated.
Here, we aimed at three goals: 1) modelling the full-atom structure of RyR2 with its membrane, 2) predicting the dantrolene binding orientation, and 3) identifying novel inhibitors targeting the putative dantrolene binding sequence to regulate RyR2 function. While most of the RyR2 structure has been solved recently, some regions are still missing. To predict the missing regions, we used trRosetta and AlphaFold2, state-of-the-art deep-learning-based methods for protein modelling. Each missing segment was modelled separately, and the segments were combined at the end. The final model was then optimized by a 35 ns MD simulation. Subsequently, to predict the dantrolene binding pose, the putative dantrolene binding site was searched using three different docking programs: Vina, LeDock, and Glide. There was a distinct difference around the dantrolene binding site between the AlphaFold2-based (AF2) model and the cryo-EM-based model; thus, docking was performed for both structures. The dantrolene population was found predominantly in a particular cavity formed by six domains. Among the docking results, five binding poses (3 for the cryo-EM model, 2 for the AF2 model) were selected based on affinity scores and pose similarity. These poses were used in 200 ns MD simulations (298 K, NPT) to assess their binding behaviour. A truncated system was used in the simulations, with restraints introduced at the regions where the structure was split. After 200 ns, four dantrolene orientations (2 for each model) remained in interaction with the dantrolene binding sequence. To calculate the binding free energies of the poses, MMPBSA analysis was performed on the MD trajectories. We also investigated the effect of FKBP12.6 binding on dantrolene binding using the docking structures; all MD runs with dantrolene were replicated under FKBP12.6-bound conditions.
Structural clustering of the MD simulations together with the MMPBSA results showed that a particular dantrolene orientation exhibited the highest binding capability, around residues R606, E1649, and L1650. However, we could not identify a significant effect of FKBP12.6 on dantrolene binding. Next, for the identification of novel inhibitors, we focused on the dantrolene binding region and applied high-throughput screening to 3.5 million molecules retrieved from the ZINC15 database. The molecules were selected based on molecular weight (<450 Da) and logP values (<3.5). Virtual screening proceeded through three gradual filtering steps. The initial step screened the 3.5 million molecules using AutoDock Vina with an exhaustiveness of 8. This was followed by two screening procedures in which the top-ranked 200K molecules from the previous step were filtered with LeDock and Vina (with an exhaustiveness of 24). Molecules appearing in the top 10K of both the Vina and LeDock results were used for the third and final screening with Glide XP. Subsequently, from the top 100 molecules, the top 20 were selected as the final candidate list. According to the >70% human oral absorption criterion, the best 11 molecules proceeded to MD simulations. A 200 ns MD simulation was carried out using Desmond. Seven molecules remained in interaction with the dantrolene binding sequence and were suggested as candidates that might regulate RyR2 activity. In particular, two molecules showed significant stability at the binding site, within around 1 Å. These molecules will be tested experimentally by our collaborators.
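The three-stage funnel described above can be mimicked with plain Python on mock scores: rank by one tool, keep the intersection with another tool's top slice, then rescore the survivors. The library size, score distributions, and cutoffs below are illustrative placeholders, not the thesis values.

```python
import random

random.seed(3)
# Mock library with docking scores from two tools (more negative = better).
library = [f"MOL{idx:05d}" for idx in range(10_000)]
vina = {m: random.gauss(-7.0, 1.0) for m in library}
ledock = {m: random.gauss(-7.0, 1.0) for m in library}

def top(scores, n):
    """The n best-scoring molecules (most negative docking score)."""
    return set(sorted(scores, key=scores.get)[:n])

# Stages 1-2: keep molecules ranked in the top slice by BOTH tools,
# a consensus step like the Vina + LeDock intersection in the thesis.
consensus = top(vina, 1_000) & top(ledock, 1_000)

# Stage 3: rescore the survivors with a third mock tool, keep the top 20.
glide_like = {m: random.gauss(-8.0, 1.0) for m in consensus}
final = sorted(glide_like, key=glide_like.get)[:20]
print(len(consensus), len(final))
```

Since the two mock score sets are independent, the consensus of two top-10% slices retains roughly 1% of the library, illustrating how quickly the funnel narrows before the expensive final stage.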
-
Jig shape optimization for desired shape of a high-altitude long-endurance class unmanned aerial vehicle under aeroelastic effects (Graduate School, 2024-07-24)
Ateş, Akın ; Nikbay, Melike ; 702211001 ; Computational Science and Engineering

The field of aviation is continuously developing, and with advancements in technology it has established a strong foundation. The desire to integrate air travel into everyday life has accelerated aircraft design and production processes. In civil aviation, aircraft are designed to reduce fuel costs, fly over long distances, and carry the maximum number of passengers. The military sector, on the other hand, emphasizes features such as flying faster, reaching higher altitudes, and carrying weapons and missiles. Meeting these diverse requirements is possible through the most efficient design of an aircraft's power systems, structures, and aerodynamics. Since engine design falls outside its scope, this thesis focuses on aerodynamic and structural design. The desire to make aircraft lighter for better performance has grown, and advances in material technology have produced stronger and lighter aerospace materials. These lightweight materials are more flexible than traditional ones, making them ideal for modern designs. While aircraft built with these materials are 30% lighter than those built with conventional materials, they are also more susceptible to elastic effects. Therefore, to fully realize the benefits of weight reduction, it is crucial to examine the aeroelastic effects on the aircraft in more detail. The field of aeroelasticity focuses on understanding and addressing the combined effects of inertial, elastic, and aerodynamic forces. Its popularity within the aircraft design community is constantly increasing as aircraft become more and more flexible.
Aeroelasticity is generally divided into two main categories: static and dynamic. Common phenomena associated with static aeroelasticity include control reversal, control effectiveness, and divergence, while those associated with dynamic aeroelasticity include buzz, buffet, gust response, and flutter. In the aircraft design process, once the conceptual design is finalized, the aerodynamicists work on the external geometry during the preliminary design stage to achieve an optimal design. This aerodynamically optimized wing is then handed over to structural engineers, who manufacture it within specified production tolerances. These manufacturing constraints cause the aircraft's external surface to deviate slightly from its optimized design. In short, there are geometrical differences between the optimal aerodynamic design and the manufactured geometry, and these differences produce discrepancies between the calculated performance of the optimized design and the actual performance of the manufactured geometry. When the loss due to these geometrical differences is added, the aerodynamic performance values decrease significantly. The external geometry of an aircraft is a dynamic, living entity that undergoes many changes from design to production. External shapes are generally divided into two groups: theoretical and practical. Theoretical shapes include the 1G flight shape, the jig shape, and the engineering shape; practical shapes include the manufacturing shape, parking shape, actual flight shape, and operation shape. The aim of this study is to arrive at a more effective design during the preliminary design stage by incorporating a multidisciplinary approach. Due to time constraints in aircraft design processes, designers often avoid complex and expensive analyses.
This study proposes a method to mitigate these challenges by providing a quick way to integrate multidisciplinary analysis into the preliminary design stage, thereby enabling a more effective design process. The RQ-4 Global Hawk, a HALE-class unmanned aerial vehicle, is selected because its very large aspect ratio makes elastic effects more apparent. Initially, point cloud data available in the literature for the RQ-4 Global Hawk is acquired, and a structured mesh reproducing this point cloud is generated using a Python code. This mesh is employed to establish a ZONAIR aerodynamic model, a 3D panel method that uses high-order panels. The results obtained with the ZONAIR aerodynamic model were validated against the flight data available in the literature. The same grid is used for the structural analysis of the RQ-4 Global Hawk. The material density is chosen based on the real-life weight of the aircraft, and the weight distribution is made proportional to the volumes of its components. The FEM analysis is performed with composite materials, using stiffness values found in the literature. A modal analysis is then conducted to determine the natural modes and frequencies of the wing. After preparing the ZONAIR aerodynamic model and the FEM model, the ZONAIR model is made ready for aero-structural coupling. For specific Mach numbers and angles of attack, both the rigid (desired) and elastic (flight shape) results are obtained. Since the RQ-4 Global Hawk is a subsonic aircraft, the chosen angles of attack and Mach numbers produce linear behaviour in the plots. This is advantageous for the UAV, because aeroelastically improving one design point automatically improves the other design points as well.
In light of this information, the differences in lift coefficients between the flight shape and the rigid shape are measured. The results indicate a 5.5% aerodynamic loss between the flight shape and the rigid shape, a significant loss for an aircraft. This difference arises from the elastic structure of the RQ-4 Global Hawk. To minimize the loss caused by elastic effects, a solution method is developed in this thesis; defining and applying the jig shape in the design process is crucial to preventing this loss. There are two ways to address the difference caused by elastic effects. The first is structural reinforcement, which is generally undesirable because increasing stiffness automatically increases weight. This study examines the other method, managing deflections and twists. The importance of jig shape design in the aircraft design process has increased, and the main goal of this study is to incorporate jig shape design into the preliminary design process and develop a methodology for it. A methodology for rigid, elastic, and jig shape design is developed and used iteratively for design optimization. In outline, the procedure takes an aerodynamically optimized wing as the target shape. Rigid and elastic solutions are then obtained at specific design points for this target shape, and the loads are extracted from the elastic solution. These extracted loads are inverted and applied to the aircraft in an aero-structural solution to identify the initial jig shape. This initial jig shape is then subjected to the same conditions in an aero-structural solution, and the new flight shape is measured and compared to the target shape. If the difference between the new flight shape and the target shape is below a certain limit, the iteration ends; if it exceeds the limit, the process starts over.
The loads from the new flight shape are inverted, and the jig shape for the second iteration is found. This shape is then subjected to the ZONAIR aero-structural solution to obtain the flight shape, which is compared again to the target shape. The methodology developed in this study is fast and practical, although many iterations are needed to find the optimal jig shape. Optimization methods have been used to make this process more intelligent; in particular, a stable and widely used method has been selected. Since the aim of this study is to simplify complex models and achieve faster solutions, the fast gradient-based Sequential Quadratic Programming (SQP) method has been chosen. An effective optimization model has also been established for the RQ-4 Global Hawk, automating the jig shape optimization procedure. This procedure enhances the aircraft design process by enabling rapid jig shape optimization during the preliminary design stage, increasing the efficiency and effectiveness of the aircraft. Jig shape optimization contributes to reaching the targeted range and to achieving more successful observations and weapon deliveries in the field.
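The iterative loop described above (compute the flight shape, compare with the target, correct the jig) is essentially a fixed-point iteration. The toy sketch below uses a linear stand-in for the aero-structural solver; the operator `A` and the 12-station wing are invented for illustration and have nothing to do with the actual ZONAIR/FEM models.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12                                          # spanwise stations on a toy wing
target = np.sin(np.linspace(0, np.pi / 2, n))   # desired (rigid) shape

# Toy linear aeroelastic operator: flight shape = A @ jig shape.
# A is close to identity because elastic deflection is a small correction.
A = np.eye(n) + 0.05 * rng.normal(size=(n, n))

def flight_shape(jig):
    return A @ jig

# Fixed-point jig-shape iteration: correct the jig by the residual between
# the target and the computed flight shape until the mismatch is small.
jig = target.copy()
for it in range(100):
    resid = target - flight_shape(jig)
    if np.linalg.norm(resid) < 1e-10:
        break
    jig += resid

err = np.linalg.norm(flight_shape(jig) - target)
print(it, err)
```

The iteration converges because the deflection correction is small (the residual contracts by roughly the norm of `I - A` each pass); the thesis accelerates the analogous real loop with SQP rather than simple fixed-point updates.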
-
Optimization of mixing efficiency of air and hydrogen in scramjet combustor (Graduate School, 2025-01-22)
Şasıcı, Mustafa ; Edis, Fırat Oğuz ; 702201008 ; Computational Science and Engineering

This thesis investigates the optimization of a scramjet combustor to enhance mixing efficiency and total pressure recovery (TPR) through a combination of computational fluid dynamics (CFD) simulations and Bayesian optimization (BO). Scramjet engines, operating at hypersonic speeds, rely on efficient fuel-air mixing within a very short timeframe to achieve effective combustion. Improving mixing efficiency is critical for ensuring stable combustion and maximizing thrust, while maintaining TPR is essential for harnessing the energy of the flow without incurring excessive pressure losses. A two-dimensional CFD model of a scramjet combustor, developed using OpenFOAM's reactingFoam solver with chemical reactions disabled, forms the core of this investigation. The model incorporates a k-ω SST turbulence model to accurately capture complex flow phenomena, including shock-wave/boundary-layer interactions and turbulent shear layers. The combustor geometry is based on a DLR configuration and is systematically varied by changing key geometric parameters: the wedge angle, the distance between injectors, and the injection angle. These parameters influence the flow structure, jet penetration, and turbulence intensity, ultimately affecting both mixing efficiency and TPR. Bayesian optimization is employed to identify the optimal combination of parameters. A Gaussian Process (GP) surrogate model approximates the objective function, defined as a weighted sum of mixing efficiency and TPR. The optimization process begins with an initial set of samples selected systematically, rather than with commonly assumed sampling schemes, ensuring broad exploration of the parameter space.
The GP surrogate is iteratively updated as new CFD evaluations are performed, guiding the search toward promising regions while balancing exploration and exploitation. The results demonstrate that carefully chosen parameters can significantly improve mixing efficiency while achieving a favorable compromise with TPR. The optimal configuration identified through this process enhances fuel-air interaction, resulting in a more uniform distribution of the hydrogen mass fraction at downstream locations. Ultimately, this study provides valuable insight into the complex interplay between geometric design and aerodynamic performance in scramjet combustors, offering a robust methodology to guide future hypersonic propulsion system development.
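The BO loop just described (fit a GP surrogate, pick the next sample with an acquisition function, evaluate, refit) can be sketched on a one-dimensional toy problem. The quadratic objective with its optimum at 0.7 and the expected-improvement acquisition are illustrative stand-ins for the weighted mixing-efficiency/TPR objective that a CFD run would evaluate.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    """Toy stand-in for the weighted mixing-efficiency + TPR objective;
    each call would be one CFD run in the real workflow. Maximum at x = 0.7."""
    return -(x - 0.7) ** 2

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, 5).reshape(-1, 1)   # initial design samples
y = objective(X).ravel()

grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
for _ in range(15):                            # BO loop
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                                  alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sd, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
    x_next = grid[int(np.argmax(ei))]          # exploration/exploitation balance
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, objective(x_next)[0])

x_best = float(X[int(np.argmax(y))])
print(round(x_best, 2))
```

With a few dozen evaluations the sampled points concentrate near the optimum; in the thesis setting each evaluation is an expensive CFD solve over three geometric parameters rather than a closed-form function of one variable.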
-
ÖgePredicting the bandgap of hole-transport materials by deep learning(Graduate School, 2023-01-30) Aydın, Miraç ; Tekin, Adem ; 702191021 ; Computational Science and EngineeringIn this study, the OMDB dataset was first used without any data augmentation. The SchNetPack and ALIGNN models were trained with the default parameters reported in their respective papers. Training was carried out on NVIDIA Tesla A100 (40 GB), two NVIDIA RTX A4500 (20 GB), and NVIDIA RTX A4000 (16 GB) graphics cards. The resulting MAE values were 0.43 eV for SchNetPack and 0.25 eV for ALIGNN. These models were then used to predict the bandgap of the Spiro-OMeTAD molecule. The bandgap, whose literature value is 3.05 eV, was predicted as 2.73 eV by SchNetPack and 2.52 eV by ALIGNN. Subsequently, 10 hole-transport material structures from the Crystallography Open Database (COD), with 10 conformers for each structure (100 structures in total), were added to the OMDB dataset. With these additions in place, data augmentation was applied to the dataset using the AugLiChem library. The search for new structures continued in order to improve model performance, and 79 further structures were found through literature surveys. Their bandgap values were computed with the DFT method and added to the dataset. As a result of these steps, the dataset contained 52,835 structures in total. The SchNetPack and ALIGNN models were trained many times on the augmented OMDB dataset with different parameters. The lowest MAE values obtained from these trainings were 0.23 eV for SchNetPack and 0.25 eV for ALIGNN. These models were again used to predict the bandgap of Spiro-OMeTAD: the literature value of 3.05 eV was predicted as 2.97 eV by SchNetPack and 2.82 eV by ALIGNN.
After the data augmentation and parameter changes, the MAE values of the models improved by an average of 40%, and their bandgap prediction performance by an average of 13%. Other molecules whose bandgap values were predicted are presented in the study.
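The gain in prediction quality can be checked directly from the numbers above: with the literature bandgap of 3.05 eV for Spiro-OMeTAD, augmentation reduces the absolute error of both models.

```python
# Absolute errors of the Spiro-OMeTAD bandgap predictions reported above,
# before and after dataset augmentation (all values in eV).
lit = 3.05  # literature bandgap of Spiro-OMeTAD
preds = {
    "SchNetPack": (2.73, 2.97),  # (before augmentation, after augmentation)
    "ALIGNN": (2.52, 2.82),
}
errors = {m: (round(abs(lit - a), 2), round(abs(lit - b), 2))
          for m, (a, b) in preds.items()}
# errors == {"SchNetPack": (0.32, 0.08), "ALIGNN": (0.53, 0.23)}
```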
-
ÖgeThermodynamic stability of binary compounds: A comprehensive computational and machine learning approach(Graduate School, 2024-06-06) Canbaz, Feraye Hatice ; Tekin, Adem ; 702191018 ; Computational Science and EngineeringExploration and exhaustive comprehension of novel materials are the main objectives of materials science. Laboratory evaluations have been the primary method by which substantial advancements have been achieved throughout the development of this scientific field. The contributions of density functional theory (DFT) algorithms have significantly altered the field of materials science over the past twenty years. These algorithms balance accuracy and efficiency. Supercomputers have enabled substantial breakthroughs in predicting the electronic properties of crystal structures, facilitating a fundamental transition in the discipline. The development of robust algorithms, together with falling computing costs, has made data-driven approaches in materials research more widely adopted. Researchers can now analyze enormous datasets to guide experiments and uncover novel materials. Although databases are frequently used in contemporary materials science, there are some gaps regarding phonon calculations and the thermal properties of compounds. To address this deficiency, this thesis calculates the phonon stability, heat capacities at 298.15 K, formation enthalpies, formation entropies, and Gibbs free energies of binary structures. A total of 879 binary structures were examined, and the results of these calculations were compiled into a data set. In a recent study by my research team, the formation enthalpies and mechanical strengths of binary structures at absolute zero were investigated. This thesis contributes to this work by providing detailed analyses of the dynamic stability and thermodynamic properties of the same binary structures, supporting the findings of my team's prior research.
In the initial phase of this thesis, the thermodynamic properties and phonon stabilities of the compounds were calculated. Subsequently, inspired by the PN-PN table model proposed and utilized in our team's recent work, this data set was mapped and visualized on a PN-PN table according to the periodic numbers (PN) assigned to the elements in the structures. This approach enabled the integrated visualization of phonon stability and other thermodynamic properties. Consequently, the chemical similarities between structures were more easily comprehended through the groups in the map, and the so-called forbidden regions were highlighted. Forbidden regions are regions in which specific pairings of elements are unable to form stable phases, which provides critical information on stability based on the PN numbers of the elements. The basic principle of the periodic numbering approach is as follows: First, periodic numbers (PN) are assigned to the elements with respect to their electronegativity, principal quantum number, and valence shell configuration, and then this numbering is extended to binary systems. This makes it easier to understand the chemical trends in the compounds formed by the elements and to predict phase formation. Although there are some exceptions in this mapping, it clearly shows the structures where phase formation is not expected. In our team's previous work, the PN-PN table significantly facilitated the identification of critical regions in different chemical systems and allowed for the analysis of trends in the chemical properties of equiatomic binary phases. Based on this, density functional theory-based thermodynamic calculations were performed in this thesis, providing thermodynamic data supporting the inferences of formation enthalpy and crystal structure stability calculated in our team's previous studies. The phonon stabilities of all 879 structures were determined, and heat-contribution values were calculated.
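The PN-PN mapping idea can be sketched minimally as follows. The periodic numbers and compounds below are hypothetical placeholders chosen only for illustration; the thesis derives PN from electronegativity, principal quantum number, and valence-shell configuration.

```python
# Minimal sketch of placing binary compounds on a symmetric PN-PN grid.
# PN values and compounds here are hypothetical, not from the thesis.
import numpy as np

pn = {"Li": 1, "Mg": 4, "Ti": 9, "Ni": 15, "Al": 20, "Si": 24}  # placeholder PN
compounds = {  # (element A, element B) -> phonon-stable?
    ("Ti", "Al"): True,
    ("Li", "Si"): True,
    ("Ni", "Mg"): False,
}

n = max(pn.values()) + 1
grid = np.full((n, n), np.nan)  # NaN = no data; 1.0 = stable, 0.0 = unstable
for (a, b), stable in compounds.items():
    grid[pn[a], pn[b]] = grid[pn[b], pn[a]] = float(stable)  # symmetric table
```

Contiguous empty or all-zero regions of such a grid would correspond to the "forbidden regions" described above, where no stable phase formation is expected.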
Thus, the phonon stability and heat contribution data obtained from this thesis can be integrated with the mechanical strength properties of the structures from our team's previous findings. This allows for a more detailed interpretation of the relationship between phonon and mechanical stability. Additionally, using the elemental and structural properties of the compounds, machine learning techniques were applied to the current data set. Random Forest, Support Vector Machines (SVM), Gradient Boosting, and Decision Trees were assessed for their capacity to predict phonon stability. The Decision Tree model exhibited the highest performance, with an accuracy rate of 80%. These models' accuracy was significantly enhanced by elemental descriptors such as band center, mean covalent radius, and mean electronegativity. The band center indicates the effect of the position in the electronic band structure on phonon stability, the mean covalent radius reflects the bonding properties of atoms, and the mean electronegativity determines the atoms' tendencies to attract electrons, thus affecting phonon stability. For predicting Gibbs free energy, Random Forest Regression, K-Nearest Neighbors (KNN) Regression, Support Vector Regression (SVR), and Linear Regression models were used. The performance of these models was evaluated using a 5-fold cross-validation method. The Random Forest Regression model exhibited the highest performance with an average cross-validation score of 0.846. This result indicates that Random Forest Regression is the most effective model for predicting Gibbs free energy. These findings may encourage the broader application of machine learning techniques in future research. This significant step in understanding and modeling thermodynamic properties plays a critical role in optimizing material structures. In the future, it is expected that the methods of this study will be adapted and developed more specifically for certain material classes or other academic applications.
This approach also serves as an efficient example of the discovery and design planning processes in materials science.
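The model-comparison step for the regression task can be sketched as follows. This is a hedged illustration: the synthetic data stands in for the elemental and structural descriptors, so the ranking of models here will generally differ from the thesis results.

```python
# Illustrative 5-fold cross-validation comparison of the regressors named above,
# on synthetic data standing in for elemental/structural descriptors.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "KNN": KNeighborsRegressor(),
    "SVR": SVR(),
    "Linear": LinearRegression(),
}
# Mean 5-fold cross-validation score (R^2 by default) per model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
```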
-
ÖgeVisualization based analysis of gene networks using high dimensional model representation(Graduate School, 2024-07-01) Güler, Pınar ; Tuna, Süha ; 702211009 ; Computational Science and EngineeringGenetic studies have revolutionized our understanding of the biological mechanisms underlying health and disease. By exploring the intricate details of the human genome, researchers can identify genetic variations that contribute to various phenotypic outcomes. One of the key advancements in this field is gene network analysis, which examines the complex interactions between genes and how they regulate cellular processes. This approach provides a comprehensive view of biological systems and uncovers the pathways involved in disease mechanisms. Genome-Wide Association Studies (GWAS) play a pivotal role among the methodologies utilized in gene network analysis. GWAS involves scanning the genome for small variations, known as single nucleotide polymorphisms (SNPs), that occur more frequently in individuals with a particular disease or trait than in those without. By identifying these associations, GWAS helps pinpoint genetic factors contributing to disease susceptibility and progression, paving the way for personalized medicine and targeted therapeutic strategies. By integrating various variant analysis techniques, researchers can develop a deeper understanding of the genetic architecture of diseases, leading to significant advancements in diagnostics, treatment, and prevention. Gene network and pathway analyses are essential components of genetic studies, offering insights into genes' complex interactions and functions within biological systems. However, both face significant computational challenges, particularly when dealing with high-dimensional genomic data. Analyzing vast datasets containing gene expression profiles and genetic variations demands sophisticated computational methods capable of handling their scale and complexity.
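A single-SNP association test of the kind GWAS performs can be illustrated with a contingency-table test on allele counts. The counts below are made up for illustration only, not data from the study.

```python
# Illustrative single-SNP association test comparing allele counts between
# case and control groups, as done per SNP in a GWAS. Counts are hypothetical.
from scipy.stats import chi2_contingency

#            minor allele, major allele
cases = [60, 140]     # hypothetical allele counts in patients
controls = [35, 165]  # hypothetical allele counts in controls

chi2, p, dof, expected = chi2_contingency([cases, controls])
# A small p-value suggests the SNP's allele frequencies differ between groups.
```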
Conventional statistical methods are often insufficient on their own, demanding sophisticated computational approaches like data visualization, network modeling, and machine learning algorithms. In addition, the complexity of biological networks and pathways makes analysis even more complicated, necessitating the use of powerful computational tools to interpret regulatory mechanisms and simulate complex biological processes correctly. Overcoming these challenges is crucial for gaining deeper insights into gene networks and pathways, thereby advancing our understanding of their roles in health and disease. In pathway analysis, scientists employ data collected from many sources, such as Genome-Wide Association Studies (GWAS), to identify target genes and connect them to known pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. However, pathway analysis presents major computing challenges, especially when large, high-dimensional genomic datasets are involved. Researchers have developed innovative methods such as High Dimensional Model Representation (HDMR), Chaos Game Representation (CGR), and visual analysis of DNA sequences based on a variant logic construction method called VARCH to overcome these challenges. By mapping genetic sequences into visual representations, these innovative approaches can help identify potential genetic markers and better understand biological processes. These computational methods must be included in gene network and pathway investigations to fully understand the complex architecture of genetic interactions and how they affect health and disease.
In this thesis, we harnessed three sophisticated computational methodologies: Chaos Game Representation, visual analysis of DNA sequences based on variant logic construction called VARCH, and High Dimensional Model Representation, each offering a unique contribution to the variant analysis. CGR, a prevalent technique in bioinformatics, translates genetic sequences into visually interpretable diagrams, clarifying complex structures and patterns in the sequences. VARCH, on the other hand, converts sequences into a feature space, capturing aspects of their complexity and uncertainty. These techniques are effective instruments in our search for potential genetic markers that might help distinguish between the patient and control groups in our investigation. Furthermore, we utilized HDMR for dimension reduction, an essential technique for simplifying the complex structure of high-dimensional genomic data. By condensing data dimensions, HDMR facilitated more efficient and accurate classification, enabling us to uncover sensitive genetic relationships and patterns that might otherwise have remained hidden. Integrating these computational techniques provided robust solutions for analyzing genetic data from the mTOR pathway, enriching our comprehension of the genetic mechanisms supporting various phenotypic outcomes. In our study, we set out to deepen our comprehension of the intricate genetic patterns intertwined with diverse phenotypic outcomes. Focusing on genetic data sourced from the mTOR pathway, we leveraged state-of-the-art computational methodologies to unravel hidden insights. Our primary objective was to assess the efficacy of CGR, VARCH, and HDMR in gene network analyses. The results were compelling: both CGR and VARCH demonstrated notable accuracy in genetic classification, with VARCH exhibiting a significant edge over CGR in terms of accuracy and sensitivity metrics.
This superiority was underscored by VARCH's considerably lower binary cross-entropy (BCE) loss values, reflecting fewer prediction errors. We also examined the computational overhead of each methodology in detail, shedding light on the trade-off between computational complexity and accuracy: although VARCH outperformed CGR, its larger number of parameters entailed noticeably higher computational cost. Our study demonstrates the potential of computational tools for unraveling gene complexities, while also serving as a reminder that computational constraints must be weighed carefully, guiding researchers toward well-informed method selection and optimization.
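Of the three methods, CGR is simple enough to sketch completely: each base pulls the current point halfway toward its assigned corner of the unit square, so the whole sequence becomes a two-dimensional point cloud whose structure reflects the sequence's patterns.

```python
# Illustrative implementation of Chaos Game Representation (CGR) for a DNA
# sequence. The corner assignment below is one common convention.
import numpy as np

CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr(sequence):
    """Return the CGR point for every prefix of the sequence."""
    points = np.empty((len(sequence), 2))
    x, y = 0.5, 0.5                        # start at the centre of the square
    for i, base in enumerate(sequence):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the base's corner
        points[i] = (x, y)
    return points

pts = cgr("ACGTACGT")  # first point is (0.25, 0.25), halfway from centre to A
```

Plotting `pts` for a long sequence yields the fractal-like diagrams that CGR-based analyses, such as the one in this thesis, use as input features.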