LEE - Computational Science and Engineering - Master's

Recent Submissions

Now showing 1 - 5 of 8
  • Item
    Thermodynamic stability of binary compounds: A comprehensive computational and machine learning approach
    (Graduate School, 2024-06-06) Canbaz, Feraye Hatice ; Tekin, Adem ; 702191018 ; Computational Science and Engineering
    Exploration and exhaustive comprehension of novel materials are the main objectives of materials science. Laboratory evaluations have been the primary method by which substantial advancements have been achieved throughout the development of this scientific field. The contributions of density functional theory (DFT) algorithms have significantly altered the field of materials science over the past twenty years. These algorithms balance accuracy and efficiency. Supercomputers have enabled substantial breakthroughs in predicting electronic properties of crystal structures, facilitating a fundamental transition in the discipline. The development of robust algorithms, together with lower computing costs, has made data-driven approaches in materials research more widely adopted. Researchers can now analyze enormous datasets to guide experiments and uncover novel materials. Although databases are frequently used in contemporary materials science, there are some gaps regarding phonon calculations and the thermal properties of compounds. To address this deficiency, this thesis calculates the phonon stability, heat capacities at 298.15 K, formation enthalpies, formation entropies, and Gibbs free energies of binary structures. A total of 879 binary structures were examined, and the results of these calculations were compiled into a dataset. In a recent study by my research team, the formation enthalpies and mechanical strengths of binary structures at absolute zero were investigated. This thesis contributes to this work by providing detailed analyses of the dynamic stability and thermodynamic properties of the same binary structures, supporting the findings of my team's prior research. In the initial phase of this thesis, the thermodynamic properties and phonon stabilities of the compounds were calculated. Subsequently, inspired by the PN-PN table model proposed and utilized in our team's recent work, this dataset was mapped and visualized on a PN-PN table according to the periodic numbers (PN) assigned to the elements in the structures. This approach enabled the integrated visualization of phonon stability and other thermodynamic properties. Consequently, the chemical similarities between structures were more easily comprehended through the groups in the map, and the so-called forbidden regions were highlighted. Forbidden regions are those in which specific pairings of elements are unable to form stable phases, and they provide critical information on stability based on the PNs of the elements. The basic principle of the periodic numbering approach is as follows: First, periodic numbers (PN) are assigned to the elements with respect to their electronegativity, principal quantum number, and valence shell configuration, and then this numbering is extended to binary systems. This makes it easier to understand the chemical trends in the compounds formed by the elements and to predict phase formation. Although there are some exceptions in this mapping, it clearly shows the structures where phase formation is not expected. In our team's previous work, the PN-PN table significantly facilitated the identification of critical regions in different chemical systems and allowed for the analysis of trends in the chemical properties of equiatomic binary phases. Based on this, density functional theory-based thermodynamic calculations were performed in this thesis, providing thermodynamic data supporting the formation-enthalpy and crystal-structure-stability inferences drawn in our team's previous studies.
    The phonon stabilities of all 879 structures were determined, and their heat-contribution values were calculated. Thus, the phonon stability and heat contribution data obtained from this thesis can be integrated with the mechanical strength properties of the structures from our team's previous findings. This allows for a more detailed interpretation of the relationship between phonon and mechanical stability. Additionally, using the elemental and structural properties of the compounds, machine learning techniques were applied to the current dataset. Random Forest, Support Vector Machines (SVM), Gradient Boosting, and Decision Trees were assessed for their capacity to predict phonon stability. The Decision Tree model exhibited the highest performance, with an accuracy rate of 80%. These models' accuracy was significantly enhanced by elemental descriptors such as band center, mean covalent radius, and mean electronegativity. The band center indicates the effect of the position in the electronic band structure on phonon stability, the mean covalent radius reflects the bonding properties of atoms, and the mean electronegativity determines the atoms' tendencies to attract electrons, thus affecting phonon stability. For predicting Gibbs free energy, Random Forest Regression, K-Nearest Neighbors (KNN) Regression, Support Vector Regression (SVR), and Linear Regression models were used. The performance of these models was evaluated using a 5-fold cross-validation method. The Random Forest Regression model exhibited the highest performance with an average score of 0.846. This result indicates that Random Forest Regression is the most effective model for predicting Gibbs free energy. These findings may encourage the broader application of machine learning techniques in future research. This significant step in understanding and modeling thermodynamic properties plays a critical role in optimizing material structures. In the future, it is expected that the methods of this study will be adapted and developed more specifically for certain material classes or other academic applications. This approach also serves as an efficient example of the discovery and design planning processes in materials science.
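    A minimal sketch, in Python with scikit-learn, of the evaluation workflow described in the abstract above, assuming the 879-structure descriptor table has already been assembled; the file name ("binary_descriptors.csv") and column names are hypothetical placeholders, not the thesis's actual data files.

    # Sketch only: file name and column names are hypothetical placeholders.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestRegressor

    # Descriptor table for the 879 binary structures (hypothetical file/columns).
    data = pd.read_csv("binary_descriptors.csv")
    descriptors = ["band_center", "mean_covalent_radius", "mean_electronegativity"]
    X = data[descriptors]

    # 1) Phonon-stability classification (stable = 1, unstable = 0),
    #    evaluated with 5-fold cross-validated accuracy.
    clf = DecisionTreeClassifier(random_state=0)
    acc = cross_val_score(clf, X, data["phonon_stable"], cv=5, scoring="accuracy")
    print(f"Decision Tree accuracy: {acc.mean():.2f}")

    # 2) Gibbs free energy regression, scored with 5-fold cross-validation.
    reg = RandomForestRegressor(n_estimators=200, random_state=0)
    score = cross_val_score(reg, X, data["gibbs_free_energy"], cv=5)
    print(f"Random Forest regression score: {score.mean():.3f}")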
  • Item
    Optimization of mixing efficiency of air and hydrogen in scramjet combustor
    (Graduate School, 2025-01-22) Şasıcı, Mustafa ; Edis, Fırat Oğuz ; 702201008 ; Computational Science and Engineering
    This thesis investigates the optimization of a scramjet combustor to enhance mixing efficiency and total pressure recovery (TPR) through a combination of computational fluid dynamics (CFD) simulations and Bayesian optimization (BO). Scramjet engines, operating at hypersonic speeds, rely on efficient fuel-air mixing in a very short timeframe to achieve effective combustion. Improving mixing efficiency is critical for ensuring stable combustion and maximizing thrust, while maintaining TPR is essential to harness the energy of the flow without incurring excessive pressure losses. A two-dimensional CFD model of a scramjet combustor, developed using OpenFOAM's reactingFoam solver with chemical reactions disabled, forms the core of this investigation. The model incorporates a k–ω SST turbulence model to accurately capture complex flow phenomena, including shock-wave/boundary-layer interactions and turbulent shear layers. The combustor geometry is based on a DLR configuration and is systematically varied by changing key geometric parameters: wedge angle, distance between injectors, and injection angle. These parameters influence the flow structure, jet penetration, and turbulence intensity, ultimately affecting both mixing efficiency and TPR. Bayesian optimization is employed to identify the optimal combination of parameters. A Gaussian Process (GP) surrogate model approximates the objective function, defined as a weighted sum of mixing efficiency and TPR. The optimization process begins with an initial set of samples selected systematically (without employing previously assumed sampling methods), ensuring a broad exploration of the parameter space. The GP surrogate is iteratively updated as new CFD evaluations are performed, guiding the search toward promising regions that balance exploration and exploitation. Results demonstrate that carefully chosen parameters can significantly improve mixing efficiency and achieve a favorable compromise with TPR. The optimal configuration identified through this process enhances fuel-air interaction, resulting in more uniform distribution of the hydrogen mass fraction at downstream locations. Ultimately, this study provides valuable insights into the complex interplay between geometric design and aerodynamic performance in scramjet combustors, offering a robust methodology to guide future hypersonic propulsion system development.
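    The abstract above outlines a Bayesian-optimization loop with a Gaussian Process surrogate over three geometric parameters and a weighted mixing-efficiency/TPR objective. Below is a minimal sketch of such a loop using scikit-learn and an expected-improvement acquisition; the parameter bounds, objective weights, and the run_cfd() stand-in for an OpenFOAM evaluation are illustrative assumptions, not values from the thesis.

    # Sketch of a Bayesian-optimization loop with a GP surrogate. The parameter
    # bounds, objective weights, and run_cfd() stand-in are illustrative; in the
    # thesis each evaluation is a full OpenFOAM (reactingFoam) simulation.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    bounds = np.array([[5.0, 20.0],    # wedge angle (deg), illustrative range
                       [5.0, 30.0],    # injector spacing (mm), illustrative range
                       [30.0, 90.0]])  # injection angle (deg), illustrative range
    w_mix, w_tpr = 0.7, 0.3            # weights of the combined objective (assumed)

    def run_cfd(x):
        # Stand-in for a CFD evaluation returning (mixing efficiency, TPR).
        # Replace with code that sets up the case, runs the solver, and
        # post-processes the fields; the analytic form below is purely illustrative.
        wedge, spacing, angle = x
        eta_mix = np.exp(-((wedge - 12.0) ** 2) / 50.0) * np.exp(-((angle - 60.0) ** 2) / 800.0)
        tpr = 1.0 - 0.01 * wedge - 0.002 * angle + 0.001 * spacing
        return eta_mix, tpr

    def objective(x):
        eta_mix, tpr = run_cfd(x)
        return w_mix * eta_mix + w_tpr * tpr

    rng = np.random.default_rng(0)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(8, 3))   # initial samples
    y = np.array([objective(x) for x in X])

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(20):                                        # BO iterations
        gp.fit(X, y)
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2048, 3))
        mu, sigma = gp.predict(cand, return_std=True)
        z = (mu - y.max()) / np.maximum(sigma, 1e-12)
        ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))

    print("best parameters:", X[np.argmax(y)], "objective:", y.max())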
  • Item
    Exploiting optimal supports in enhanced multivariance products representation for lossy compression of hyperspectral images
    (Graduate School, 2024-01-31) Şen, Muhammed Enis ; Tuna, Süha ; 702211008 ; Computational Science and Engineering
    Data serves as an irreplaceable foundation of modern society, as it is the core element of numerous fields such as technological innovation, scientific advancement, and economic decision-making. It enables insights into domains of knowledge and experience, assistance with decision-making tasks, and predictions of future outcomes. Data has progressed from knowledge and information kept in physical formats, such as carvings on cave walls and events preserved as inscriptions, to mathematical structures generated by virtually any interaction with modern technological devices, from activity on social media to observations acquired with advanced instruments. As data evolves into more detailed and complex structures, it poses efficiency challenges that demand computational methods capable of processing such structures. Many methods have been proposed, and remain in use, to meet these data-handling needs. Each has its advantages as well as drawbacks, whether in the form of inherent limitations or computational-complexity issues. Alternative workarounds have been suggested for such issues, for example adopting iterative approaches instead of direct solutions, and techniques have been customized to fit specific workflows. Moreover, some innovative approaches operate on representations of data that have undergone compression and transformation, rendering them into more easily processable structures. An important aspect of these practices is the preservation of the data's characteristic features. Compression methods carry out this procedure in distinct ways, for instance by exploiting eigenvalues and eigenvectors or by utilizing singular values. These techniques not only streamline the processing of data but also contribute to the efficiency and accuracy of analyses by retaining characteristic features throughout the compression process. In the field of data processing, an understanding of these diverse methodologies helps in selecting the most effective solution for the application under consideration. Hyperspectral imaging is an area that requires such computational techniques to process the collected data because of its high-dimensional workflow. It outputs 3-dimensional mathematical structures in which the first two dimensions correspond to the spatial attributes of the captured area, while the third dimension holds the spectral information, with as many entries as the capturing device can retrieve bands. As a result, the fibers along the data's third dimension correspond to spectral signatures that enable the identification of objects and materials. The ability to analyze these spectral data opens doors to multiple useful applications in areas such as remote sensing, agriculture, medical imaging, archaeology, and urban planning. Recent studies in computational sciences for high-dimensional structures have adopted new methods that improve overall processing performance and make more in-depth analyses possible. Given the relational structure along its third dimension, High Dimensional Model Representation (HDMR) is a technique from which hyperspectral imaging can benefit deeply, thanks to its decorrelation properties. The aim of HDMR is to represent multivariate functions in terms of lower-dimensional ones.
    Owing to the way it is defined, the technique is also applicable to tensors; hence, it can be used to decompose a given tensor into lower-dimensional entities, where each component captures the contribution of a certain combination of dimensions. This ability of HDMR addresses the decorrelation of each dimension of the given data. The decorrelation procedure enables reducing noise and removing artifacts while preserving the high-frequency components. Hence, HDMR is a suitable compression technique for high-dimensional data with strong relations along individual axes, such as hyperspectral images. HDMR employs a set of weights and support vectors to represent data, which consequently necessitates additional calculation steps. These entities are either assigned fixed values or arranged using techniques like Averaged Directional Supports (ADS), but the calculation of optimal entities can also be carried out with iterative methods such as the Alternating Direction Method of Multipliers (ADMM), in which the requirements of HDMR serve as constraints. A sub-method of HDMR, called the Enhanced Multivariance Products Representation (EMPR), specializes in optimizing the representation by focusing on the support vectors. The weights are assumed to be constant scalars, and the support vectors are handled by the previously mentioned calculation techniques. Because these techniques employ the main data to compute the support vectors, EMPR becomes a more robust method than HDMR. Iterative approaches like ADMM can shape the properties of these support vectors, such as enforcing sparsity for better representations and improving denoising capabilities. This thesis explores the hyperspectral imaging area and proposes a new perspective on decomposition methods by bringing a tensor-based iterative approach to EMPR through the use of ADMM. The study compares the proposed method's performance and efficiency with other well-known tensor decomposition techniques, namely CANDECOMP/PARAFAC Alternating Least Squares (CP-ALS) and Tucker Decomposition (TD), while also comparing the results to EMPR's standard application via ADS. Multiple tests are performed on 3-dimensional hyperspectral datasets, and the proposed technique is designed to be applicable to any 3-dimensional tensor, especially data that can benefit from the decorrelation properties of EMPR. As a result of EMPR, the relations in each dimension and in the combinations of these dimensions are acquired through the support vectors. Results from multiple metrics show that the proposed method performs similarly to the mentioned tensor decomposition methods at the specified ranks, and that the decorrelated dimensions are successfully represented by the 1-dimensional EMPR components. Tests also employ the 2-dimensional components to reveal their effect on the final representations, with comparisons to CP-ALS and TD over multiple rank options. The key strength of the proposed technique lies in EMPR's superior decorrelation ability. Not only does it demonstrate the capability of reconstructing high-dimensional data with similar accuracy, but it also highlights its potential to reduce noise and artifacts in the process. These results are particularly promising for any lossy compression task involving Cartesian geometry that utilizes tensor decomposition techniques, where accurate and efficient data processing is paramount.
Furthermore, this performance advantage paves the way for advancements in lossy compression techniques, enabling researchers and practitioners to gain more precise insights from data.
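    The EMPR/ADMM scheme itself is the thesis's contribution and is not reproduced here; as a point of reference, the following is a minimal sketch of the two baseline decompositions it is compared against (CP-ALS and Tucker), applied to a random cube standing in for a hyperspectral scene and assuming the TensorLy library. The cube size and ranks are illustrative only.

    # Sketch of the two baseline tensor decompositions (CP-ALS and Tucker) used
    # for comparison. The random cube stands in for a real hyperspectral scene
    # (rows x cols x bands); the ranks are illustrative.
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac, tucker

    cube = tl.tensor(np.random.default_rng(0).random((64, 64, 100)))

    # CP-ALS: rank-R sum of outer products of three factor vectors per term.
    cp_fac = parafac(cube, rank=20, n_iter_max=200)
    cp_rec = tl.cp_to_tensor(cp_fac)

    # Tucker: small core tensor contracted with one factor matrix per mode.
    tk_fac = tucker(cube, rank=[20, 20, 30])
    tk_rec = tl.tucker_to_tensor(tk_fac)

    for name, rec in [("CP-ALS", cp_rec), ("Tucker", tk_rec)]:
        rel_err = tl.norm(cube - rec) / tl.norm(cube)   # lossy-compression error
        print(f"{name}: relative reconstruction error = {rel_err:.3f}")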
  • Item
    Visualization based analysis of gene networks using high dimensional model representation
    (Graduate School, 2024-07-01) Güler, Pınar ; Tuna, Süha ; 702211009 ; Computational Science and Engineering
    Genetic studies have revolutionized our understanding of the biological mechanisms underlying health and disease. By exploring the intricate details of the human genome, researchers can identify genetic variations that contribute to various phenotypic outcomes. One of the key advancements in this field is gene network analysis, which examines the complex interactions between genes and how they regulate cellular processes. This approach provides a comprehensive view of biological systems and uncovers the pathways involved in disease mechanisms. Genome-Wide Association Studies (GWAS) play a pivotal role among the methodologies utilized in gene network analysis. GWAS involves scanning the genome for slight variations, known as single nucleotide polymorphisms (SNPs), that occur more frequently in individuals with a particular disease or trait than in those without. By identifying these associations, GWAS helps pinpoint genetic factors contributing to disease susceptibility and progression, paving the way for personalized medicine and targeted therapeutic strategies. By integrating various variant analysis techniques, researchers can develop a deeper understanding of the genetic architecture of diseases, leading to significant advancements in diagnostics, treatment, and prevention. Gene network and pathway analyses are essential components of genetic studies, offering insights into genes' complex interactions and functions within biological systems. However, both face significant computational challenges, particularly when dealing with high-dimensional genomic data. Analyzing vast datasets containing gene expression profiles and genetic variations demands sophisticated computational methods capable of handling their scale and complexity. Conventional statistical methods are frequently insufficient on their own, demanding complex computational approaches such as data visualization, network modeling, and machine learning algorithms. In addition, the complexity of biological networks and pathways makes analysis even more complicated, necessitating the use of powerful computational tools to interpret regulatory mechanisms and simulate complex biological processes correctly. Overcoming these challenges is crucial for gaining deeper insights into gene networks and pathways, thereby advancing our understanding of their roles in health and disease. In pathway analysis, scientists employ data collected from many sources, such as GWAS, to identify target genes and connect them to known pathways using databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, pathway analysis presents major computing challenges, especially when large, high-dimensional genomic datasets are involved. Researchers have developed innovative methods such as High Dimensional Model Representation (HDMR), Chaos Game Representation (CGR), and visual analysis of DNA sequences based on a variant logic construction method called VARCH to overcome these challenges. By mapping genetic sequences into visual representations, these innovative approaches can help identify potential genetic markers and better understand biological processes. These computational methods must be included in gene network and pathway investigations to fully understand the complex architecture of genetic interactions and how they affect health and disease.
    In this thesis, we harnessed three sophisticated computational methodologies, each offering a unique contribution to the variant analysis: Chaos Game Representation (CGR), visual analysis of DNA sequences based on variant logic construction (VARCH), and High Dimensional Model Representation (HDMR). CGR, a prevalent technique in bioinformatics, translates genetic sequences into visually interpretable diagrams, clarifying complex structures and patterns in the sequences. VARCH, on the other hand, converts sequences into a feature space, successfully capturing each aspect of their complexity and uncertainty. These techniques are effective instruments in our search for potential genetic markers that might help distinguish between the patient and control groups in our investigation. Furthermore, we utilized HDMR for dimension reduction, an essential technique for simplifying the complex structure of high-dimensional genomic data. By condensing data dimensions, HDMR facilitated more efficient and accurate classification, enabling us to uncover subtle genetic relationships and patterns that might otherwise have remained hidden. Integrating these computational techniques provided robust solutions for analyzing genetic data from the mTOR pathway, enriching our comprehension of the genetic mechanisms underlying various phenotypic outcomes. In our study, we set out to deepen our comprehension of the intricate genetic patterns intertwined with diverse phenotypic outcomes. Focusing on genetic data sourced from the mTOR pathway, we leveraged state-of-the-art computational methodologies to unravel hidden insights. Our primary objective was to assess the efficacy of CGR, VARCH, and HDMR in gene network analyses. As we analyzed the data, the results were quite compelling. Both the CGR and VARCH methods demonstrated notable accuracy in genetic classification, with VARCH exhibiting a significant edge over CGR in terms of accuracy and sensitivity metrics. This superiority was underscored by VARCH's ability to considerably reduce binary cross-entropy (BCE) loss values, demonstrating its ability to reduce prediction errors. We also examined the computing overheads associated with each methodology in detail, providing insight into the challenging trade-off between computational complexity and accuracy: VARCH's larger number of parameters made its computational requirements apparent, although its performance remained better than CGR's. Our study demonstrates the potential of computational tools for unraveling gene complexities while also serving as an essential reminder of how crucial it is to navigate computational constraints carefully, guiding researchers toward the best possible method selection and optimization.
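    Of the three methodologies named above, CGR has a compact and widely documented formulation; the following is a minimal sketch of a frequency-matrix CGR (FCGR) in plain NumPy, using a made-up toy sequence rather than the thesis's mTOR-pathway data.

    # Minimal sketch of Chaos Game Representation (CGR) for a DNA sequence:
    # each nucleotide maps to a corner of the unit square and the current point
    # moves halfway toward that corner; a frequency image (FCGR) is built by
    # histogramming the visited points on a 2^k x 2^k grid.
    import numpy as np

    CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

    def fcgr(sequence, k=6):
        n = 2 ** k
        grid = np.zeros((n, n))
        x, y = 0.5, 0.5                       # start at the centre of the square
        for base in sequence.upper():
            if base not in CORNERS:           # skip ambiguous symbols such as 'N'
                continue
            cx, cy = CORNERS[base]
            x, y = (x + cx) / 2, (y + cy) / 2
            grid[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
        return grid

    # Made-up toy sequence; in the thesis the input comes from mTOR-pathway variants.
    image = fcgr("ATGCGTACGTTAGCCGATCGATCGGCTA", k=4)
    print(image.shape, image.sum())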
  • Item
    Augmented superpixel based anomaly detection in hyperspectral imagery
    (Graduate School, 2024-07-01) Gökdemir, Ezgi ; Tuna, Süha ; 702211005 ; Computational Science and Engineering
    The detection of anomalies in hyperspectral images depends on several factors. In particular, the spatial proximity of anomalies and a cluttered background can create a bottleneck for anomaly detection. Hyperspectral images are tensor data, in which each pixel contains both spatial and spectral information. These complex data structures pose significant challenges for traditional anomaly detection methods, which often struggle to account for the intricate relationships between the different spectral bands. In this thesis, a method called "Augmented Superpixel (Hyperpixel) Based Anomaly Detection in Hyperspectral Imagery" is proposed. This method aims to enhance anomaly detection by leveraging advanced dimensionality reduction and segmentation techniques. Our approach begins by reducing the three-dimensional hyperspectral image (HSI) data using methods such as High Dimensional Model Representation and Principal Component Analysis. This step simplifies the data while preserving critical spectral and spatial information. By capturing the most significant components of the data, these techniques help eliminate noise and irrelevant details, thereby making the subsequent analysis more focused and effective. We then apply segmentation methods such as Simple Linear Iterative Clustering and Linear Spectral Clustering to divide the image into distinct regions known as superpixels. Each superpixel is augmented with its first-order neighbors to form hyperpixels, which provide a richer context for anomaly detection. The augmentation process ensures that the local context is considered, thereby enhancing the ability to detect subtle anomalies that may be missed when examining individual superpixels in isolation. This neighborhood information is crucial for accurately identifying the boundaries of anomalies and distinguishing them from normal variations in the data. Finally, we apply the Local Outlier Factor algorithm to these hyperpixels to identify the outlier points that signify anomalies. The capability of the Local Outlier Factor to evaluate local density deviations enables it to accurately identify anomalies, even in densely populated or intricate backgrounds. The combination of these techniques ensures comprehensive and precise analysis that can handle the diverse characteristics of hyperspectral datasets. The proposed algorithm was tested using various hyperspectral image datasets and demonstrated good performance in detecting anomalies. By integrating dimensionality reduction, segmentation, and anomaly detection techniques, this method effectively manages the complexity of the hyperspectral data. This comprehensive approach allows for accurate identification of anomalies, even in challenging conditions where anomalies are closely packed or the background is complex. Through rigorous experimentation, the algorithm demonstrated robustness and reliability, making it a promising tool for hyperspectral image analyses. Its versatility and high accuracy across different datasets underline its potential for broad application in fields such as remote sensing, environmental monitoring, and urban planning. The ability to adapt to various anomaly characteristics and dataset structures makes this method a valuable addition to the toolkit of hyperspectral image analysis techniques.
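    A minimal sketch of the pipeline's main stages (spectral reduction, superpixel segmentation, and local-outlier scoring), assuming scikit-learn and scikit-image (0.19 or newer for the channel_axis argument) and a random cube in place of a real scene; PCA stands in for the HDMR-based reduction, and the first-order-neighbor (hyperpixel) augmentation described in the abstract is omitted for brevity.

    # Sketch of the pipeline's main stages on a hyperspectral cube of shape
    # (rows, cols, bands). PCA stands in for the HDMR-based reduction, and the
    # first-order-neighbour (hyperpixel) augmentation is omitted for brevity.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import LocalOutlierFactor
    from skimage.segmentation import slic

    cube = np.random.default_rng(0).random((100, 100, 120))   # stand-in scene
    rows, cols, bands = cube.shape

    # 1) Reduce the spectral dimension to a few principal components.
    pcs = PCA(n_components=5).fit_transform(cube.reshape(-1, bands))
    pcs = pcs.reshape(rows, cols, -1)
    pcs = (pcs - pcs.min()) / (pcs.max() - pcs.min())          # scale to [0, 1] for SLIC

    # 2) Segment the reduced image into superpixels.
    labels = slic(pcs, n_segments=300, compactness=0.1, channel_axis=-1)

    # 3) One feature vector per superpixel: its mean reduced spectrum.
    segments = np.unique(labels)
    features = np.stack([pcs[labels == s].mean(axis=0) for s in segments])

    # 4) Score superpixels with the Local Outlier Factor; -1 marks outliers.
    flags = LocalOutlierFactor(n_neighbors=20).fit_predict(features)
    print("anomalous superpixels:", segments[flags == -1])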