Machine learning assisted force field development for nucleic acids
Date
2024-06-14
Authors
Demir İniş, Gözde
Publisher
Graduate School
Abstract
Molecular dynamics and molecular mechanics simulations have long been used as effective tools for investigating biomolecular systems. These simulations are built on force fields (FFs), which are empirically parametrized sets of equations that relate the configurations of the interacting atoms and molecules in a simulated system to their energies. To establish this relationship, FFs derive the forces governing the dynamics of these systems from classical physics rather than from quantum mechanical (QM) calculations, which are time-consuming and costly for large systems. FFs can therefore provide deeper insight into many systems, from small drug-like compounds to proteins, nucleic acid bases, and even viruses. Another application of FFs is crystal structure prediction (CSP). Although density functional theory (DFT) can serve as the objective function in global optimization approaches to CSP, its computational demand makes this infeasible in practice. Using FFs as objective functions is therefore a reasonable approach in terms of both cost and performance. Following this approach, the FFs developed in our research group, together with the well-known GAFF potential, have been integrated into the in-house program FFCASP, which can perform both two-dimensional (2-D) and three-dimensional (3-D) CSP, and several studies using it have produced highly successful results for a variety of systems. Numerous FFs exist in the literature, each with its own strengths and weaknesses, and even the most popular of them have been found to fall short in accurately describing certain interactions.
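FFs are practical CSP objective functions because each energy evaluation is cheap enough to be repeated thousands of times inside a global search. The sketch below illustrates this in a deliberately reduced setting and is not FFCASP: it assumes a hypothetical three-atom rigid planar molecule, a placeholder Lennard-Jones 12-6 pair potential in place of NICE-FF or GAFF, a single 2-D dimer instead of a periodic crystal, and SciPy's `differential_evolution` as a stand-in global optimizer.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical rigid planar "molecule": three atoms, xy coordinates in Å.
MOLECULE = np.array([[0.0, 0.0], [1.4, 0.0], [0.7, 1.2]])

# Placeholder Lennard-Jones 12-6 parameters (illustrative; not NICE-FF or GAFF values).
EPSILON = 0.1  # well depth, kcal/mol
SIGMA = 3.4    # distance at which the pair energy crosses zero, Å


def place(molecule, x, y, theta):
    """Rotate a rigid molecule by theta (radians) about the origin, then translate by (x, y)."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return molecule @ rotation.T + np.array([x, y])


def dimer_energy(params):
    """FF objective: pairwise-additive energy between a fixed molecule and a placed copy."""
    x, y, theta = params
    b = place(MOLECULE, x, y, theta)
    # All intermolecular atom-atom distances, clipped to avoid division by zero.
    r = np.linalg.norm(MOLECULE[:, None, :] - b[None, :, :], axis=-1)
    r = np.maximum(r, 1e-3)
    sr6 = (SIGMA / r) ** 6
    return float(np.sum(4.0 * EPSILON * (sr6**2 - sr6)))


# Global search over the three rigid-body degrees of freedom of the 2-D dimer.
# Each objective call costs only a handful of arithmetic operations, unlike a DFT calculation.
bounds = [(-10.0, 10.0), (-10.0, 10.0), (0.0, 2.0 * np.pi)]
result = differential_evolution(dimer_energy, bounds, seed=0, tol=1e-8)
print(f"E_min = {result.fun:.3f} kcal/mol at (x, y, theta) = {np.round(result.x, 2)}")
```

A real CSP run would typically also vary the lattice parameters and symmetry and sum the energy over periodic images; those steps are omitted here to keep the objective-function idea visible.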
The interactions that even popular FFs struggle to describe, such as base pairing, $\pi$-$\pi$ stacking, and hydrogen bonding, are non-covalent and are of great importance for proteins and nucleic acids (deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)), because they stabilize these complexes. Among these molecules, DNA can be considered the most significant biological molecule, as it carries the genetic information of all known living organisms. A thorough understanding of the interactions between DNA bases and solid surfaces is vital for a range of applications in biophysics, medicine, materials science, and nanotechnology; leading examples include biocompatible materials, biosensors, drug delivery systems, and organic semiconductors. To fill this gap in the literature, this thesis presents a force field developed specifically for the DNA bases, NICE-FF (Non-empirical, Intermolecular, Consistent, and Extensible FF), which offers higher accuracy at relatively low computational cost. To this end, an automated framework was developed that generates well-refined computational grids, carries out ab initio calculations, performs FF parametrization, and can even extend the existing parameter set. With this original parametrization approach, which is analogous to machine learning (ML) techniques, the calculated SCS-MI-MP2 interaction energies (IEs) were fitted to the Buffered 14–7 potential function. The first parameter set was obtained from the data sets generated for all ten pairwise combinations of the DNA bases. The NICE-FF parameter set was then integrated into FFCASP, and the performance of the FF was tested by performing CSP on a series of organic molecules. CSP tests were first conducted on the four DNA bases and on two other molecules, hypoxanthine and uracil, that contain only the same atom types as the DNA bases.
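The fitting step can be illustrated with a minimal least-squares sketch. It assumes Halgren's standard buffered 14–7 form, $E(R) = \varepsilon \left(\frac{1+\delta}{\rho+\delta}\right)^{7}\left(\frac{1+\gamma}{\rho^{7}+\gamma}-2\right)$ with $\rho = R/R^{*}$ and the MMFF94 constants $\delta = 0.07$ and $\gamma = 0.12$, a single atom-type pair, and synthetic reference energies generated from known parameters plus noise in place of SCS-MI-MP2 IEs. NICE-FF fits parameters for many atom-type pairs against DNA-base dimer IEs, and its exact functional form, grids, and fitting procedure are not specified in this abstract; all parameter values below are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

DELTA, GAMMA = 0.07, 0.12  # Halgren's buffering constants (assumed, as in MMFF94)


def buffered_14_7(r, epsilon, r_star):
    """Buffered 14-7 pair energy; equals -epsilon exactly at r = r_star."""
    rho = r / r_star
    repulsive_buffer = ((1.0 + DELTA) / (rho + DELTA)) ** 7
    attractive_term = (1.0 + GAMMA) / (rho**7 + GAMMA) - 2.0
    return epsilon * repulsive_buffer * attractive_term


# Synthetic "reference" interaction energies (kcal/mol) on a grid of separations (Å).
# Hypothetical stand-ins for ab initio IEs: known parameters plus Gaussian noise.
rng = np.random.default_rng(42)
true_epsilon, true_r_star = 0.20, 3.80
distances = np.linspace(3.2, 8.0, 60)
reference_ie = buffered_14_7(distances, true_epsilon, true_r_star)
reference_ie += rng.normal(scale=0.005, size=distances.size)

# Least-squares fit of (epsilon, r_star) to the reference energies, within physical bounds.
(fit_epsilon, fit_r_star), _ = curve_fit(
    buffered_14_7, distances, reference_ie,
    p0=(0.1, 3.5), bounds=([1e-4, 2.0], [5.0, 6.0]),
)
residuals = buffered_14_7(distances, fit_epsilon, fit_r_star) - reference_ie
rmse = np.sqrt(np.mean(residuals**2))
print(f"epsilon = {fit_epsilon:.3f} kcal/mol, R* = {fit_r_star:.3f} Å, RMSE = {rmse:.4f} kcal/mol")
```

Treating the reference IEs as a training set and minimizing the squared error is what makes this kind of parametrization resemble supervised regression; holding out part of the grid as a test set would be a natural extension.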
Once the validity of the parameter set had been established, two more molecules, pyrazinamide and 9-methylhypoxanthine, were introduced to broaden the coverage of NICE-FF; they consist mostly of DNA-base atom types along with a few new ones. After the parameters for the new atom types had been determined, CSP was performed for these two compounds. CSP was also performed on an independent test case, theophylline, which contains the newly parametrized atom types, to validate the reliability of the extended parameter set. In total, over one hundred thousand predictions were made for these test cases. The predictions were quite successful: almost all known experimental crystal structures (CSs) of the molecules considered were located. Finally, the molecular packing of each NICE-FF-predicted crystal was compared quantitatively with that of the corresponding experimental structure using the CrystalCMP software. The low $\mathrm{RMSD}_{20}$ values obtained for almost all test cases indicate that the molecular packing of the reproduced structures is of high quality. In addition to these tests, NICE-FF was benchmarked on the well-known S22 data set against five popular FFs and high-level ab initio IEs. The results showed that NICE-FF outperforms these widely recognized FFs, providing the IEs in closest agreement with the reference method, CCSD(T). Polymorph studies conducted with the new FF showed that quite promising results can be obtained even with this limited parameter set. It is also evident that NICE-FF has the potential to be extended easily to new organic molecules. Pursuing this direction, a more general version of NICE-FF could be developed that covers a wider range of the atom types found in organic molecules.
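Packing-similarity measures of the $\mathrm{RMSD}_{20}$ kind are typically based on superimposing matched points (for example, atoms) of a predicted and an experimental cluster of molecules and reporting the root-mean-square deviation. The sketch below shows only that core step, using the Kabsch algorithm for optimal rigid superposition; it assumes the two clusters have already been built and their points matched one-to-one. How CrystalCMP constructs and matches its 20-molecule clusters, and which points it compares, is not reproduced here, and the coordinates are random placeholders.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched point sets p and q (N x 3) after optimal rigid superposition."""
    p_centered = p - p.mean(axis=0)
    q_centered = q - q.mean(axis=0)
    # The SVD of the 3 x 3 covariance matrix gives the optimal rotation (Kabsch algorithm).
    h = p_centered.T @ q_centered
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_aligned = p_centered @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_aligned - q_centered) ** 2, axis=1))))


# Hypothetical example: a "reference" cluster of 20 molecules x 3 matched atoms, and a
# "predicted" cluster that is the same cluster rotated, translated and slightly perturbed.
rng = np.random.default_rng(0)
reference = rng.uniform(-10.0, 10.0, size=(20 * 3, 3))
angle = np.deg2rad(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
predicted = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])
predicted += rng.normal(scale=0.1, size=predicted.shape)
print(f"RMSD = {kabsch_rmsd(predicted, reference):.3f} Å")
```

Because the superposition removes any overall rotation and translation, only genuine differences in relative molecular positions contribute to the RMSD, which is why a low value indicates similar packing.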
In the ongoing polymorph study with NICE-FF, CSPs were carried out for the four DNA bases with $Z = 2$, 4, and 8 molecules per unit cell, and DFT calculations were then completed for selected promising structures. Lattice energies were calculated so that structures with different $Z$ values could be compared directly. The last study reported in the thesis is independent of the FF work: machine learning techniques were employed to predict the band gap and formation energy of dual-cation organic-inorganic hybrid perovskites.
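Lattice energies make structures with different $Z$ comparable because they are normalized per molecule. A minimal sketch, assuming the common definition $E_{\mathrm{latt}} = E_{\mathrm{cell}}/Z - E_{\mathrm{mol}}$ (total unit-cell energy divided by $Z$, minus the energy of an isolated molecule, so more negative values indicate more stable packing); the exact definition and any corrections used in the thesis are not stated in the abstract, and all energies and structure labels below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateStructure:
    label: str
    z: int              # molecules per unit cell
    cell_energy: float  # total energy of the unit cell (e.g., from DFT), kJ/mol


def lattice_energy_per_molecule(structure, isolated_molecule_energy):
    """E_latt = E_cell / Z - E_molecule, so structures with different Z share one scale."""
    return structure.cell_energy / structure.z - isolated_molecule_energy


# Hypothetical numbers, for illustration only.
E_MOLECULE = -1000.0
candidates = [
    CandidateStructure("A", z=2, cell_energy=-2240.0),
    CandidateStructure("B", z=4, cell_energy=-4440.0),
    CandidateStructure("C", z=8, cell_energy=-9000.0),
]
ranked = sorted(candidates, key=lambda s: lattice_energy_per_molecule(s, E_MOLECULE))
for s in ranked:
    print(f"{s.label}: Z = {s.z}, E_latt = {lattice_energy_per_molecule(s, E_MOLECULE):.1f} kJ/mol")
```

Ranking by raw cell energy would place B ahead of A simply because B has more molecules per cell; after normalization, A (-120 kJ/mol) is more stable than B (-110 kJ/mol).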
Description
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2024
Keywords
Computational chemistry
Machine learning
Molecular crystals