Machine learning assisted massively parallel crystal structure prediction

Demir, Samet
Graduate School
Crystal structure prediction (CSP) is of utmost importance for the computational screening of materials. Although many methods have been developed to date, new methods must be developed, or existing ones improved, to keep pace with advancing technology. Because CSP problems have numerous local minima, global optimization techniques are required. Density Functional Theory (DFT) can serve as the objective function for CSP global optimization methods, but it is costly and significantly increases computation time. For this reason, force fields are used as the objective function instead, and they can be quite successful, especially for molecular crystals. CASPESA (CrystAl Structure PrEdiction via Simulated Annealing), a program developed by our research group many years ago, has been successfully used and reported for modeling covalent crystals. Within the scope of this thesis, FFCASP (Fast and Flexible CrystAl Structure Predictor), a flexible, easy-to-use and fast program that works for both covalent and molecular crystals, has been developed. Many properties of molecules can be determined simply by examining their structure, yet the CSP problem is still not entirely resolved. Although a great number of CSP algorithms are available in the literature, most of them operate on completely different principles. Many of these algorithms rely on DFT in the objective function, which limits their applicability to small molecules. The algorithm devised in this thesis instead employs knowledge-based or force-field-based objective functions. In one of the applications presented in this thesis, a structure with more than 200 atoms per unit cell was accurately predicted. This result clearly indicates that the main objective, building a very fast and effective algorithm, has been achieved. Another important and novel approach is the molecule movement system.
There are 230 possible symmetries (space groups) for crystal structures, and since more than 99% of the molecules in the databases crystallize with symmetry, the movement system must be very successful at producing symmetric structures. Many CSP algorithms fix a single symmetry and make predictions only within it; this reduces the number of parameters to be optimized but covers only the symmetries frequently encountered in the literature. The algorithm developed in this thesis aims to cover all symmetries by dividing the 230 space groups into 14 prototypes, each handled by its own prediction set. To improve the CSP algorithm, a global optimizer was developed first. Simulated Annealing (SA), whose success in CSP is known from our previous work, was revisited and modernized. During the modernization, the existing Fortran 77 code was rewritten using Modern Fortran features; the code was parallelized to perform better on multi-core computers; a new decision mechanism was developed for more intelligent lowering and raising of the simulated temperature; periodic parameters were defined; and a better reporting system was developed. The biggest shortcoming of this approach is that, for large structures (8 or more formula units), the initial barriers are difficult to overcome. Particle Swarm Optimization (PSO), which is known to handle such large barriers well, was therefore studied, and a new algorithm was developed that runs before SA so that large potential barriers can be overcome more easily. This novel hybrid optimization algorithm is written so that it is applicable to a wide variety of optimization problems, not just CSP. The optimization algorithm is combined with the movement and unit cell rules for CSP to create a prototype crystal structure prediction algorithm.
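The hybrid idea in the preceding paragraph, a particle swarm stage that crosses large barriers followed by simulated annealing, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and not the FFCASP code: the names `wrap`, `delta`, `pso`, `anneal` and `hybrid_minimize`, all default parameters, and the "reheat when frozen, otherwise cool" temperature rule, which is only a simple stand-in for the thesis's own decision mechanism. Periodic parameters (for example angles) are wrapped back into their range and compared along the shorter way around.

```python
import math
import random


def wrap(x, lo, hi, periodic):
    """Map x back into its range: wrap periodic parameters, clip the others."""
    if periodic:
        return lo + (x - lo) % (hi - lo)
    return min(max(x, lo), hi)


def delta(a, b, lo, hi, periodic):
    """Signed difference a - b; periodic parameters take the shorter way round."""
    d = a - b
    if periodic:
        span = hi - lo
        d -= span * round(d / span)
    return d


def pso(f, bounds, periodic, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, rng=None):
    """Global-best particle swarm optimization (coarse, barrier-crossing stage)."""
    rng = rng if rng is not None else random.Random(0)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                lo, hi = bounds[d]
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * delta(pbest[i][d], pos[i][d], lo, hi, periodic[d])
                             + c2 * r2 * delta(gbest[d], pos[i][d], lo, hi, periodic[d]))
                pos[i][d] = wrap(pos[i][d] + vel[i][d], lo, hi, periodic[d])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def anneal(f, x0, bounds, periodic, t0=1.0, n_blocks=50, block=100, step=0.05,
           cool=0.8, reheat=3.0, min_accept=0.02, max_reheats=3, rng=None):
    """Metropolis simulated annealing with a simple cool/reheat rule (refinement stage)."""
    rng = rng if rng is not None else random.Random(1)
    x, fx = list(x0), f(x0)
    best, best_val = x[:], fx
    t, reheats = t0, 0
    for _ in range(n_blocks):
        accepted = 0
        for _ in range(block):
            # Perturb one randomly chosen parameter by up to +/- step of its range.
            d = rng.randrange(len(x))
            lo, hi = bounds[d]
            trial = x[:]
            trial[d] = wrap(x[d] + rng.uniform(-step, step) * (hi - lo), lo, hi, periodic[d])
            ft = f(trial)
            # Metropolis criterion: always accept downhill, uphill with prob exp(-dE/T).
            if ft <= fx or rng.random() < math.exp(-(ft - fx) / t):
                x, fx = trial, ft
                accepted += 1
                if fx < best_val:
                    best, best_val = x[:], fx
        # Stand-in decision rule: reheat if the walk looks frozen, otherwise cool.
        if accepted / block < min_accept and reheats < max_reheats:
            t *= reheat
            reheats += 1
        else:
            t *= cool
    return best, best_val


def hybrid_minimize(f, bounds, periodic, seed=0):
    """Run PSO first to get past large barriers, then refine its optimum with SA."""
    rng = random.Random(seed)
    x_pso, _ = pso(f, bounds, periodic, rng=rng)
    return anneal(f, x_pso, bounds, periodic, rng=rng)


if __name__ == "__main__":
    # Toy objective: one ordinary parameter and one periodic angle (radians).
    def toy(p):
        return (p[0] - 1.0) ** 2 + (1.0 - math.cos(p[1] - 0.1))

    x, fx = hybrid_minimize(toy, [(-5.0, 5.0), (0.0, 2.0 * math.pi)], [False, True])
    print(x, fx)
```

Seeding SA with the PSO optimum lets the swarm do the coarse search while annealing refines locally; because `anneal` keeps the best point it has seen, the returned value is never worse than the PSO estimate.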
The most important part of the algorithm not yet mentioned is the objective function. Two different methods were developed for this purpose: one is based on knowledge gathered from the literature or from DFT calculations, and the other is based on force fields. In addition, a highly capable pre- and post-processing tool called FFCASP Tools has been developed. This utility simplifies the use of FFCASP and plays a crucial role in each of the applications discussed in this thesis. Numerous CSP applications were carried out using FFCASP and FFCASP Tools, and their outcomes demonstrated that all of the goals of this thesis were successfully achieved. Alongside the technical details of the algorithm, this thesis provides a comprehensive analysis of six different applications.
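As a toy illustration of a force field used as the objective function, the sketch below scores atoms in a periodic cubic cell with a truncated 12-6 Lennard-Jones pair potential under the minimum-image convention. This is an assumption chosen for clarity, not the force field or knowledge-based terms actually used in FFCASP; the function name `lj_energy` and its parameters are hypothetical.

```python
import math
from itertools import combinations


def lj_energy(frac_coords, cell_length, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Total 12-6 Lennard-Jones energy of atoms in a periodic cubic cell.

    frac_coords: (x, y, z) fractional coordinates of each atom.
    cell_length: edge length of the cubic cell, in the same units as sigma.
    Pairs separated by more than cutoff * sigma do not contribute (truncated, not shifted).
    """
    rc = cutoff * sigma
    if rc > 0.5 * cell_length:
        raise ValueError("minimum-image convention needs cutoff * sigma <= cell_length / 2")
    energy = 0.0
    for a, b in combinations(frac_coords, 2):
        d2 = 0.0
        for ua, ub in zip(a, b):
            # Minimum image: shift the fractional difference into [-0.5, 0.5].
            du = ua - ub
            du -= round(du)
            d2 += (du * cell_length) ** 2
        r = math.sqrt(d2)
        if 0.0 < r <= rc:
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# Two atoms at the Lennard-Jones minimum distance 2**(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
print(lj_energy([(0.0, 0.0, 0.0), (r_min / 10.0, 0.0, 0.0)], cell_length=10.0))
```

A practical force field would usually add further terms (for example electrostatics); this toy keeps a single pairwise term so that the role of the energy as the quantity being minimized stays visible.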
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2023
Keywords
machine learning, crystal structure