Memory-based approaches to problems in probabilistic modeling

Akgül, Abdullah
Graduate School
Deep neural networks are a widely accepted solution to many machine learning problems; however, applying them to safety-critical areas such as health care remains an active research topic. To be employed in such fields, deep neural networks are expected to fit the in-domain data set, provide calibrated predictions on problematic regions of the target domain, and separate out-of-domain queries. Even though each of these requirements has been studied extensively, the studies are highly fragmented, and no existing model satisfies all three requirements simultaneously. Continual Learning (CL) is a framework that aims to learn numerous tasks sequentially. An ideal CL method should adapt to new tasks without forgetting previous ones. However, neural networks suffer from catastrophic forgetting, which is a performance drop on previously learned tasks caused by a newly learned task. CL is nevertheless crucial for building intelligent systems capable of adapting to environmental change. For this reason CL is an active research topic, but the research on CL focuses mainly on image classification tasks; there is limited work on time-series classification tasks and no work on multi-modal dynamics modeling. In this thesis, we employ an external memory to deal with problems in probabilistic modeling. Our solutions to these problems can be summarized as follows: i) Evidential Turing Processes (ETP): We give the first definition of total calibration. After investigating two Bayesian paradigms, the Bayesian Model and the Evidential Bayesian Model, we introduce the Complete Bayesian Model (CBM), which unifies these two paradigms. We develop ETP as an instance of CBMs with a neural episodic memory. We build a pipeline to evaluate the models' performance with respect to total calibration. We compare our solution, the ETP member of CBMs, with state-of-the-art members of the other paradigms, and we also provide an ablation study.
We investigate the models' performance on five real-world data sets, comprising one time-series classification task and four image classification tasks. We also test the models on corrupted versions of different data sets. We use four metrics: test error as a measure of prediction accuracy, Expected Calibration Error (ECE) as the in-domain calibration score, Negative Log-Likelihood (NLL) as a measure of model fit, and area under the ROC curve as the out-of-domain detection score. We report that only the ETP excels in all three aspects of total calibration simultaneously. ii) Continual Dynamic Dirichlet Process (CDDP) for Continual Learning of Multi-modal Dynamics: We introduce a new problem, CL of multi-modal dynamics. Since the problem is novel, we construct a baseline from existing approaches. For this new problem, we introduce a novel solution that employs an external memory to transfer knowledge between tasks. We build an evaluation pipeline for the problem in which new tasks arrive sequentially and each task contains a certain number of samples from different modes. Because results in CL setups can depend on task order, we change the task order in each replication. We also generate synthetic data sets and adapt time-series classification data sets to evaluate the models' performance on the problem. We compare the models using Normalized Mean Squared Error as a measure of prediction accuracy and NLL as a measure of Bayesian model fit that quantifies uncertainty. We find that our approach, CDDP, compares favorably to the established parameter transfer approach in CL of multi-modal dynamical systems. To sum up, we show experimentally that an external memory architecture can be used both for calibrating neural networks for use in safety-critical areas and for CL of multi-modal dynamics.
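As an illustration of the in-domain calibration score named in the abstract, the following is a minimal sketch of Expected Calibration Error under the common equal-width binning definition. The abstract does not state which binning scheme or number of bins the thesis uses, so the function name `expected_calibration_error`, its arguments, and the default of 10 bins are assumptions made for this example only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width confidence bins (assumed scheme).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     1 if the corresponding prediction was right, else 0.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Per-bin running sums of confidence, number of hits, and sample count.
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for c, h in zip(confidences, correct):
        # Half-open bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes to the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += h
        count[idx] += 1

    # Weighted average of |accuracy - mean confidence| over non-empty bins.
    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue
        accuracy = hit_sum[k] / count[k]
        mean_conf = conf_sum[k] / count[k]
        ece += (count[k] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Two confident, correct predictions and two less confident ones, one wrong.
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0]))
```

A lower value means predicted confidences track observed accuracy more closely; a model whose confidence equals its accuracy in every bin scores 0.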
Keywords
deep learning, artificial intelligence, dynamic modelling, machine learning