LEE- Bilgisayar Mühendisliği-Yüksek Lisans
Konu "Artificial intelligence" ile LEE- Bilgisayar Mühendisliği-Yüksek Lisans'a göz atma
Item: 3D face animation generation from audio using convolutional neural networks (Graduate School, 2022) Ünlü, Türker ; Sarıel, Sanem ; 504171557 ; Computer Engineering Programme
The problem of generating facial animations is an important phase of creating an artificial character in video games, animated movies, or virtual reality applications. This is mostly done manually by 3D artists, who match face model movements to each utterance of the character. Recent advancements in deep learning methods have made automated facial animation possible, and this research field has gained attention. There are two main variants of the automated facial animation problem: generating animation in 2D or in 3D space. Systems addressing the former work on images, either generating them from scratch or modifying an existing image to make it compatible with the given audio input. Systems of the second type work on 3D face models, which can be represented directly by a set of points in 3D space or by parameterized versions of these points. In this study, 3D facial animation is targeted. One of the main goals of this study is to develop a method that can generate 3D facial animation from speech alone, without requiring manual intervention from a 3D artist. In the developed method, a 3D face model is represented by Facial Action Coding System (FACS) parameters, called action units. Action units are movements of one or more muscles of the face. By using a single action unit or a combination of different action units, most facial expressions can be represented. For this study, a dataset of 37 minutes of recordings is created, consisting of speech recordings and the corresponding FACS parameters for each timestep. An artificial neural network (ANN) architecture that includes convolutional layers and transformer layers is used to predict FACS parameters from the input speech signal. The outputs of the proposed solution are evaluated in a user study by showing the results of different recordings. It has been seen that the system is able to generate animations usable in video games and virtual reality applications, even for novel speakers it was not trained on. Furthermore, it is very easy to generate facial animations once the system is trained. An important drawback, however, is that the generated facial animations may lack accuracy in the mouth/lip movements required for the input speech.
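To make the described method concrete, here is a minimal PyTorch sketch of an audio-to-action-unit network that combines convolutional and transformer layers, as the abstract outlines; the feature dimensionality, the number of action units, and all layer sizes are illustrative assumptions rather than the thesis implementation.

    # Minimal sketch (assumed shapes, not the thesis code): a convolutional front end
    # followed by transformer encoder layers maps an audio feature sequence to
    # per-frame FACS action-unit intensities.
    import torch
    import torch.nn as nn

    class AudioToActionUnits(nn.Module):
        def __init__(self, n_audio_features=80, n_action_units=46, d_model=256):
            super().__init__()
            # 1D convolutions over time capture local acoustic patterns
            self.conv = nn.Sequential(
                nn.Conv1d(n_audio_features, d_model, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
                nn.ReLU(),
            )
            # Transformer layers model longer-range temporal context
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, num_layers=2)
            # Sigmoid keeps each action-unit intensity in [0, 1]
            self.head = nn.Sequential(nn.Linear(d_model, n_action_units), nn.Sigmoid())

        def forward(self, audio_features):                # (batch, time, n_audio_features)
            x = self.conv(audio_features.transpose(1, 2)).transpose(1, 2)
            x = self.transformer(x)
            return self.head(x)                           # (batch, time, n_action_units)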
Item: A deep learning architecture for missing metabolite concentration prediction (Graduate School, 2024-07-12) Çelik, Sadi ; Çakmak, Ali ; 504211530 ; Computer Engineering
In the last decade, the use of deep learning methods for the diagnosis and treatment of diseases has become widespread practice in the field of bioinformatics. Metabolomics is an omics science dealing with the identification and measurement of all metabolites in an organism, and it can provide a comprehensive analysis of the metabolic profile in both physiological and pathological conditions. Metabolomics data serve as a measure of metabolic function; in particular, relative ratios and perturbations outside the normal range signify disease conditions. The analysis workflow of metabolomics data involves the application of different bioinformatics tools. The quantification of metabolites is accomplished by a wide range of combinations of mass spectrometry (MS) with liquid chromatography (LC), gas chromatography (GC), and nuclear magnetic resonance (NMR) techniques. For a proper biological interpretation of metabolomic datasets and powerful data analysis, preprocessing is essential to ensure high data quality. In various studies containing metabolite measurements, missing values in the data may significantly affect the performance of the analysis. In recent years, the application of deep learning-based generative models for the accurate imputation of missing values has gained popularity. Unsupervised generative models such as variational autoencoders (VAE) can impute missing values to enable more powerful data analysis. This work aims to develop effective models that can accurately predict missing metabolite values in metabolomics datasets. To this end, a number of human metabolomics studies from the Metabolomics Workbench and MetaboLights databases are collected. These datasets are heterogeneous in terms of their metabolite sets and the underlying experimental technologies that generated them; hence, it is challenging to use these diverse datasets together to train imputation models. To tackle this challenge, we propose three different models and dataset merging strategies, namely, Union-based Merging, Iterative Similarity-based Merging, and Model-guided Agglomerative Merging. We perform several experiments to determine the optimal configuration for the training pipeline, including the best initial missing-value imputation approach and the most effective data pretreatment scheme. After handling the original missing values and applying a preprocessing pipeline to the input data, k-fold cross-validation is carried out to ensure consistent and reliable model evaluation. Before training, random missingness simulations are performed to mimic different missing-value patterns in clinical datasets, and the models are trained with those patterns. During our empirical evaluation, we observe that the complexity drawback of IterativeImputer with RandomForestRegressor is more evident in larger datasets. For this reason, the KNNImputer method is chosen as the standard method for filling the initial missing values in our proposed merging approaches. Moreover, applying log transformations with different bases and the Yeo-Johnson transformation to the datasets results in improved VAE model performance.
Furthermore, our experimental results show that the proposed framework scales to large datasets, creating accurate metabolite- and dataset-independent imputation models to predict missing values in metabolomics studies.
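As an illustration of the pipeline described above, the sketch below combines KNN-based initial filling, a Yeo-Johnson pretreatment, and a small variational autoencoder that reconstructs metabolite profiles to replace the originally missing entries; the network sizes, training loop, and hyperparameters are assumptions for illustration and not the thesis code.

    # Sketch (assumptions, not the thesis pipeline): fill initial missing values with
    # KNNImputer, apply a Yeo-Johnson transform, train a small VAE to reconstruct
    # metabolite profiles, and replace the originally missing entries.
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.impute import KNNImputer
    from sklearn.preprocessing import PowerTransformer

    class MetaboliteVAE(nn.Module):
        def __init__(self, n_metabolites, latent_dim=16):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_metabolites, 64), nn.ReLU())
            self.mu = nn.Linear(64, latent_dim)
            self.logvar = nn.Linear(64, latent_dim)
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                         nn.Linear(64, n_metabolites))

        def forward(self, x):
            h = self.encoder(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            return self.decoder(z), mu, logvar

    def impute(X_with_nans, epochs=200):
        # Initial fill and pretreatment, as described in the abstract
        X0 = KNNImputer(n_neighbors=5).fit_transform(X_with_nans)
        pt = PowerTransformer(method="yeo-johnson")
        Xt = pt.fit_transform(X0)
        model = MetaboliteVAE(Xt.shape[1])
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        x = torch.tensor(Xt, dtype=torch.float32)
        for _ in range(epochs):                           # simplified full-batch training
            recon, mu, logvar = model(x)
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            loss = nn.functional.mse_loss(recon, x) + 1e-3 * kl
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Keep observed values; use VAE reconstructions only where data was missing
        X_hat = pt.inverse_transform(model(x)[0].detach().numpy())
        return np.where(np.isnan(X_with_nans), X_hat, X_with_nans)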
Item: Failure prevention in robot manipulation using adversarial reinforcement learning (Graduate School, 2023) Kutay, Mert Can ; Sarıel, Sanem ; 815451 ; Computer Engineering Programme
Robotic manipulation is an important area of research with applications in fields such as manufacturing, agriculture, and healthcare. However, failure scenarios in these tasks can have costly and sometimes dangerous consequences. To address this challenge, there has been growing interest in developing failure prevention policies that can ensure the safe and reliable operation of robotic systems. In this thesis, we propose a novel approach that leverages adversarial reinforcement learning to train policies that are robust against various failures in robotic manipulation. We start by defining the base skill and train a base agent that can accomplish this task. We propose to employ imitation learning to speed up the learning process for complicated base skills. Then, we determine a set of possible failures. For each failure, we define a risk function that indicates how close the environment is to a catastrophic event associated with that failure type. These risk functions are manually engineered and normalized. Similar to the base agent, we train an adversary for each failure type, with the reward function defined as the risk function for that failure. We call this process the isolated training phase. After the isolated training, we place the protagonist (or multiple protagonists) and the adversaries in a self-play environment. In this environment, agents take turns controlling the robot and try to maximize their respective rewards. The reward of the adversary is set as the penalty for the protagonist, and the training process becomes a zero-sum game. After some training, the protagonist becomes more proficient at preventing the failures caused by the adversaries. As the domain, we have chosen the task of stirring a bowl with a spoon using a humanoid robot. We have implemented and tested the proposed method in a simulation environment that contains a bowl with 40 balls inside it. The agents move the spoon in 3D space to interact with the environment and collect low-dimensional observations and rewards. For training a protagonist, we can either train a compound protagonist that can single-handedly prevent all failure types, or we can train distinct protagonists for each failure type. In this thesis, we take both approaches and compare the results. We evaluate the failure prevention performance of the protagonist by measuring the failure prevention success rate within a time window and the total number of steps required to reduce the risk below a certain threshold. The set of distinct protagonists slightly outperforms the compound protagonist. Overall, the evaluation results show that adversarial learning is an efficient and successful way to learn prevention policies.
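The zero-sum self-play phase described above can be sketched as follows; the environment, the agent interface (act/observe), and the risk function are hypothetical placeholders used only to illustrate how the adversary's reward becomes the protagonist's penalty.

    # Sketch of the self-play phase (hypothetical interfaces, not the thesis code):
    # agents alternate control of the robot; the adversary is rewarded by the risk
    # value while the protagonist is penalized by it, making the game zero-sum.
    def self_play_episode(env, protagonist, adversary, risk_fn, turn_length=20):
        obs = env.reset()
        done = False
        while not done:
            for agent, sign in ((adversary, +1.0), (protagonist, -1.0)):
                for _ in range(turn_length):
                    action = agent.act(obs)
                    obs, _, done, _ = env.step(action)
                    agent.observe(reward=sign * risk_fn(obs), next_obs=obs, done=done)
                    if done:
                        break
                if done:
                    break
        return obs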
Item: GAN-based intrinsic exploration for sample-efficient reinforcement learning (Graduate School, 2022-05-23) Kamar, Doğay ; Ünal, Gözde ; 504181511 ; Computer Engineering
Reinforcement learning is a sub-area of artificial intelligence in which the learner learns in a trial-and-error manner. The learner does so by executing an action depending on the current state it is in and observing the result. After executing an action, a reward signal is given to the learner, and through the rewards the learner can learn which actions are best in different situations. However, the learner is not given any prior information about the environment it is in or about which action is best in the current state. Therefore, exploring the environment is important for gathering the information necessary to navigate toward high rewards. The most common exploration strategies involve occasional random action selection. However, they work only under certain conditions, such as the rewards being dense and well defined. These conditions are hard to meet in many real-world problems, and an efficient exploration strategy is needed for such problems. Utilizing Generative Adversarial Networks (GAN), this thesis proposes a novel module for sample-efficient exploration, called the GAN-based Intrinsic Reward Module (GIRM). The GIRM computes an intrinsic reward for states, with the aim of assigning higher rewards to novel, unexplored states. The GIRM uses a GAN to learn the distribution of the states the learner observes and contains an encoder, which maps a query state to the input space of the GAN's generator. Using the encoder and the generator, the GIRM can detect whether a query state lies within the distribution of the observed states. If it does, the state is regarded as a visited state; otherwise, it is novel to the learner, in which case the intrinsic reward is higher. As the learner receives higher rewards for such states, it is incentivized to explore the unknown, leading to sample-efficient exploration. The GIRM is evaluated in two settings: a sparse-reward and a no-reward environment. It is shown that, in both settings, the GIRM is indeed capable of exploration compared to the base algorithms, which use random exploration methods. Compared to other studies in the field, the GIRM also explores more efficiently in terms of the number of samples. Finally, we identify a few weaknesses of the GIRM: performance degrades when the distribution of the observed states changes suddenly, and the exploitation of very large rewards is not avoided.
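The intrinsic-reward computation described above can be illustrated with a short sketch; the reconstruction-error form and the scaling factor are assumptions and not necessarily the exact formulation used in the thesis.

    # Sketch: the encoder maps a query state into the generator's input space; if the
    # generator reconstructs the state well, the state lies within the distribution of
    # visited states and the intrinsic reward is low, whereas novel states yield a
    # large reconstruction error and therefore a higher reward.
    import torch

    def intrinsic_reward(state, encoder, generator, scale=1.0):
        with torch.no_grad():
            z = encoder(state)                  # map state to the GAN's input space
            reconstruction = generator(z)       # regenerate the state from z
            error = torch.mean((state - reconstruction) ** 2)
        return scale * error.item()

    # The reward passed to the RL agent would then combine extrinsic and intrinsic
    # terms, e.g. r_total = r_env + beta * intrinsic_reward(s, encoder, generator).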
Item: Predictive error compensated wavelet neural networks framework for time series prediction (Graduate School, 2024-07-22) Macit, Serkan ; Üstündağ, Burak Berk ; 504221532 ; Computer Engineering
Machine learning algorithms have received considerable attention and recognition in the context of time series prediction problems; however, constructing an accurate machine learning model with an optimal architecture and hyperparameters becomes highly challenging if the data is non-linear, has multivariable characteristics with chaotic or stochastic properties, and is sensitive to environmental factors. Common issues encountered in time series prediction models and frameworks include overfitting, a machine learning problem arising in situations with limited labeled data but a large variety of input data; generalization issues due to insufficient or inadequate input data; the need for extensive feature engineering to properly set the internal weights of artificial neural networks; dependency on network parameters and limited adaptability to different problems; and high computational cost and time consumption. Predictive Error Compensated Wavelet Neural Networks (PECNET) is an innovative artificial neural network architecture for time series prediction. It avoids overfitting by training the data separately in cascaded networks based on different frequency bands and using the remaining error of each network as the target data of the next. In the PECNET architecture, data is fed into the first network from a low-frequency band over a wide time window, and each subsequent network is trained on narrower time windows and higher-frequency data, using the error of the previous, lower-frequency network as its target. This method improves the orthogonality of data features across time windows and allows orthogonal features to be selected in data fusion applications. It also makes predictions more accurate as more networks are added, which lowers the risk of overfitting. Additionally, by applying the wavelet transform as a feature extraction method to the frequency components of each network, the variety of patterns present in the data can be distinguished and extracted. PECNET also overcomes the traditional normalization problems of non-stationary time series data by using adaptive normalization techniques. In conclusion, PECNET is a strong alternative for time series prediction problems due to its high prediction accuracy without overfitting, its structure that allows adaptation to different problems independently of network parameters, and its low computational and time cost, requiring only a two-layer MLP structure. The PECNET model, due to its composition of cascaded networks, modular feature extraction, and fusion networks, presents challenges in its implementation at the coding level. PECNET contains many sequential cascaded neural networks that are trained on each other's errors. When training the sequential networks with input data sequences, it is necessary to shift the input values to the previous time window; otherwise, data from the future would be used, which is not feasible in practice.
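A schematic sketch of this cascaded error-compensation scheme is given below; the band decomposition, windowing, and network interfaces (fit/predict, standing for the small two-layer MLPs) are simplified assumptions rather than the PECNET source code.

    # Schematic sketch of cascaded error compensation (illustrative, not PECNET code):
    # the first network fits the target from the low-frequency/wide-window inputs, and
    # each subsequent network fits the residual error left by the previous stage.
    import numpy as np

    def train_cascade(bands, target, make_network):
        """bands: list of 2D input arrays, ordered from low to high frequency."""
        networks, residual = [], target.copy()
        for band_inputs in bands:
            net = make_network()
            net.fit(band_inputs, residual)                    # learn what remains
            residual = residual - net.predict(band_inputs)    # pass the error downstream
            networks.append(net)
        return networks

    def predict_cascade(networks, bands):
        # The final prediction is the sum of all stage outputs
        return np.sum([net.predict(b) for net, b in zip(networks, bands)], axis=0)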
The continuous use of error sequences as both labels and inputs during training, and the use of the final error in the fusion network as both input and target for predicting the remaining error of the time series, increase computational complexity for solutions involving numerous data sources and networks. This can lead to execution errors in time-synchronization management. To overcome these challenges, the PECNET framework software has been developed within the scope of this thesis, ensuring optimal use of memory and processor resources along with a modular design. In the development of the PECNET framework software, the C compiler and Python interpreter were utilized. NumPy, Pandas, PyWavelets, and Matplotlib were used for data processing tasks. The PyTorch library was chosen for constructing the artificial neural network model due to its extensive modification features and its options for interacting with graphics processing units (GPUs). The design adhered strictly to object-oriented programming principles, and a syntax similar to Keras was used. Additionally, the machine learning application cycle followed the sklearn flow (fit-predict-eval). The PECNET framework is made up of various modules that work together. In the "models" module, the BasicNN class forms the core of the neural network architecture. This class manages key tasks such as initializing the network, fitting data, computing loss, and adjusting the model during training. In the "network" module, specialized classes such as ErrorNetwork, FinalNetwork, and VariableNetwork handle specific stages of the prediction process: ErrorNetwork focuses on correcting prediction errors; FinalNetwork integrates predictions from previous networks into a final output; and VariableNetwork manages training input data across different frequencies and integrates the data fusion mechanism for multivariate data. In the same module, PECNET's functionality is encapsulated in the Pecnet class, which coordinates the workflow among the different networks. It manages the data flow between cascaded networks, error compensation, and final prediction generation. The PecnetBuilder class provides a fluent interface to construct the Pecnet object. It sequentially adds the various network components, ensuring a streamlined building process for the PECNET model. In the "preprocessing" module, the DataPreprocessor class plays a crucial role in data preparation. It lets users control how data is processed into the frequency bands, sampling periods, and sequence sizes appropriate for each of the cascaded networks in the model. It also performs scaling, adaptive normalization, denormalization, and the wavelet transform with the help of other classes in the same module, ensuring that the input data is in the best possible shape for the prediction pipeline. In the "utils" module, the Utility class facilitates hyperparameter optimization and offers tools for loading datasets and plotting results. Overall, PECNET's code-level functionality revolves around these classes, each contributing to the framework's ability to process and predict time series data efficiently. PECNET had already demonstrated successful outcomes on various datasets as standalone code, showing promising results against existing machine learning models such as ARIMA, MLP, CNN, and LSTM.
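A hypothetical usage sketch based on the class names mentioned above is shown below; the module paths, method names, and arguments are assumptions for illustration and may differ from the actual PECNET framework API.

    # Hypothetical usage sketch (assumed API, not the framework's documented interface)
    import numpy as np
    from pecnet.preprocessing import DataPreprocessor    # assumed module path
    from pecnet.network import PecnetBuilder              # assumed module path

    series = np.sin(np.linspace(0, 20, 500))              # placeholder time series

    # Preprocessing: frequency bands, windowing, scaling, adaptive normalization, and
    # wavelet features are handled by the DataPreprocessor (argument names assumed)
    pre = DataPreprocessor(sampling_period=1, sequence_size=8, wavelet="db4")
    X_train, y_train, X_test, y_test = pre.prepare(series)

    # Fluent, Keras-like construction of the cascaded model (method names assumed)
    model = (PecnetBuilder()
             .add_variable_network(X_train)   # per-frequency input networks
             .add_error_network()             # learns the remaining error
             .add_final_network()             # fuses stage outputs into the prediction
             .build())

    model.fit(X_train, y_train)               # sklearn-style fit-predict-eval cycle
    predictions = model.predict(X_test)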
In this study, the framework implementation is first tested on a historical time series dataset of daily adjusted closing prices of Apple stock and compared with LSTM, which is known for its strong memory and sequence-comprehension capabilities. In terms of the RMSE metric, the LSTM model had an error of $2.55, while PECNET had an error of $1.24. In terms of the R2 metric, the LSTM model achieved a value of 0.94, whereas PECNET reached 0.98. Following that, the framework is comparatively tested against LSTM for seismic energy estimation on real-time chaotic Electric Field Data (EFD), collected within the scope of an earthquake prediction research project conducted at Istanbul Technical University (ITU). In this experiment, in terms of the RMSE metric, the LSTM model showed errors ranging between 300 J and 400 J, while PECNET showed errors ranging between 130 J and 150 J. In terms of the R2 metric, the LSTM results fluctuated between 0.2 and 0.3, while PECNET achieved values between 0.5 and 0.6. In both scenarios, PECNET outperforms LSTM, and the developed framework is being integrated into the portal software for real-time earthquake prediction. In conclusion, the developed modular and customizable framework allows PECNET, which is highly performant and robust against overfitting, to be used by other developers for various types of time series prediction in real-time machine learning systems without requiring specific knowledge of the PECNET codebase.