Hücresel yapay sinir ağları için iki öğrenme algoritması ve görüntü işleme uygulamaları

thumbnail.default.alt
Tarih
1994
Yazarlar
Karamahmut, Sinan
Süreli Yayın başlığı
Süreli Yayın ISSN
Cilt Başlığı
Yayınevi
Fen Bilimleri Enstitüsü
Özet
Bu tezde, tam kararlı Hücresel Yapay Sinir Ağlan ( HYSA ) için istenen kararlı hal çıkışlarının eğiticili öğretilmesi amacına yönelik olarak Dinamik Algılayıcı öğrenme Algoritması ("Recurrent Perceptron Learning Algorithm : RPLA ") ve Dinamik Geriye- Yayılım öğrenme Algoritması (" Recurrent Backpropagation Learning Algorithm : RBLA ") acunda iki farklı eğiticili öğrenme algoritması geüştirilmiştir. Bu algoritmaları kullanan, HYSA '1ar için şablon öğrenme programı olan SLAT (" Supervised Learning Algorithm Tool ") programı yazılarak, bir kişisel bilgisayar ortamında benzitim düzeni elde edilmiştir. Geliştirilen algoritmalar, HYSA 'nın eğiticili öğrenmesinin bir amaç ölçütünün tasarım kısıtlamaları altında enazını bulma sorununa dönüştürülmesine dayanmaktadır. Enazlanmak istenen amaç ölçütü, istenen kararlı-durum çıkışları ile gerçek çıkışlar arasındaki öklid uzaklığını veren hata işlevidir. Tasarım kısıtlamaları ise HYSA 'nın tam-kararlı olmasını, yani kararlı-durumda tüm yörüngelerinin denge noktalarından birine yerleşmesini sağlayan bağlantı ağırlıklarının simetrik olma koşulu ile karalı-durum çıkışlarının iki-kutuplu ( + 1) olmasını sağlayan çıkıştan öz-geribesleme bağlantı ağırlığının, durumdan öz- geribesleme katsayısından büyük olma koşulundan oluşmaktadır. Geliştirilen algoritmalardan RBLA, hata işlevinin kısıtlamalar altında yerel enaz noktalarından birinin yeterince küçük seçilen öğrenme oranlan durumunda veren, izdüşüm türünden bir eğim-düşme enazlama algoritmasıdır. RBLA, Hopfield ağı için verilmiş olan Geriye- Yayılım Algoritmasının tam-kararlı HYSA için bir uygulamasıdır. Tam-bağlantılı bir ağ olan Hopfield ağında gerekli bağlantı ağırlık katsayılarının sayısı HYSA 'daki yerel ve düzgün bağlantıyı belirleyen şablon katsayılarının sayışma göre çok fazla olduğundan, önerilen RBLA, Hopfield ağı için önerilen algoritmaya göre, bu açıdan daha etkindir. HYSA 'lan için tezde önerilen RBLA 'nın, çıkış işlevinin doymaya yalan bölgelerde çok düşük türevleri olması yüzünden yavaş yakınsama sorunları vardır. Bu sorunlar tezde geliştirilen çeşitli teknikler ile yenilmeye çalışılmıştır. RPLA ise, HYSA 'nın herbir hücresinin kararlı-durumda hücreye giren toplam girişe bağlı olarak çıkışında +1 ya da -1 veren Algılıyıcı ("Perceptron") benzeri bir işlem elemanı olarak çalışması gözlemine dayanarak geliştirilmiştir. RPLA, tam-kararlı HYSA 'nın eğiticili öğrenmesi için geliştirilmiş bilinen en etkin öğrenme algoritmasıdır.
In this thesis, two supervised learning algorithms for obtaining the template coefficients in completely stable Cellular Neural Networks (CNNs) are presented. The Recurrent Backpropagation Learning Algorithm (RBLA) can be viewed as the CNN version of the well-known Recurrent Backpropagation Algorithm originally developed for a Hopfield type completely stable continuous-time dynamical neural network. The second algorithm presented is inspired by the analogy between the input-output relation of the well-known Perceptron and the steady-state behavior of the cells in CNNs. It is hence called as Recurrent Perceptron Learning Algorithm (RPLA) since applied to a dynamical network, CNN. From the advent of the first useful computer ( ENIAC ) in 1946 until the late 1980s, essentially all information processing applications used a single basic approach: programmed computing. Solving a problem using programmed computing involves devising an algorithm and/or a set of rules for solving the problem and then correctly coding these in software and making necessary revisions and improvements. Clearly, programmed computing can be used in only those cases where the processing to be accomplished can be described in terms of a known procedure or a known set of rules. If the required algorithmic procedure and/or set of rules are not known, then they must be developed an undertaking that, in general, has been found to be costly and time consuming. In fact, if the algorithm required is not simple ( which is frequently the case with the most desirable capabilities ), the development process may have to await a flash of insight. Obviously, such an innovation process cannot be accurately planned or controlled. Even when the required algorithm or rule set can be devised, the problem of software development still must be faced. Because current computers operate on a totally logical basis, software must virtually perfect if it is to work. The exhaustive design, testing, and iterative improvement that software development demands makes it a lengthy and expensive process. A new approach to information processing that does not require algorithm or rule development and that often significantly reduces the quantity of software that must be developed has recently become available. This approach, called neurocomputing, allows, for some types of problems ( typically in areas such as sensor processing, image processing, pattern recognition, data analysis, and control ), the development of information processing capabilities for which the algorithms or rulesare not known ( or where they might be known, but where the software to implement them would be too expensive, time consuming, or inconvenient to develop ). For those information processing operations amenable to neurocomputing implementation, the software that must be developed is typically for relatively straightforward operations such as data file input and output, peripheral device interface, preprocessing, and postprocessing. The Computer Aided Software Engineering ( CASE ) tools often used with neurocomputing systems can frequently be utilized to build these routine software modules in a few hours. These properties make neurocomputing an interesting alternative to programmed computing, at least in those areas where it is applicable. Formally, neurocomputing is the technological discipline concerned with parallel, distributed, adaptive information processing systems that develop information environment. The primary information processing structures of interest in neurocomputing are artificial neural networks ( although other classes of adaptive information processing structures are sometimes also considered, such as learning automata, genetic learning systems, data-adaptive content addressable memories, simulated annealing systems, associative memories, and fuzzy learning systems ). Artificial neural systems function as parallel distributed computing networks. Their most basic characteristics is their architecture. Only some of the networks provide instantaneous responses. Other networks need time to respond and are characterized by their time-domain behavior, which we often refer to as dynamics. Neural network also differ from each other in their learning modes. There are a variety of learning rules that establish when and how the connecting weights change. Finally, networks exhibit different speeds and efficiency of learning. As a result, they also differ in their ability to accurately respond to the cues presented at the input. In contrast to conventional computers, which are programmed to perform specific tasks, most neural networks must be taught, or trained. Learning corresponds to parameter changes. Learning rules and algorithms used for experiential training of networks replace the programming required for conventional computation. Neural network users do not specify an algorithm to be executed by each computing node as would programmers of a more traditional machine. Instead, they select what in their view is the best architecture, specify the characteristics of the neurons and initial weights, and choose the training mode for the network. Appropriate inputs are then applied to the networks so that it can aquire knowledge from the environment. As a result of such exposure, the network assimilates the information that can later be recalled by the user. In the past three decades a number of neural network architectures have been developed. The architectures have been inspired both by the principles governing biological neural systems and the well-established theories of engineering and fundamental sciences. Most of the widely applied neural networks fall into two main classes: 1) memoryless neural networks and 2) dynamical neural networks. From a circuit theoretical point of view, the memoryless neural networks are non-linear resistive circuits, while the dynamical neural networks are non-linear R-L-C circuits. A memoryless neural network defines a non-linear transformation from the space of input signals into the space of output signals. Such networks have been successfully used in pattern recognition and several problems which can be defined as a non-linear transformation between two spaces. As in the Hopfield network and Cellular Neural Network, dynamical neural networks have usually been designed as dynamical V1Usystems where the inputs are set of some constant values and each trajectory approaches one of the stable equilibrium points depending upon the initial state. Some useful application of these networks includes image processing, pattern recognition and optimization. Due to the grid topology of Cellular Neural Networks which is tailor made for image processing, this artificial neural network model is considered in this thesis. A Cellular Neural Network is a 2-dimensional array of cells [1], Each cell is made up of a linear resistive summing input unit, an R-C linear dynamical unit, and a 3 -region, symmetrical, piecewise-linear resistive output unit. The cells in a CNN are connected only to the cells in their nearest neighborhood defined by the following metric: d(i,j;i,j) = max{|i-î, j-jlf. Where (i,j) is the vector of integers indexing the cell C(i,j) in the i th row j th column of the 2-dimensional array. The system of equations describing a CNN with the neighborhood size of one is given in (l)-(2). *u=-Axi,j+ ZwP.iyi+P.j+i(°°) + ZzP.iui+P.j+i +I. ü) p.l e {-1.0,1} p,l e {-1,0,1} yy = f(*J =^{KJ+1|-k-1|}- (2) Where, A, I, wp, and zp, e R are constants. It is known in [1] that a CNN is completely stable if the feedback connection weights wpl are symmetric. Throughout the thesis, the input connection weights zpI are chosen symmetric for reducing computational costs while the feedback connection weights w, are chosen symmetric for ensuring the complete stability, i.e., def def def w-,,-, = wu = a,, w_]0 = w1<0 = %, w_ul = wliH = a3, def def wo,-ı= wo,ı =a4> w00 = a5; def def def def def z-ı -1 = zı,ı = "ı ' z-ı,o = ^.o = "2.> z-ı.ı = \~ı = "3 ?> zo,-ı = zo,ı = b4, z00 = b5 IXIn this thesis, the learning is accomplished through modification of the following weight vector weR". def r, def w = [aTbTl] = [a, a2a3a4a5b1b2b3b4b5l] T. (3) Several design methods for determining the entries of w, i.e., the feedback template coefficients a; 's, the input template coefficients b{ 's, and the threshold I are proposed in the literature [1], [3], [4]. The well-known relaxation methods for solving linear inequalities are used in [3]-[4] for finding one of the connection weights providing that the desired outputs can be obtained as the actual outputs for the given external inputs and for the properly chosen initial states. However, the methods in [3]-[4] do not specify which initial state vectors, except for the following trivial one, yield the desired output for the given external input and the weight vector found. The trivial solution in the determination of such a proper initial state vector is to take the desired output as the initial state; but this requires the knowledge of the desired output which is not available for the external inputs outside the training set. For this reason, it is still desired to develop new design methods and supervised learning algorithms for finding the connection weights yielding the desired output for the given external input and the chosen initial state. The supervised learning algorithms in this thesis are proposed for such a purpose. Recently, a number of supervised learning algorithms for CNNs are given in the literature [2], [7]-[9], [11]-[14]. The well-known Backpropagation through Time Algorithm is applied in [7] for learning the desired trajectories in continuous-time CNNs. A modified Alternating Variable Method is used in [8] for learning the steady- state outputs in discrete-time CNNs. Both of these algorithms are proposed to be used for any kind of CNN and hence they do not take into account the constraints which are needed to be imposed on the connection weights for ensuring the complete stability and the bipolarity of the steady-state outputs. It is shown in [9] that the supervised learning of the steady-state outputs in completely stable generalized CNNs [10] is a constrained optimization problem, where the objective function is the error function and the constraints are due to some desired qualitative and quantitative design requirements such as the bipolarity of the steady-state outputs and the complete stability. The algorithm given in [9] is a gradient-descent algorithm and indeed it is an extension of the recurrent backpropagation algorithm to the generalized CNNs. The recurrent backpropagation is applied also in [1 1]-[12] to a modified version of CNNs differing from the original CNN model in the following respects: i) The cells are fully-connected, ii) The output function is a differentiable sigmoidal one, and iii) The network is designed as a globally asymptotically stable network. In a very recent paper [13], the modified versions of the backpropagation through time and the recurrent backpropagation algorithms are used for finding a minimum point of an error measure of the states instead of the output. It is assumed in [13] that the steady-state values of the states have the derivative with respect to the connection weights. However, this assumption is true only if the magnitudes of the states are all strictly greater than one. Therefore, the gradient calculated in [13] doesnot describe the jumping of the steady-state values of the states from one saturation region to the other as a consequence of the changes in the connection weights. The lack of the derivative of the error function prevents the use of the gradient-based methods for finding the templates minimizing the error. In order to overcome this problem, the output function can be changed [14] with a continuously differentiable one which is very close to the original piecewise-linear function. Whereas the gradient methods are now applicable, the error surfaces have almost flat portions resulting in extremely slow convergence [14]. An alternative solution to this problem is to use methods not requiring the derivative of the error function. Such a method is given in [2] by introducing genetic optimization algorithms for supervised learning of the optimal template coefficients. The Recurrent Backpropagation Learning Algorithm (RPLA) given in this thesis is a special case of the one given in [30] and also can be viewed as the CNN version of the Recurrent Backpropagation Algorithm. The algorithm finds the optimal template coefficients and the threshold minimizing the total error function while satisfying the constraints i) w0 0 = a5 > A, and ii) the symmetry conditions for the feedback template coefficients ; where the first constraint ensures the bipolarity of the steady-state outputs and the second the completely stability in CNNs. The Recurrent Perceptron Learning Algorithm (RPLA) proposed in this thesis does not need the derivative of the error function. It is developed for finding the template coefficients of a CNN to realize an input-(steady-state)output map described by a set of training samples. Here, the input consists of two parts: The first part is the external input and the second is the initial state. The algorithm resembles the well- known Perceptron learning algorithm [16] and hence called as Recurrent Perceptron Learning Algorithm (RPLA) for CNNs. RPLA starts with an initial weight vector satisfying the constraint w0 0 = a5 > A ensuring the bipolarity of the steady-state output. The updated weight vector obtained in each step of the algorithm is projected r ^ i onto the constraint set A= { w gR11 = |a5 > A j. It is shown in the thesis that if there is a weight vector which satisfies the bipolarity constraint and yields the zero Output Mismatching Error defined in (4), then RPLA finds this vector with a suitable time- varying learning-rate. e(w) = Z *»(*»-j = l} and D" ={(i,j) | yjoo) = -dy =-l} It is assumed here that the state vector is never chosen equal to one of the equilibrium points in the center region or partial regions in the state space and therefore any steady-state output is either +1 or -1. XIThe proposed algorithms RBLA and RPLA have been applied to several image processing problem. It has been observed that CNNs using the rule defined by RBLA and RPLA can learn edge detection, corner detection, and hole filling tasks very quickly. The structure of the thesis is as follows. Chapter 2 gives an introduction to artificial neural networks. The general structure of CNNs is presented in Chapter 3. In order to put the work which is carried out by mis thesis into a more understandable way a general information on learning and different learning algorithms for artificial neural networks are discussed in Chapter 4 and 5, respectively. In Chapter 6, the Recurrent Backpropagation Learning Algorithm (RBLA) and the Recurrent Perceptron Learning Algorithm (RPLA) are proposed. Chapter 7 is a guide for the simulation tool SLAT ( Supervised Learning Algorithm Tool ) which uses RBLA and RPLA. The simulation results of SLAT are presented in Chapter 8.
Açıklama
Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1994
Anahtar kelimeler
Algoritmalar, Görüntü işleme, Yapay sinir ağları, Öğrenme, Algorithms, Image processing, Artificial neural networks, Learning
Alıntı