Türkçede ayrık konuşma tanıma (Isolated speech recognition in Turkish)

dc.contributor.advisor Adali, Eşref
dc.contributor.author Ölçer, Ercan
dc.contributor.authorID 39435
dc.contributor.department Kontrol ve Otomasyon Mühendisliği tr_TR
dc.date.accessioned 2023-03-16T05:59:34Z
dc.date.available 2023-03-16T05:59:34Z
dc.date.issued 1993
dc.description Thesis (Master's) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1993 tr_TR
dc.description.abstract In this study, an analysis of Turkish speech sounds has been carried out, with the aim of establishing a basis for speech analysis. With this system it is possible to give voice commands directly to any other system, so it has a feature usable in most control systems. Linear prediction analysis is used for this purpose: instead of storing all the sampled values of the sound, parameters that carry the speech information are used. These parameters are then stored for comparison when the same voice command arrives again. In this way, the parameters of the speech signals belonging to the same command are compared, a decision is made, and the voice is recognized. In addition to the recognition of voiced sounds, an algorithm has been developed to enable the recognition of unvoiced sounds as well. The main topics of the analysis are finding the parameters obtained from LPC analysis and quantizing them, and finding the fundamental frequencies of the speech and evaluating them. For recognition, these two sets of parameters are evaluated through a series of logic rules. The relevant algorithms and methods are discussed and interpreted in the respective chapters. The final chapter presents the program written during the implementation stage of the thesis, and the chapter evaluating the results and conclusions also offers various recommendations concerning the method and the system. tr_TR
dc.description.abstract Analysis of Speech Signals for an Isolated Speech Recognition System. Speech recognition can be divided into a large number of subareas depending on factors such as vocabulary size, speaker population, speaking conditions, etc. The basic task of a speech recognition system is either to recognize the entire spoken utterance exactly (e.g., a phonetic or orthographic speech-to-text typewriter system) or to understand the spoken utterance (i.e., to respond correctly to what was spoken). The concept of understanding rather than recognizing the utterance matters most for systems that deal with fairly large-vocabulary continuous speech input, whereas exact recognition matters most for limited-vocabulary, small-speaker-population, isolated-word systems. The various alternatives in speech recognition systems are discussed in the following chapters. For speech recognition, digital speech processing techniques are applied to obtain a pattern which is then compared to stored reference patterns. The goal is to determine what word, phrase, or sentence was spoken. Speech recognition is an area in which a large number of options must be specified before the problem can even be approached, for example: 1. Type of speech 2. Number of speakers 3. Type of speakers 4. Speaking environment 5. Transmission system 6. Type and amount of system training 7. Vocabulary size 8. Spoken input format. It can be seen from this list that a wide variety of options are available in the specification of a speech recognition system. Our system is based on voice recognition in Turkish and is implemented in the following steps: speech segmentation, parameterization, vectorization, coding, speech formant finding, identification, and recognition. Speech segmentation is applied first to the input, so that voiced, unvoiced, and silence partitions are pointed out.
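The segmentation into voiced, unvoiced, and silence partitions can be sketched roughly as follows. This is an illustrative Python sketch, not the thesis's original C program; the threshold values and the magnitude-only decision rule are assumptions for illustration (the thesis itself combines energy with other cues).

```python
def short_time_magnitude(x, N):
    """Average magnitude of each non-overlapping window of length N."""
    return [sum(abs(s) for s in x[i:i + N]) / N
            for i in range(0, len(x) - N + 1, N)]

def label_windows(mags, silence_th, voiced_th):
    """Very rough 3-way labelling of windows as silence / unvoiced / voiced,
    based only on the average magnitude (thresholds are assumed)."""
    labels = []
    for m in mags:
        if m < silence_th:
            labels.append("silence")
        elif m < voiced_th:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels
```

A low-magnitude window is taken as silence, a mid-magnitude one as unvoiced, and a high-magnitude one as voiced, matching the observation later in the abstract that unvoiced segments generally have much lower amplitude than voiced ones.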
In the next stage, the voiced partition is analyzed by LPC (linear predictive coding) and its three formants are found. In parameterization, the voiced parts of the speech are analyzed by LPC; the LPC coefficients are found and sequenced for every window. Vectorization is a method in which the density points of the LPC parameters are found for the speech; from these, a codebook is produced. This is a representative code for the input, so that if the same voice is sampled again, the code of the new sampling should be the same. In speech formant finding, the three basic frequencies of the voice are found in every window of the sampling. All candidates are counted, and the frequency with the largest count is chosen as the first formant; the process is then repeated to calculate the second and third formants. In the identification process, the codebook and formant values are saved; this file is the definition of the voice. Speech recognition itself consists of two steps: training and recognition. In the training step, the unknown voice is repeated again and again and analyzed by the system, so that the parameters converge to optimum values for that voice. These are saved and used in recognition when the same voice is repeated. In the recognition step, the saved (reference) voices are read from disk and compared with the test voice; the reference voice with the smallest distance is chosen as the recognized voice. All of these processes rely on speech analysis techniques, and one of the most powerful is linear predictive analysis. Linear prediction has been widely used as an approach to speech analysis and synthesis. This study is based on the basic principles of linear prediction analysis and attempts to analyze the speech signals; the main problem is therefore to find the predictor coefficients of the system.
The basic idea behind linear predictive analysis is that a speech sample can be approximated as a linear combination of past speech samples. The basic problem of linear prediction analysis is to determine a set of predictor coefficients (ak) directly from the speech signal in such a manner as to obtain a good estimate of the speech. Because of the time-varying nature of the speech signal, the predictor coefficients must be estimated from short segments of the signal. This approach leads to a set of linear equations that can be solved efficiently to obtain the predictor parameters. The speech signal can be modeled as the output of a time-varying linear system excited by either random noise (for unvoiced speech) or a quasi-periodic sequence of impulses (for voiced speech). The parameters of this model are the voiced/unvoiced classification, the pitch period for voiced speech, the gain, and the coefficients of the digital filter. As applied to speech processing, the term linear prediction refers to a variety of essentially equivalent formulations of the problem of modeling the speech waveform; the differences among these formulations concern the details of the computations used to obtain the predictor coefficients. The autocorrelation and covariance methods are used to solve for the prediction coefficients of the speech model. Other techniques can also be used, but in the end the task is to solve a system of p linear equations in p unknowns. Two standard methods for obtaining the predictor coefficients are the Cholesky decomposition solution for the covariance method and Levinson-Durbin's recursive solution for the autocorrelation method. The amplitude of the speech signal varies appreciably with time; in particular, the amplitude of unvoiced segments is generally much lower than that of voiced segments. The short-time energy of the speech signal provides a convenient representation that reflects these amplitude variations.
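The Levinson-Durbin recursive solution for the autocorrelation method mentioned above can be sketched as follows. This is an illustrative Python version, not the original C implementation; it solves the p linear equations in p unknowns given the autocorrelation values r[0..p].

```python
def levinson_durbin(r, p):
    """Solve the order-p linear prediction normal equations from
    autocorrelation values r[0..p] by the Levinson-Durbin recursion.
    Returns the predictor coefficients a[1..p] (so that
    s[n] ~ sum_k a[k] * s[n-k]) and the final prediction error."""
    a = [0.0] * (p + 1)
    e = r[0]                      # zeroth-order prediction error
    for i in range(1, p + 1):
        # reflection coefficient for order i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):     # update lower-order coefficients
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)        # updated prediction error
    return a[1:], e
```

For a first-order signal with r = [1, 0.5, 0.25], the recursion recovers a single effective predictor coefficient of 0.5 and a second coefficient of zero, as expected.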
One difficulty with the short-time energy function is that it is very sensitive to large signal levels. A simple way of alleviating this problem is to define an average magnitude function, where the sum of the absolute values of the signal over a window of length N is computed instead of the sum of squares. The major significance of the energy E or the average magnitude M is that it provides a basis for distinguishing voiced speech segments from unvoiced ones. Energy can be used to distinguish speech from silence, too; accordingly, the silent parts of the signal are excluded from the computation. Formants can be estimated from the linear prediction parameters. Given the LPC coefficients a_k, k = 1, 2, ..., p, computing A_k = DFT(1, a_1, a_2, ..., a_p, 0, 0, 0, ...) to obtain the discrete Fourier transform of the inverse filter, simple minimum picking on |A_k| for each frame gives the raw data from which the formant frequencies can be estimated. The analysis parameters are coded using a vector quantization algorithm for the segment recognition process. In vector quantization, new reconstruction levels are determined and a codebook is produced from them. An N-level, k-dimensional quantizer is a mapping q that assigns to each input vector x = (x_0, ..., x_{k-1}) a reproduction vector x' = q(x) drawn from a finite reproduction alphabet (or codebook) A' = {y_i; i = 1, 2, ..., N}. The quantizer q is completely described by the codebook A' together with the partition S = {S_i; i = 1, 2, ..., N} of the input vector space into the sets S_i = {x : q(x) = y_i} of input vectors mapping onto the i-th reproduction vector (or codeword). Such quantizers are also called block quantizers, vector quantizers, and block source codes. In the last section, the analysis of the speech signals is carried out. For this purpose, a computer program was written in C. A sound card called 'SOUND BLASTER PRO' is used to take samples, and the taken samples are analyzed over short-time intervals. To find the predictor coefficients, the correlation values are computed.
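The quantizer mapping q(x) described above can be sketched as a nearest-codeword search. This is an illustrative Python sketch under the usual squared-distance criterion; the codebook values in the example are invented, and the codebook design step (finding the density points of the LPC parameters) is not shown.

```python
def nearest_codeword(x, codebook):
    """Map input vector x to the index i of the closest codeword y_i
    in the codebook, using squared Euclidean distance. This realizes
    the quantizer q: the sets S_i are exactly the inputs that map to i."""
    best_i, best_d = 0, float("inf")
    for i, y in enumerate(codebook):
        d = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```

Quantizing a frame's parameter vector to a codeword index gives the representative code for the input, so that a repeated utterance of the same voice should yield the same code sequence.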
Then the Levinson-Durbin algorithm is used to find the predictor coefficients for all segments. After estimation of the filter coefficients, the voiced/unvoiced classification, the formants, and the other knowledge about the speech are saved. Afterwards, a new analysis of a new utterance produces another set of knowledge about that speech. The saved knowledge and the new knowledge are then compared using the distance measure D(x, x') = |x - x'|^2 for all windows within the desired bounds. If this comparison gives good agreement, we can say the new speech is the same as the saved one, and the new speech is recognized. en_US
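The final matching step, comparing the test utterance against each stored reference with D(x, x') = |x - x'|^2 accumulated over windows, can be sketched as follows. This is an illustrative Python sketch, not the thesis's C program; the reference labels and parameter values in the example are invented, and equal frame counts per utterance are assumed.

```python
def frame_distance(x, y):
    """Squared Euclidean distance D(x, x') = |x - x'|^2 between the
    parameter vectors of one window."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def recognize(test_frames, references):
    """Accumulate the per-window distance between the test utterance and
    each saved reference utterance; the reference with the smallest
    total distance is chosen as the recognized voice."""
    best_label, best_total = None, float("inf")
    for label, ref_frames in references.items():
        total = sum(frame_distance(x, y)
                    for x, y in zip(test_frames, ref_frames))
        if total < best_total:
            best_label, best_total = label, total
    return best_label
```

This is the nearest-reference decision rule stated in the abstract: if the distance to one reference voice is smaller than all the others, that reference is returned as the recognition result.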
dc.description.degree Yüksek Lisans tr_TR
dc.identifier.uri http://hdl.handle.net/11527/23519
dc.language.iso tr
dc.publisher Fen Bilimleri Enstitüsü tr_TR
dc.rights Kurumsal arşive yüklenen tüm eserler telif hakkı ile korunmaktadır. Bunlar, bu kaynak üzerinden herhangi bir amaçla görüntülenebilir, ancak yazılı izin alınmadan herhangi bir biçimde yeniden oluşturulması veya dağıtılması yasaklanmıştır. tr_TR
dc.rights All works uploaded to the institutional repository are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. en_US
dc.subject Bilgisayar ve Kontrol tr_TR
dc.subject Ayrık sözcük tanıma tr_TR
dc.subject Türkçe tr_TR
dc.subject Computer Science and Control en_US
dc.subject Isolated word recognition en_US
dc.subject Turkish en_US
dc.title Türkçede ayrık konuşma tanıma tr_TR
dc.type Master Thesis tr_TR
Files
Original bundle
Name: 39435.pdf
Size: 2.72 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.16 KB
Format: Plain Text