Kod uyarımlı doğrusal öngörü yöntemi ve stokastik kod defteri arama işlemi için hızlı bir yöntem [Code excited linear prediction and a fast method for the stochastic code book search]

dc.contributor.advisor Panayırcı, Erdal
dc.contributor.author Erdoğan, H. Zeki
dc.contributor.authorID 39481
dc.contributor.department Telecommunication Engineering
dc.date.accessioned 2023-02-24T08:14:43Z
dc.date.available 2023-02-24T08:14:43Z
dc.date.issued 1994
dc.description Thesis (M.Sc.) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1994
dc.description.abstract This thesis studies the Code Excited Linear Prediction (CELP) speech coding algorithm. A novel stochastic code book search algorithm, based on vector quantization and clustering techniques, is developed that eliminates most of the complexity of the search. CELP is an efficient speech coding algorithm based on a model of the human speech production mechanism. This mechanism consists of three main parts: the lungs, the larynx, and the vocal tract. The lungs are the source of the air pressure delivered through the larynx; the airflow is controlled by the diaphragm, located beneath the lungs. The larynx is a body of interlocking cartilage through which the air leaving the lungs passes. During normal breathing, the vocal folds are abducted, allowing air to pass freely through the gap between them, called the glottis. During voiced speech, the vocal folds are repeatedly brought together and forced apart, causing an oscillation. Pitch is a tonal sensation as perceived by a human listener. The period of vocal fold oscillation is called the fundamental period, a term often used in place of pitch period. Unvoiced sounds are produced when the vocal folds are sufficiently abducted to allow air to pass relatively unimpeded through the glottis. The vocal tract acts on the sound pressure wave above the larynx in two ways: it modifies the spectral distribution of the energy of the sound wave, or it generates sound. Voiced sounds are produced at the larynx. Unvoiced sounds are normally produced above the larynx, at some point of constriction within the vocal tract. The vocal tract shapes the spectrum of the speech signal according to the positions of the lips, the tongue, and the nasal tract. The vocal and nasal tracts are therefore called articulation filters. Speech sounds can be classified into three distinct classes according to their mode of excitation.
Voiced sounds are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation, thereby producing quasi-periodic pulses of air that excite the vocal tract. Fricative or unvoiced sounds are generated by forming a constriction at some point in the vocal tract and forcing air through the constriction at a velocity high enough to produce turbulence. This creates a broad-spectrum noise source that excites the vocal tract. Plosive sounds result from making a complete closure, building up pressure behind the closure, and abruptly releasing it. Speech signals are time-varying, following the movements of the organs in the vocal tract and larynx. Over a short interval (typically 10-50 ms), a speech signal can be regarded as quasi-stationary for unvoiced sounds and quasi-periodic for voiced sounds. To model the speech production mechanism, the vocal tract transfer function, the spectrum of the excitation air pressure signal, and the spectral effect of radiation from the lips and nostrils are combined into the transfer function of a single articulation filter, which shapes the spectral envelope of the speech signal. The excitation generator driving the articulation filter is then a flat-spectrum noise source for unvoiced speech or a flat-spectrum periodic pulse train for voiced speech. It can be shown that the articulation filter and the excitation generator are independent of each other, so a speech production model can be constructed from a linear relation between these two blocks. This model is called the linear independent human speech production model, and most modern speech coding techniques use this approach. CELP is a frame-oriented technique that breaks a sampled input signal into blocks of samples that are processed as independent units.
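The linear source-filter model described above can be illustrated with a minimal sketch: a flat-spectrum excitation (a periodic pulse train for voiced speech, white noise for unvoiced speech) drives an all-pole articulation filter. The function names, the pitch period of 80 samples, and the second-order filter coefficients are hypothetical values chosen for illustration; none of them come from the thesis.

```python
import numpy as np

def excitation(n, voiced, pitch_period=80, seed=0):
    """Flat-spectrum source: unit pulse train (voiced) or white noise (unvoiced)."""
    if voiced:
        e = np.zeros(n)
        e[::pitch_period] = 1.0          # one pulse per (assumed) pitch period
        return e
    return np.random.default_rng(seed).standard_normal(n)

def all_pole(e, a):
    """Articulation filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k], a = [1, a1, ..., ap]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Illustrative stable filter with one resonance (poles at radius 0.9).
a_demo = np.array([1.0, -1.6, 0.81])
voiced_frame = all_pole(excitation(240, voiced=True), a_demo)
unvoiced_frame = all_pole(excitation(240, voiced=False), a_demo)
```

Because the two blocks are independent in this model, the same filter can be reused with either excitation, which is the property the coding techniques mentioned above rely on.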
The number of speech samples in a block is an important parameter and must satisfy the quasi-stationarity condition. The algorithm uses frames of 240 speech samples (30 ms) for linear prediction analysis and subframes of 60 speech samples (7.5 ms) for excitation signal analysis. Speech synthesis is applied on a subframe basis. CELP coding is based on an analysis-by-synthesis search procedure, perceptually weighted vector quantization, and linear prediction. A 10th-order linear prediction filter models the short-term spectrum, or formant structure, of the speech signal. The filter is analyzed with the autocorrelation method, which guarantees filter stability. The analysis is applied to Hamming-windowed frames of 240 samples. A 15 Hz bandwidth expansion is applied to the linear prediction filter to preserve the spectral features of the speech signal that matter to human hearing. In speech synthesis, the linear prediction filter coefficients for each 60-sample subframe are interpolated from the coefficients of the 240-sample frames. For transmission, the linear prediction filter coefficients are converted to and coded as line spectrum pairs, which allows efficient quantization of the filter coefficients. The CELP coding algorithm improves the quality of the synthesized speech by choosing the excitation signal of the linear prediction filter as the sum of two signals selected from two code books, one adaptive and the other stochastic, in such a way as to minimize the error. Stochastic code book [...]
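The linear prediction analysis described above (Hamming window, autocorrelation method, 10th order, 15 Hz bandwidth expansion) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the 8000 Hz sampling rate is inferred from 240 samples spanning 30 ms, the Levinson-Durbin recursion is the standard way to solve the autocorrelation equations, and the expansion factor gamma = exp(-pi * B / fs) is the usual mapping from a bandwidth B in Hz to a pole-radius scaling. All function names are hypothetical.

```python
import numpy as np

FS = 8000      # Hz, inferred from 240 samples = 30 ms
FRAME = 240    # samples per analysis frame
ORDER = 10     # linear prediction order

def lpc_autocorrelation(frame, order=ORDER):
    """Return [1, a1, ..., ap] of A(z) using a Hamming window and Levinson-Durbin."""
    w = frame * np.hamming(len(frame))
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                      # silent frame: keep the trivial filter
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient, |k| < 1
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                  # prediction error energy shrinks
    return a

def bandwidth_expand(a, bw_hz=15.0, fs=FS):
    """Scale a_k by gamma**k, which moves every pole of 1/A(z) inward by gamma."""
    gamma = np.exp(-np.pi * bw_hz / fs)
    return a * gamma ** np.arange(len(a))

# Usage on a synthetic frame (random data stands in for speech).
frame = np.random.default_rng(0).standard_normal(FRAME)
a = lpc_autocorrelation(frame)
a_bw = bandwidth_expand(a)
```

The stability guarantee mentioned in the abstract shows up here as reflection coefficients of magnitude below one, so every root of A(z) lies inside the unit circle.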
Table I. CELP Computational Complexity. For this reason, decreasing the computational complexity of the stochastic code book search helps implement the CELP algorithm on low-cost DSP chips without reducing the stochastic code book size. In this work, the stochastic code book is reordered into a hierarchical binary-tree structure using the neighborhood (distance) relationship between code words. For binary clustering of the 60-sample code word vectors on the tree, a correlation-based distance is defined between code words. The distance between code words x and y is d(x, y) = 1 - (x, y), where (x, y) denotes their correlation. Before the distances are determined, low-pass filtering is applied to the stochastic code book to increase the distance resolution. Using the K-means clustering technique, the code words are divided into two regions iteratively. Starting from two orthogonal initial centroids, each iteration first divides the code words into two regions using the distance measure above and then calculates the new centroids; the iterations continue until the centroids reach stable positions. For this distance measure, the centroid of a region is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of the code words in that region.
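The distance measure and the eigenvector centroid can be illustrated with a short sketch. The abstract gives the measure only as d(x, y) = 1 - (x, y); the code below assumes that the correlation term is the squared normalized correlation, because that is the reading under which the principal eigenvector is the optimal centroid. It also uses the uncentered second-moment matrix in place of the covariance matrix. Both choices, and the function names, are assumptions rather than details confirmed by the source.

```python
import numpy as np

def corr_distance(x, y):
    """d(x, y) = 1 - corr(x, y), with corr assumed to be the squared normalized correlation."""
    return 1.0 - np.dot(x, y) ** 2 / (np.dot(x, x) * np.dot(y, y))

def eigen_centroid(words):
    """Unit eigenvector belonging to the largest eigenvalue of the region's second-moment matrix."""
    X = np.asarray(words, dtype=float)
    moment = X.T @ X / len(X)            # uncentered stand-in for the covariance matrix
    _, vecs = np.linalg.eigh(moment)     # eigh returns eigenvalues in ascending order
    return vecs[:, -1]

# Usage: a region whose code words all lie along one direction.
v = np.array([1.0, 2.0, -1.0, 0.5])
region = np.stack([v, 2.0 * v, -0.5 * v])
centroid = eigen_centroid(region)
```

Under this reading the sign of a code word does not affect the distance, which matches a search in which the gain applied to the code word may be positive or negative.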
In this work, to stabilize the centroids and to reduce the size of the binary search table, the centroids in each iteration are taken to be the code words nearest to the calculated centroids. In this way, the stochastic code book search is converted into a 6-7 level binary-tree search that uses a table of code word indices ordered according to the tree structure. To construct the binary tree, the code book is first divided into two regions using the K-means clustering technique; each region is then divided into two regions in the same way, and so on, until every terminal cell holds at most a fixed number of code words. The maximum terminal cell size is the parameter of the tree structure, and the algorithm is tested for different values of it. During the clustering iterations, each centroid is replaced by the nearest code word, and the iterations end when the clusters no longer change. Thus, the table of the tree structure consists of code word indices only. This tree construction is incorporated into the 4800 bit/s CELP algorithm of Federal Standard 1016 with two terminal cell sizes. For terminal cell sizes of 32 and 8, the number of tests required is reduced from 512 to 32 and 21, and the computational complexity from 8.33 MIPS to 2.27 and 3.55 MIPS, respectively (see Table I). The average signal-to-noise ratio for terminal cell sizes of 32 and 8 becomes 7.31 and 7.23 dB instead of 8.33 dB, a loss of about 1.0 and 1.1 dB, respectively. The probability of finding the n-th best code word, from the best code word (n=1) to the 60th (n=60), is shown for the two cases in Figure 8.1. An important aspect of this algorithm is that this method of stochastic code book search complies with Federal Standard 1016. en_US
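The tree construction and search summarized in the abstract can be sketched as below. This is an illustration under stated assumptions, not the thesis code: the similarity is assumed to be the squared normalized correlation, the two orthogonal starting centroids are taken as the two leading eigenvectors of the region, an alternating fallback is used if one side of a split comes out empty, and the search compares raw vectors without the synthesis filtering, perceptual weighting, or gain terms of a real CELP coder. All names are hypothetical, and the sketch is not expected to reproduce the test counts reported in the abstract.

```python
import numpy as np

def corr2(a, b):
    """Squared normalized correlation (assumed similarity measure)."""
    return np.dot(a, b) ** 2 / (np.dot(a, a) * np.dot(b, b))

def split(book, idx, max_iter=50):
    """Two-way K-means split of book[idx]; centroids are snapped to code words."""
    X = book[idx]
    _, vecs = np.linalg.eigh(X.T @ X)               # ascending eigenvalues
    cents = np.stack([vecs[:, -1], vecs[:, -2]])    # orthogonal starts (assumed choice)
    reps = np.array([idx[0], idx[-1]])
    prev = None
    for _ in range(max_iter):
        sims = (X @ cents.T) ** 2 / np.outer((X * X).sum(1), (cents * cents).sum(1))
        assign = sims.argmax(axis=1)
        if assign.min() == assign.max():            # one side empty: alternate instead
            assign = np.arange(len(idx)) % 2
        if prev is not None and np.array_equal(assign, prev):
            break                                   # clusters no longer change
        prev = assign
        for s in (0, 1):
            members = X[assign == s]
            _, v = np.linalg.eigh(members.T @ members)
            c = v[:, -1]                            # eigenvector centroid of the region
            j = ((members @ c) ** 2 / (members * members).sum(1)).argmax()
            reps[s] = idx[assign == s][j]           # nearest code word replaces it
            cents[s] = members[j]
    return reps, idx[prev == 0], idx[prev == 1]

def build(book, idx, max_cell):
    """Recursive binary tree; nodes store code word indices only."""
    if len(idx) <= max_cell:
        return {"leaf": idx}
    reps, left, right = split(book, idx)
    return {"reps": reps, "kids": [build(book, left, max_cell), build(book, right, max_cell)]}

def cells(node):
    """List the terminal cells of the tree."""
    if "leaf" in node:
        return [node["leaf"]]
    return cells(node["kids"][0]) + cells(node["kids"][1])

def search(book, tree, target):
    """Descend by comparing the two representatives, then test the terminal cell exhaustively."""
    node, tests = tree, 0
    while "leaf" not in node:
        sims = [corr2(book[r], target) for r in node["reps"]]
        tests += 2
        node = node["kids"][int(np.argmax(sims))]
    leaf = node["leaf"]
    tests += len(leaf)
    best = leaf[int(np.argmax([corr2(book[i], target) for i in leaf]))]
    return int(best), tests

# Usage with a random stand-in code book of 60-sample vectors.
rng = np.random.default_rng(1)
book = rng.standard_normal((128, 60))
tree = build(book, np.arange(128), max_cell=8)
best, tests = search(book, tree, rng.standard_normal(60))
```

Because every node keeps only indices into the code book, the table adds little storage, and the code book itself stays unchanged, which is what lets such a search remain compatible with a standard decoder.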
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/21751
dc.language.iso tr
dc.publisher Fen Bilimleri Enstitüsü
dc.rights Kurumsal arşive yüklenen tüm eserler telif hakkı ile korunmaktadır. Bunlar, bu kaynak üzerinden herhangi bir amaçla görüntülenebilir, ancak yazılı izin alınmadan herhangi bir biçimde yeniden oluşturulması veya dağıtılması yasaklanmıştır. tr_TR
dc.rights All works uploaded to the institutional repository are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. en_US
dc.subject Doğrusal öngörü analizi tr_TR
dc.subject Sayısal işaret işleme tr_TR
dc.subject Ses işareti tr_TR
dc.subject Linear prediction analysis en_US
dc.subject Digital signal processing en_US
dc.subject Voice signal en_US
dc.title Kod uyarımlı doğrusal öngörü yöntemi ve stokastik kod defteri arama işlemi için hızlı bir yöntem tr_TR
dc.type masterThesis en_US
Files
Original bundle
Name: 39481.pdf
Size: 7.01 MB
Format: Adobe Portable Document Format