LEE - Electronics Engineering - Doctorate
Recent Submissions (1-5 of 13)

Performance assessment of nonlinear active devices to design broadband microwave power amplifiers via virtual gain optimization (Graduate School, 2023-04-26)
In this thesis, we propose a structured set of sequential procedures for designing broadband microwave power amplifiers. A power amplifier is a major building block in transceivers for wireless communications. The output stage of a transmitter amplifies the modulated electrical signal through a power amplifier connected to an antenna. Solid-state microwave power amplifiers are built from active devices such as radio frequency (RF) power transistors. Nowadays, it is common practice to employ Gallium Nitride (GaN) transistors in RF power amplifier designs because of their high electron mobility and high power-delivery capacity. In practice, the design process starts with careful selection of the power transistor, considering design parameters such as the required output signal power, the drain or power-added efficiency of the amplifier, and the transducer power gain over the specified bandwidth. Once the power transistor is selected, its nonlinear behavior is characterized by determining the optimum source-pull (SP) and load-pull (LP) impedances that yield optimum gain and efficiency. As obtained, these impedances may not be realizable network functions over the desired frequency band, so they may not directly yield the input and output matching networks for the amplifier. Therefore, in this thesis, we first introduce a new method to test whether a given impedance is realizable. Then, a novel numerical procedure based on the "Real Frequency Line Segment Technique" is introduced to assess the gain-bandwidth limitations of the given source and load impedances, which in turn determine the ultimate RF power-intake and power-delivery capacity of the amplifier. For this numerical performance assessment, a robust tool called "Virtual Gain Optimization" is presented.
During the performance assessment, a new figure of merit called the "Power-Performance-Product" is introduced to measure the quality of an active device. Examples are presented to test the realizability of the given source/load-pull data and to assess the gain-bandwidth limitations of the given source/load-pull impedances for a 45 W GaN power transistor over the 0.8-3.8 GHz band. In the second part of the thesis, we present the actual design and implementation of the novel methods in sequence. First, the power-intake and power-delivery capacity of the active device are assessed for a 10 W GaN power transistor over the 800 MHz-3.0 GHz band. The optimum realizable source and load impedance data are determined via virtual gain optimization. Then, the optimum source and load impedance data are modelled as realizable network functions. These network functions are synthesized using Darlington synthesis, which in turn yields optimum input and output matching network topologies with component values. Finally, the designed power amplifier is manufactured. The computed and measured performance of the amplifier agree within acceptable limits: we obtained an average output power of 10 W with 11.4±0.6 dB gain and 49% to 76% power-added efficiency. In the third part, we introduce a new matching concept, so-called Virtual or, interchangeably, Fictitious Matching (FM), which may be defined between artificially generated non-Foster passive immittances over a lossless equalizer [E]. These immittances need not belong to physical devices; rather, they are fabricated, like source-pull or load-pull impedances, to maximize the gain, output power, and efficiency, and to minimize the output harmonics of the nonlinear active device under consideration.
In Fictitious Matching problems, the equalizer [E] is constructed between the virtually produced generator immittance data K_GF and the load immittance data K_LF to optimize the power transfer in the passband. In this regard, [E] is described by its back-end driving-point input immittance in the Darlington sense, and it is determined as the outcome of the optimization process. The input and output impedances are synthesized as cascaded commensurate transmission lines. It is demonstrated that the new concept of virtual matching can be used to build broadband power amplifiers. In this part, by solving the virtual matching problem successively, the input and output matching networks of a power amplifier are designed over 500 MHz-3 GHz with an average gain of 11.5 dB, an output power of 40.5 dBm, and an average drain efficiency of 61.7%.
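The gain, output power, and drain efficiency quoted above can be related through the standard definitions of drain efficiency (output power over DC power) and power-added efficiency (output minus input power, over DC power). The sketch below is only an illustration, not a result from the thesis: it assumes the three averaged figures apply at a single operating point, which the abstract does not state, and the function names are hypothetical.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 10 ** ((p_dbm - 30.0) / 10.0)


def power_added_efficiency(p_out_dbm: float, gain_db: float, drain_eff: float) -> float:
    """Estimate PAE from output power, gain, and drain efficiency.

    Assumption: all three figures describe the same operating point.
    """
    p_out = dbm_to_watts(p_out_dbm)            # RF output power in watts
    p_in = dbm_to_watts(p_out_dbm - gain_db)   # RF input power implied by the gain
    p_dc = p_out / drain_eff                   # DC power implied by drain efficiency
    return (p_out - p_in) / p_dc


if __name__ == "__main__":
    # Figures quoted in the abstract for the third design.
    pae = power_added_efficiency(p_out_dbm=40.5, gain_db=11.5, drain_eff=0.617)
    print(f"Output power: {dbm_to_watts(40.5):.2f} W")
    print(f"Estimated PAE: {100 * pae:.1f} %")
```

Under these assumptions the estimate reduces to the drain efficiency multiplied by (1 - 1/G), where G is the linear power gain, so a gain above 10 dB keeps the power-added efficiency within about a tenth of the drain efficiency.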

Energy-efficient hardware design of artificial neural networks for mobile platforms (Graduate School, 2023-03-09)
Deep Neural Networks (DNNs), which have recently improved in accuracy and usefulness, are becoming increasingly common in autonomous systems and diagnostic tools. These improvements come at a cost, however. The exponential increase in the energy consumption of DNNs necessitates novel methods for improving their energy efficiency. Modern approaches to energy optimization combine the traditional computing paradigm with a variety of performance enhancement strategies, including memory partitioning, spatial mapping, energy-efficient multiplication, weight and input precision optimization, bit-serial computation, and MAC-based processing element management. Although these strategies mitigate the energy problem to some extent, their complexity of use negates the benefits. One energy-efficient alternative is the unary number system, which simplifies the arithmetic operations of the hardware processor, such as multiplication and addition. However, this representation has drawbacks for the hardware processor, namely a shortage of rich random sources and a latency problem. A real-time stochastic signal generator called STAMP is built to overcome these issues. STAMP has a low hardware cost and generates high-quality random stochastic bit streams at high speed in unary format. A new hybrid bit serial-parallel, most significant bit first (MSB-first) number representation is proposed, which differs from traditional techniques. The driving force behind the new representation is to find a number system in which each parallel or serial line of the number, designated by m and n, can be handled separately or independently. If the serial lines can run independently, the hardware area does not change with n and depends only on m, and the same hardware can be reused for each serial line.
For use in DNNs, a brand-new hybrid processor called TALIPOT is proposed. Once the desired accuracy is reached, TALIPOT optimizes the operational accuracy/energy point by truncating bits at the output. Simulations using the MNIST and CIFAR-10 datasets show that TALIPOT outperforms state-of-the-art computation techniques in terms of energy consumption. After TALIPOT, a computer-aided design tool called TAHA is built to deploy TALIPOT easily and efficiently on DNNs. TAHA provides an interface and a complete guide for users, from training, testing, and optimizing DNN hardware through to prototyping it on an SoC. By exploiting algorithm/hardware cooperation and integrating the TALIPOT hybrid processor, TAHA can readily offer a number of optimized DNN hardware deployment solutions, from which the user selects the hardware configuration that maximizes energy savings under an accuracy constraint.

Refocusing of moving targets in spotlight SAR raw data (Graduate School, 2023-05-17)
Synthetic Aperture Radar (SAR) requires specialized image formation algorithms, which enable capabilities such as detection of moving targets and highly focused vehicle identification. However, for large image scenes, the computational cost increases significantly. There are two basic approaches to large-scene image formation. The first is to use exact algorithms such as the matched filter and the Backprojection Algorithm (BPA). In this approach, the calculation is done individually for every pixel and every pulse, which requires high-performance computing. Its advantages are that newly recorded samples can be integrated easily and that there is no phase error. The second approach uses fast imaging algorithms such as the Polar Format Algorithm (PFA). However, this approach introduces geometric displacement and defocusing artifacts for targets located away from the scene center. In the literature, two methods are mainly used to minimize these image artifacts. In the first, the large scene is divided into small patches and each patch (subimage) is processed relative to its own patch center instead of the whole scene center. In the second, instead of dividing the image into subpatches, phase corrections are applied after image formation. In this thesis, a new target-refocusing technique based on recentering the phase computation of previously recorded moving-target raw data is applied to spotlight SAR data in order to obtain refocused moving targets. The technique is tested on integrated simulated data: real spotlight SAR raw data as background, combined with synthetically generated data domes of civilian moving targets. First, the Polar Format Algorithm is applied to the integrated raw data to detect ground-moving targets and estimate their speed.
In the next step, the integrated raw data are reorganized by selecting and arranging the target focusing center with a new technique that recenters the phase computation according to each moving target's speed. In the third step, the raw data are reorganized by recentering the phase computation on each moving target's location. Finally, the Polar Format Algorithm is applied to each reorganized raw data set to obtain highly focused images of the individual moving targets. Two criteria are used to evaluate the performance of the proposed algorithm: the overall image quality of the focused target and its blurriness. The mean square error (MSE) and peak signal-to-noise ratio (PSNR) are used to evaluate image quality, and the variance of the Laplacian is used as the blurriness metric. The result of the proposed algorithm on moving targets is compared with the conventional PFA result on stationary targets. With 100% accuracy in target velocity estimation, the proposed algorithm gives smaller MSE values for moving targets than conventional PFA gives even for stationary targets. The proposed algorithm also achieves the highest variance-of-Laplacian values, meaning that defocusing artifacts are reduced even relative to the conventional PFA result for stationary targets. The performance of the proposed algorithm is also tested with velocity estimation accuracies of 95%, 90%, and 85%. The results show that the algorithm still performs better in terms of image quality (lower MSE and higher PSNR) even at the lower estimation accuracies. However, velocity estimation error causes smearing, which blurs the final target image. It is also shown that, at 95% velocity estimation accuracy, the variance of the Laplacian of the focused target is close to that of a stationary target located away from the scene center.
For smaller targets (with fewer scatterers in the scenario), the effects are less visible, but with higher velocity estimation accuracy the proposed algorithm still gives better performance both visually and in terms of the quality evaluation parameters.
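The three evaluation metrics named in this abstract (MSE, PSNR, and the variance of the Laplacian) can be illustrated on a toy image. This is a minimal sketch under stated assumptions, not the thesis's evaluation code: images are plain lists of lists, the Laplacian uses a 4-neighbour kernel on interior pixels only, the peak value is assumed to be 255, and all function names are hypothetical.

```python
import math


def mse(ref, img):
    """Mean square error between two equally sized 2-D images."""
    n = len(ref) * len(ref[0])
    return sum((r - t) ** 2
               for ref_row, img_row in zip(ref, img)
               for r, t in zip(ref_row, img_row)) / n


def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    err = mse(ref, img)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def variance_of_laplacian(img):
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    lap = [img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
           - 4 * img[y][x]
           for y in range(1, h - 1) for x in range(1, w - 1)]
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


# Toy example: a focused point target and a smeared copy with the same total energy.
sharp = [[0] * 5 for _ in range(5)]
sharp[2][2] = 255
blurred = [[0] * 5 for _ in range(5)]
for y, x in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    blurred[y][x] = 51

if __name__ == "__main__":
    print("MSE:", mse(sharp, blurred))
    print("PSNR (dB):", psnr(sharp, blurred))
    print("Laplacian variance, sharp:", variance_of_laplacian(sharp))
    print("Laplacian variance, blurred:", variance_of_laplacian(blurred))
```

A focused target concentrates its energy in few pixels, so its Laplacian response has a larger variance than the smeared copy, which is why a higher value is read as less defocusing.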

Stochastic bitstream-based vision and learning machines (Graduate School, 2022-10-21)
Stochastic computing (SC), a paradigm that dates back to the 1960s, has re-emerged in the last decade as a tool for emerging technology development. SC adopts an unconventional data representation that encodes scalars (X) into binary streams. Unlike conventional binary radix representation, only the cumulative counts of logic-1s and logic-0s in the bitstream, which encode a probability Px, are evaluated; the bit positions carry no weight. Thus, SC provides simple circuits for complex functions (e.g., multiplication with a single AND gate) and systems that tolerate soft errors (e.g., robustness to bit-flips). However, latency inevitably occurs because SC represents values as long bitstreams of length N (512 bits, 1024 bits, etc.) for high accuracy. Although several solutions for latency in SC hardware systems have been described in the literature, software simulation of the SC framework lags behind. Therefore, this doctoral thesis proposes a general framework for software-based SC simulations that considers both latency and memory issues. This study also presents a systematic view of SC-based image processing and proposes a new concept, namely the bitstream processing binarized neural network. The dissertation begins with an introduction that presents a brief literature review, the purpose of the thesis, and the hypothesis with the major and minor contributions. Then, the background part presents basic SC concepts such as bitstream structure, scalar encoding techniques, correlation, random bitstream generation, SC building block elements, and arithmetic. A cascaded multiplexer (MUX) optimization algorithm is proposed for scaled additions of multiple operands. A comprehensive survey on vision and learning machines is also presented, examining previous efforts and exploring the dissertations of the last decade. Software-driven SC is further discussed by proposing the use of a contingency table (CT).
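The remark that SC multiplies with a single AND gate can be shown with a short bit-level simulation. This is a minimal sketch, not the thesis's framework: it assumes unipolar encoding (the scalar is the probability of a logic-1), uses Python's pseudo-random generator as a stand-in random source, and the function names are hypothetical.

```python
import random


def encode_unipolar(x, n, rng):
    """Encode a scalar x in [0, 1] as n bits with P(bit = 1) = x."""
    return [1 if rng.random() < x else 0 for _ in range(n)]


def decode_unipolar(bits):
    """Recover the scalar as the fraction of logic-1s; bit position is irrelevant."""
    return sum(bits) / len(bits)


def sc_multiply(x1, x2, n=1024, seed=0):
    """Multiply two scalars by ANDing two independently generated bitstreams."""
    rng = random.Random(seed)
    s1 = encode_unipolar(x1, n, rng)
    s2 = encode_unipolar(x2, n, rng)
    return decode_unipolar([a & b for a, b in zip(s1, s2)])


if __name__ == "__main__":
    for n in (64, 512, 4096):
        print(n, sc_multiply(0.5, 0.8, n=n))  # approaches 0.4 as n grows
```

The estimate fluctuates around the exact product, and the fluctuation shrinks as the bitstream length N grows, which is the accuracy-versus-latency trade-off the abstract describes.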
The generation and processing of lengthy SC bitstreams pose simulation runtime and memory occupation problems. For applications that require intensive arithmetic, such as artificial neural networks (ANNs), these problems become significant. To tackle them, scalar-only processing with the CT is proposed. The CT is set by two input scalars (X1, X2), the bitstream cross-correlation, and the bitstream length (N). The main objective is to reach the desired logic output using only the scalar values instead of generating bitstreams and processing them bit by bit with logical operators. The CT holds the cumulative counts of the four logic pairs, 11, 10, 01, and 00, over the overlapping bits of the two bitstream operands. These cumulative counts, denoted a, b, c, and d, respectively, are the CT primitives. The correlation of the two non-generated bitstream operands sets the prior CT primitive, a, based on the stochastic cross-correlation (SCC) metric. The CT is established for maximum (SCC = 1, a is maximum), minimum (SCC = -1, a is minimum), or near-zero (SCC ≈ 0, a is set by the proposed algorithm) correlation. Zero correlation is vital for the accuracy of some SC-based arithmetic operations (e.g., multiplying bipolar-encoded bitstreams with XNOR). Therefore, three methods are proposed to set the prior primitive a for near-zero correlation algorithmically. Once a is determined, the proposed formulas define b, c, and d. Each logical operator is obtained as a linear combination of the CT primitives (e.g., XNOR is a + d). The CT emulates the entire hardware system in software through the proposed random number generator models, which include SC's built-in random fluctuation error. The random source models imitate Sobol low-discrepancy sequences, the linear-feedback shift register (LFSR), and the binomial distribution. In addition, the CT can simulate all 2^(2N) Cartesian combinations of two input bitstreams; therefore, there is no need for random sampling as in Monte Carlo simulation.
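The scalar-only idea can be sketched with integer counts instead of bitstreams. The code below is an illustration under stated assumptions, not the thesis's algorithm: `n1` and `n2` are the numbers of logic-1s in the two operands, the maximum and minimum overlap rules follow from the SCC extremes, the near-zero rule uses the rounded independence expectation n1*n2/N as a stand-in for the three methods the abstract mentions without detail, and b, c, d are derived from the marginal counts. All names are hypothetical.

```python
def contingency_table(n1, n2, n, correlation):
    """Return CT primitives (a, b, c, d) = counts of bit pairs 11, 10, 01, 00.

    correlation: "max" (SCC = 1), "min" (SCC = -1) or "zero" (SCC close to 0).
    """
    if correlation == "max":
        a = min(n1, n2)              # ones overlap as much as possible
    elif correlation == "min":
        a = max(0, n1 + n2 - n)      # ones overlap as little as possible
    elif correlation == "zero":
        a = round(n1 * n2 / n)       # assumed stand-in: independence expectation
    else:
        raise ValueError("correlation must be 'max', 'min' or 'zero'")
    b = n1 - a                       # first stream 1, second stream 0
    c = n2 - a                       # first stream 0, second stream 1
    d = n - n1 - n2 + a              # both streams 0
    return a, b, c, d


def logic_outputs(a, b, c, d):
    """Count of logic-1s at the output of each two-input gate."""
    return {"AND": a, "OR": a + b + c, "XOR": b + c, "XNOR": a + d}


if __name__ == "__main__":
    n = 1024
    n1, n2 = 512, 768                # X1 = 0.5 and X2 = 0.75 in unipolar encoding
    for corr in ("max", "min", "zero"):
        prims = contingency_table(n1, n2, n, corr)
        print(corr, prims, logic_outputs(*prims))
```

Because only four integers are updated per operation, the cost of one simulated gate does not depend on N, which is consistent with the runtime advantage claimed for CT-based simulation.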
Next, several image processing techniques for SC-based vision machines are discussed. The first is a reinterpretation of SC-based mean filtering for noise removal. The second is the Prewitt edge detector, a case study for inspecting different levels of hardware approximation based on the MUX scaling factor. The plain design (PD) exhibits remarkable edge detection performance in the presence of excessive noise. The third technique is template matching to detect the finder patterns of quick response (QR) codes in a noisy environment. Pattern matching is accomplished by feeding a single AND gate with bitstreams, and SC slightly outperforms its deterministic counterpart. Two further algorithms, bilinear interpolation and image compositing, are synthesized with SC for the first time, to the best of our knowledge. Bilinear interpolation is a method for scaling the dimensions of images. It is proven that its hardware equivalent can be a simple 4-to-1 MUX fed by bitstreams. The last technique, image compositing, outputs a new image by combining background and foreground images. It is proposed to obtain the composited image with a simple 2-to-1 MUX via SC. Both techniques are verified with the help of the CT. Bilinear interpolation and image compositing also serve to validate the different random number generator models (Sobol, LFSR, and binomial distribution) for the CT. Finally, this dissertation focuses on the adaptation of SC to learning systems. First, a single stochastic neuron is designed, and higher classification accuracy is achieved in early epochs when the neuron is trained with noisy data. Then, the mathematics behind the learning procedure of the conventional multi-layer neural network architecture is reviewed. A fully bitstream-processing binarized neural network (BSBNN) is proposed and compared with the traditional binarized neural network (BNN) architecture.
BNNs express network weights and neuron activations with one bit; however, this makes the structure fragile against soft errors such as bit-flips occurring in emerging hardware and memory technologies. In a traditional BNN, the neuron activation is +1 or -1, decided by subtraction from a threshold value (+1 is logic-1 and -1 is logic-0 in hardware). In our proposal, the power of bipolar encoding is used, and the neuron output is decided by checking the majority-minority balance of logic-1s and logic-0s in the pre-activation (S) bitstream. Since this check is performed during accumulation (in the counter) via masking logic, an additional activation module is not required. Thus, less hardware resource utilization is achieved (30% on a per-neuron basis). In addition, the architecture is more resilient to bit-flip errors: it shows superior robustness over the conventional, fragile BNN against image-based and weight-based bit-flips. Four different networks are then tested: BNN, BSBNN, a stochastic computing-based neural network (SCNN), and a full-precision neural network (FPNN) with no quantization. Considering image-based corruptions (contrast, Gaussian blur, fog, speckle noise, zoom blur, etc.), different training scenarios are compared with and without awareness of corruption in network training. The proposed BSBNN architecture exhibits comparable classification accuracy, and the importance of error-sensitive training in binary networks (BNN and BSBNN) is underscored. In the last section, the performance of CT-based network simulation is presented. SC-based XNOR multiplications are used in the classifier part of a convolutional neural network (CNN) architecture. Emulating multiplication via the CT results in a faster training runtime than its counterpart with actual bitstream processing.
Training a bitstream processing neural network with actual bitstreams (with bit-by-bit processing) shows an exponentially increasing runtime as N increases. Conversely, CT-based simulation provides a linear training runtime independent of bitstream length.

Development of application specific transport triggered processors for post-quantum cryptography algorithms (Graduate School, 2022-10-18)
Although quantum computing was initially confined to theoretical studies, many quantum computer development projects have been carried out in recent years. The promising results so far and the competition among companies indicate that the number of such studies will increase even more. Quantum computers are unlikely to become part of our daily lives in the near future. However, it is most likely that they will be used much more widely in certain areas. In particular, search, optimization, and factorization problems can be solved much faster by quantum computers than by classical computers. Thus, operations such as big data analysis, machine learning, or multivariate simulations can be performed in reasonable time, which is valuable for the advancement of science and technology. On the other hand, public key cryptography is under serious threat from quantum computer attacks, because most of the commonly used algorithms rely on the hardness of the factorization problem, and this problem may not be hard for quantum computers. Therefore, NIST initiated the Post-Quantum Cryptography Standardization Process to develop quantum-resistant algorithms. Currently, this process has reached the final stage, with four key encapsulation mechanisms and three digital signature methods. Efficient implementation and execution of an algorithm are just as important as its security. Especially in embedded systems, low power consumption and small chip area are fundamental requirements that must be met at a sufficient performance level. Application-specific processor designs are often needed to meet such demands. This study proposes suitable processor architectures for the quantum-resistant Lattice-based Cryptography algorithms in the final stage of the NIST standardization process.
For this purpose, it compares the widely used Reduced Instruction Set Computing methodology with the Transport-Triggered Architecture. The strengths and weaknesses of both techniques are analyzed through test results of open-source sample designs. This work also suggests application-specific cores with various custom operations. In addition, the difficulties in the processor development process and possible solutions are evaluated. The introduction presents the mathematical background of the lattice-based algorithms and the principal computation approaches of both architectures. Several comparisons of various cores are shared in the following sections. After that, the design methodology of the custom operations and the FPGA and ASIC results obtained are given. Finally, possible future improvements are evaluated.