Stochastic bitstream-based vision and learning machines

Date
2022-10-21
Authors
Aygün, Sercan
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
Stochastic computing (SC), a paradigm dating back to the 1960s, has re-emerged in the last decade as a tool for emerging technology development. SC adopts an unconventional data representation, encoding scalars (X) into binary bitstreams. Unlike conventional binary radix, a bitstream with probability PX is evaluated through the cumulative counts of its logic-1s and logic-0s, independent of bit position. Thus, SC provides simple circuits for complex functions (e.g., multiplication with a single AND gate) and soft-error-tolerant (e.g., robust to bit-flips) systems. However, latency inevitably arises because SC requires long bitstreams of length N (512 bits, 1024 bits, etc.) for high accuracy. Although several solutions for latency in SC hardware systems have been described in the literature, software simulation of the SC framework lags behind. Therefore, this doctoral thesis proposes a general framework for software-based SC simulation addressing both latency and memory issues. This study also presents a systematic view of SC-based image processing and proposes a new concept, namely the bitstream processing binarized neural network. The dissertation begins with an introduction presenting a brief literature review, the purpose of the thesis, and the hypothesis with its major and minor contributions. The background part then presents basic SC concepts such as bitstream structure, scalar encoding techniques, correlation, random bitstream generation, SC building blocks, and arithmetic. A cascaded multiplexer (MUX) optimization algorithm is proposed for scaled addition of multiple operands. A comprehensive survey of vision and learning machines is also presented, examining previous efforts and exploring related dissertations from the last decade. Software-driven SC is further discussed by proposing the utilization of a contingency table (CT).
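The position-free encoding and AND-gate multiplication described above can be sketched in a few lines of Python. This is a minimal illustration assuming a generic pseudo-random comparator as the bitstream source; the thesis's Sobol and LFSR number generators are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_unipolar(x, n):
    """Encode a scalar x in [0, 1] as an n-bit stream with P(1) = x."""
    return (rng.random(n) < x).astype(np.uint8)

def decode(bitstream):
    """Recover the scalar as the fraction of logic-1s, free from bit position."""
    return bitstream.mean()

n = 1024
a = encode_unipolar(0.5, n)
b = encode_unipolar(0.4, n)

# Multiplying two independent unipolar bitstreams is a single AND gate.
product = a & b
# decode(product) ≈ 0.5 * 0.4 = 0.20, up to SC's random fluctuation error
```

Note that the result only approximates 0.20: the deviation shrinks as N grows, which is exactly the latency-for-accuracy trade-off the paragraph describes.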
The generation and processing of lengthy SC bitstreams pose simulation runtime and memory occupation problems. These problems become significant in applications requiring intensive arithmetic, such as artificial neural networks (ANNs). To tackle them, scalar-only processing via the CT is proposed. The CT is set by two input scalars (X1, X2), the bitstream cross-correlation, and the bitstream length (N). The main objective is to reach the desired logic output using only the scalar values instead of generating bitstreams and processing them bit-by-bit with logical operators. The CT holds the cumulative counts of the four logic pairs, 11, 10, 01, and 00, over the overlapping bits of the two bitstream operands. These counts, denoted a, b, c, and d, respectively, are the CT primitives. The correlation of the two non-generated bitstream operands sets the prior CT primitive, a, based on the stochastic cross-correlation (SCC) metric. The CT is established for maximum (SCC = 1, a is maximum), minimum (SCC = -1, a is minimum), or near-zero (SCC ≈ 0, a is set by the proposed algorithm) correlation. Zero correlation is vital for the accuracy of some SC-based arithmetic operations (e.g., multiplying bipolar-encoded bitstreams with XNOR). Therefore, three methods are proposed to set the prior primitive a for near-zero correlation. Once a is determined, the proposed formulas define b, c, and d. Each logical operator is then obtained as a linear combination of CT primitives (e.g., XNOR is a + d). The CT emulates the entire hardware system in software via proposed models of random number generators, including SC's built-in random fluctuation error. The random source models imitate Sobol low-discrepancy sequences, the linear-feedback shift register (LFSR), and the binomial distribution.
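The CT mechanism above can be sketched as follows. This is an illustrative reconstruction from the abstract's definitions only: the three near-zero-correlation methods and the fluctuation-error models of the thesis are not reproduced, and the SCC ≈ 0 branch simply uses the expected overlap of independent streams.

```python
def ct_primitives(x1, x2, n, scc):
    """Contingency-table primitives for two non-generated bitstreams.
    a, b, c, d count the overlapping bit pairs 11, 10, 01, and 00.
    scc = +1 (maximum overlap), -1 (minimum overlap), or 0 (independent);
    the thesis's dedicated near-zero-correlation algorithm is omitted here."""
    n1, n2 = round(x1 * n), round(x2 * n)   # logic-1 counts of each operand
    if scc == 1:
        a = min(n1, n2)                     # maximum possible 11 overlap
    elif scc == -1:
        a = max(0, n1 + n2 - n)             # minimum possible 11 overlap
    else:
        a = round(n1 * n2 / n)              # expected overlap, SCC ≈ 0
    b, c = n1 - a, n2 - a                   # remaining formulas follow from a
    d = n - a - b - c
    return a, b, c, d

def xnor_output(x1, x2, n, scc=0):
    """Logic outputs are linear combinations of primitives: XNOR is a + d."""
    a, b, c, d = ct_primitives(x1, x2, n, scc)
    return (a + d) / n   # output probability, no bitstream ever generated

p = xnor_output(0.75, 0.75, 1024, scc=0)   # 0.625
# Bipolar decode: 2 * 0.625 - 1 = 0.25, i.e., 0.5 * 0.5 as expected.
```

The usage line shows why zero correlation matters for bipolar XNOR multiplication: with scalars 0.75 (bipolar value 0.5), the CT yields 0.625, whose bipolar decoding 2(0.625) - 1 = 0.25 equals the true product 0.5 × 0.5.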
Moreover, the CT can simulate all 2^(2N) Cartesian combinations of two input bitstreams; therefore, there is no need for random sampling as in Monte Carlo simulation. Next, several image processing techniques for SC-based vision machines are discussed. The first is a reinterpretation of SC-based mean filtering for noise removal. The second is the Prewitt edge detector, a case study inspecting different levels of hardware approximation based on the MUX scaling factor. The plain design (PD) exhibits remarkable edge detection performance under excessive noise. The third technique is template matching to detect the finder patterns of quick response (QR) codes in a noisy environment. Pattern matching is accomplished by feeding a single AND gate with bitstreams, and SC slightly outperforms its deterministic counterpart. As the first study of its kind, to the best of our knowledge, two further algorithms, bilinear interpolation and image compositing, are synthesized with SC. Bilinear interpolation is a method of scaling image dimensions; it is shown that its hardware equivalent can be a simple 4-to-1 MUX fed by bitstreams. The last technique, image compositing, outputs a new composited image by combining background and foreground images; it is proposed to obtain the composited image with a simple 2-to-1 MUX via SC. Both techniques are verified with the help of the CT. Bilinear interpolation and image compositing also highlight another contribution: validating the different random number generator models (Sobol, LFSR, and binomial distribution) within the CT. Finally, this dissertation focuses on adapting SC to learning systems. First, a single stochastic neuron is designed, achieving higher classification accuracy in early epochs when trained with noisy data. Then, the mathematics behind the learning procedure of the conventional multi-layer neural network architecture is reviewed.
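The 2-to-1 MUX compositing idea can be sketched per pixel as below. This is a generic illustration, assuming a pseudo-random comparator for stream generation and a select stream whose probability plays the role of the foreground weight (alpha); the thesis's full image pipeline is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def to_stream(p, n):
    """Encode probability p as an n-bit unipolar stream."""
    return (rng.random(n) < p).astype(np.uint8)

def mux2(bg, fg, sel):
    """2-to-1 MUX: output bit = fg when sel is 1, else bg.
    With P(sel) = alpha this realizes alpha*fg + (1 - alpha)*bg,
    i.e., scaled addition / per-pixel compositing in a single MUX."""
    return np.where(sel == 1, fg, bg)

n = 2048
alpha = 0.25                 # foreground weight as the select probability
bg = to_stream(0.8, n)       # background pixel intensity
fg = to_stream(0.2, n)       # foreground pixel intensity
sel = to_stream(alpha, n)
out = mux2(bg, fg, sel)
# out.mean() ≈ 0.25*0.2 + 0.75*0.8 = 0.65, up to random fluctuation
```

The same MUX structure generalizes to the 4-to-1 case mentioned for bilinear interpolation, where the select inputs pick among the four neighboring pixel streams.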
A fully-bitstream processing binarized neural network (BSBNN) is proposed and compared with the traditional binarized neural network (BNN) architecture. BNNs express network weights and neuron activations with one bit; however, this makes the structure fragile against soft errors such as bit-flips occurring in emerging hardware and memory technologies. In a traditional BNN, the neuron activation is +1 or -1, decided by subtraction from a threshold value (+1 is logic-1 and -1 is logic-0 in hardware). In our proposal, the power of bipolar encoding is exploited, and the neuron output is decided by checking the majority-minority balance of logic-1s and logic-0s in the pre-activation (S) bitstream. Since this check is performed concurrently with accumulation (the counter) via masking logic, no additional activation module is required. Thus, lower hardware resource utilization is achieved (30% on a per-neuron basis). In addition, an architecture more resilient to bit-flip errors is provided. The proposed architecture proves superior robustness over the conventional, fragile BNN regarding both image- and weight-based bit-flips. Four different networks are then tested: the BNN, the BSBNN, a stochastic computing-based neural network (SCNN), and a full-precision neural network (FPNN) with no quantization. Considering image-based corruptions (contrast, Gaussian blur, fog, speckle noise, zoom blur, etc.), different training scenarios are compared with and without corruption awareness during network training. The proposed BSBNN architecture exhibits comparable classification accuracy, and the importance of error-sensitive training in binary networks (BNN and BSBNN) is underscored. In the last section, the performance of the CT-based network simulation is finally presented. SC-based XNOR multiplications are employed in the classifier part of a convolutional neural network (CNN) architecture.
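The majority-minority activation can be sketched in software terms as a popcount against N/2. This is a behavioral model only; the concurrent counter-and-masking hardware of the proposal is abstracted into a single count, and the tie-breaking convention below is an assumption.

```python
import numpy as np

def bsbnn_activation(s_bits):
    """Decide the neuron output from the bipolar pre-activation stream S.
    In bipolar encoding, logic-1 maps to +1 and logic-0 to -1, so the
    sign of the sum reduces to a majority check on the logic-1 count:
    output logic-1 when 1s are the majority, logic-0 otherwise (tie -> 0)."""
    ones = int(s_bits.sum())          # accumulation (the counter)
    return 1 if ones > s_bits.size - ones else 0

s = np.array([1, 1, 1, 1, 1, 1, 0, 0], dtype=np.uint8)
y = bsbnn_activation(s)               # 1: logic-1s dominate

# A single bit-flip in a clearly decided stream does not change the output,
# illustrating why the stream-level decision tolerates soft errors.
s_flipped = s.copy()
s_flipped[0] ^= 1
```

With the flipped copy, the majority still holds (5 ones vs 3 zeros), so the activation decision is unchanged, unlike a single-bit BNN activation where one flip inverts the neuron.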
Emulating multiplication via the CT results in a faster training runtime than the counterpart with actual bitstream processing. Training a bitstream processing neural network with actual bitstreams (bit-by-bit processing) shows an exponentially increasing runtime as N grows. Conversely, CT-based simulation provides a linear training runtime independent of the bitstream length.
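The source of this speedup can be illustrated by comparing the two simulation styles on a single XNOR multiplication. The sketch below uses a deterministic maximally correlated (SCC = 1) encoding so the two paths agree exactly; it is an illustration of the scaling argument, not the thesis's benchmark.

```python
import numpy as np

def det_stream(x, n):
    """Maximally correlated (SCC = 1) encoding: leading 1s, trailing 0s."""
    n1 = round(x * n)
    return np.concatenate([np.ones(n1, np.uint8), np.zeros(n - n1, np.uint8)])

def xnor_bit_by_bit(s1, s2):
    """O(N) path: generate both streams and apply the gate to every bit."""
    return int(((s1 ^ s2) ^ 1).sum())

def xnor_via_ct(x1, x2, n):
    """O(1) path: for SCC = 1, a = min(n1, n2) and the XNOR count is a + d,
    computed from scalars alone with no bitstream in memory."""
    n1, n2 = round(x1 * n), round(x2 * n)
    a = min(n1, n2)
    d = n - n1 - n2 + a
    return a + d

n = 1024
for x1, x2 in [(0.3, 0.7), (0.5, 0.5), (0.9, 0.1)]:
    s1, s2 = det_stream(x1, n), det_stream(x2, n)
    assert xnor_bit_by_bit(s1, s2) == xnor_via_ct(x1, x2, n)
```

Per multiplication, the bit-by-bit path costs time and memory proportional to N, while the CT path costs a constant amount regardless of N, which is why the CT-based training runtime does not grow with the bitstream length.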
Description
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2022
Keywords
computer aided simulation, hardware, circuit simulation, image classification, logic circuit
Citation