##
Sahada programlanabilir kapı dizileri ile lojik devre tasarımı

Sahada programlanabilir kapı dizileri ile lojik devre tasarımı

##### Dosyalar

##### Tarih

1996

##### Yazarlar

Sezer, Volkan

##### Süreli Yayın başlığı

##### Süreli Yayın ISSN

##### Cilt Başlığı

##### Yayınevi

Fen Bilimleri Enstitüsü

##### Özet

Sahada Programlanabilir Kapı Dizileri (Field Programmable Gate Array, FPGA) genel olarak Boole fonksiyonlarının gerçekleştirildiği bloklar, bu bloklar arasındaki bağlantıları sağlayan bağlantı kanalları ve giriş-çıkış bloklarından oluşur. Fonksiyonların gerçekleştirildiği bloklarda yer alan ve "Look-up Table (LUT)" adı verilen devreler m (m>2) değişkenli her Boole fonksiyonunu gerçekleyebilirler. Verilen bir fonksiyonun değişken sayısının m veya m' den az olması sonucunda bu fonksiyon bir tane m-LUT ile gerçeklenebilir. Değişken sayısının m' den büyük olması durumunda fonksiyonun birden fazla m-LUT ile gerçeklenmesi gerekir. LUT-tabanlı FPGA' lar kullanılarak yapılan tasarımlarda amaç devrelerin minimum sayıda LUT kullanılarak gerçeklenmesidir. Bu amaçla ilk olarak, verilen fonksiyonlar, herbiri bir m-LUT ile gerçeklenebilecek alt fonksiyonlara ayrıştırılırlar. İkinci işlem olarak bazı uygun alt fonksiyonlar aynı LUT'lar ile gerçeklenerek toplam LUT sayısı azaltılır. Bu tezde sayısal sistem tasarımında kullanılan Sahada Programlanabilir Kapı Dizilerinin mimarileri, programlama teknolojileri ve özellikleri birbirleriyle karşılaştırılarak incelenmiştir. Piyasada mevcut olan çeşitli FPGA tiplerinin genel yapılan anlatılmıştır. Ayrıca, Boole fonksiyonlarını minimum sayıda LUT kullanarak bir FPGA'ya yerleştiren "mis-fpga", "chortle-crf, "VISMAP" ve "TechMap" yöntemleri incelenerek bazı hususlar açıklığa kavuşturulmuştur. Bu tezde "mis-fpga" yönteminin birinci aşaması olan "Ayrıştırma" işleminde kullanılan fonksiyonel ayrıştırma, kurulama ve kofaktör yöntemleri örneklerle karşılaştırılmış ve n+k değişkenli fonksiyonların n değişkenli Karnaugh diyagramı ile indirgenebüme- sinden faydalanılarak, fonksiyonel ayrıştırmanın Karnaugh diyagramı ile yapılmasına ilişkin bir yol önerilmiştir. Ayrıca yöntemin ikinci aşamasını oluşturan blok sayısının azaltılması işleminde ilk adım olan m-gerçeklenebilir süper düğümlerin belirlenmesi problemi için bir algoritma geliştirilmiş ve ilk adımda elde edilen süper düğümlerden minimum elemanlı bir örtünün bulunmasına ilişkin bir yol önerilmiştir.

Modern VLSI technology has made it possible to implement powerful digital circuits at reasonably low costs. The traditional VLSI design techniques, however, require extensive manufacturing efforts taking several months from beginning to end. This results in a high cost for each unit, unless large volumes are produced, due to the high overhead to begin production of such chips. On the other hand, in the electronic industry, it is vital to get new products to the market in the shortest time. Therefore, reduction of the design, development, production and testing times is quite crucial. Furthermore, it is important that the financial risk incurred in the development of a new product be limited so that more new ideas can be prototyped. As a solution of these requirement, programmable logic arrays have been introduced, among which Field Programmable Gate Arrays ( FPGA's) are the most promising. Like an MPGA, an FPGA consists of an array of programmable logic blocks (PLB's) that can be interconnected in a general way. Like a PAL, the interconnections between these blocks are also user-programmable. FPGA's were introduced in 1985 by the Xilinx Company. Since then, many different FPGA' s have been developed by a number of companies. A typical FPGA consists of a two-dimensional array of logic blocks that can be connected by general interconnection resources. The interconnect comprises segment of wire, where the segment may be of various lengths. Present in the interconnect are programmable switches that serve to connect the logic blocks to the wire segments or one segment to another. Logic circuits are implemented in the FPGA by partitioning the logic into individual logic blocks and then interconnecting the blocks as required via the switches. FPGA's can be used effectively in a wide variety of applications. In comparison with MPGA' s, they have two significant advantages: FPGA's have lower prototype costs and shorter production times. The two main disadvantages of FPGA's, compared to MPGA's, are their relatively low speed of operation and lower logic density ( the amount of logic that can be implemented in a single chip). The propagation delay of an FPGA is adversely affected by the inclusion of programmable switches, which have significant resistance and capacitance, in the connections between logic blocks. Logic density is decreased because the programmable switches and associated programming circuitry require a great deal of chip area compared to the metal connections in an MPGA. Over the last few years, several companies have introduced a number of different types of FPGA's. While each product has unique features, they can all be classified into one of four categories: symmetrical array, row-based, hierarchical PLD and sea- of gates. There are a number of different programming technology of implementing a programming element. Programming technologies that are currently in use commercial products are: static RAM cells, anti-fuses, EPROM transistors and EEPROM transistors. The static RAM programming technology is used in FPGAs which are produced by several companies: Algotronix, Concurrent Logic, Plessey Semiconductors and Xilinx. In these FPGAs, programmable connections are made using pass-transistors, transmission gates or multiplexers that are controlled by SRAM cells. Since the SRAM is volatile, these FPGAs must be configured each time the power is applied to the chip. This implies that a system that includes such chips must have some sort of permanent storage mechanism for the RAM cell bits, such as a ROM or a disk. The major advantage of this technology is that it provides an FPGA that can be re-configured (in-circuit) very quickly. Anti-fuse programming technology is used in FPGAs offered by Actel Corp., QuickLogic and Crosspoint Solutions. An anti-fuse normally resides in a high-impedance state but can be "fused" into a low- impedance state when programmed by a high voltage. EPROM programming technology is used in the FPGAs manufactured by Altera Corp. and Plus Logic. This technology is the same as that used in EPROM memories. One advantage of EPROM transistors is that they are re-programmable but do not require external storage. However, unlike SRAM, EPROM transistors can not be re-programmed in-circuit. The EEPROM approach which is used in the FPGAs offered by Advanced Micro Devices is similar to the EPROM technology except that EEPROM transistors can be re-programmed in-circuit. While each of these technologies are quite different, the programming elements all share the property of being configurable in one of two states: ON or OFF. Depending on the application in which the FPGA is to be used, it may also be desirable for the programming element to possess other features. For example, a programming element that is non-volatile might be attractive, as well as an element that is re-programmable. Re-programmable elements make it possible to re-configure the FPGA, perhaps without even removing it from circuit board. There are two popular categories of FPGA block structures, namely Look-up Table (LUT) based and multiplexor based; the resulting architectures are called LUT- based and MUX-based architectures, respectively. The basic block of an LUT architecture is a look-up table that can implement any Boolean function of up to m inputs. For a given LUT architecture, m is a fixed number. In commercial architectures, m is typically between 3 and 6. An m-LUT is typically implemented by static random access memory (SRAM) that has m address lines and 1 data line. In commercial LUT-based architectures, each basic block has vni one or more LUTs, along possibly with other logic elements (such as flip-flops, fast carry logic, etc.). In the MUX-based architectures, the basic block is a configuration of multiplexors, with possibly a few additional logic gates such as ANDs and ORs. An important problem in synthesis by using FPGAs is to minimize the cost of a design, where the cost measured by the number of LUTs needed. Minimizing the total number of LUTs allows the implementation of larger logic networks with the fixed number of m-LUT in a given FPGA. In order to solve this problem, first m-infeasible functions should be made m-feasible. Each node function in the resulting network can then be realized by m-LUT. However, it may be possible to reduce the number of m-LUTs if the nodes in the resulting network are small, i.e. more than one can be implemented within the same m-LUT. This called block count minimization (BCM). There exist a number of LUT technology mappers. All these programs realise a Boolean function by the use of m-LUTs, attempting to minimize either the total number of LUTs or the number of levels of LUTs in the final circuit. Minimizing the total number of LUTs allows the implementation of large number Boolean function with the fixed number of look-up tables available in a given FPGA, and minimizing the number of levels improves the speed-performance of circuits. In this thesis, LUT technology mappers that are "mis-fpga", "chortle-crf ', "VISMAP" and "TechMap" are studied. The mis-fpga technology mapper minimizes the number of m-LUTs required to implement a Boolean function in two phases. The first phase decomposes the original function to ensure that every node can be implemented by a single m-LUT and the second phases uses heuristic solution to a covering problem to reduce the number of LUTs in the final circuit. In the first phase, three approaches are used to decompose functions that cannot be implemented by a single LUT. The first approach is functional decomposition that is based on Roth-Karp decomposition, the second adapts the bin-packing approach and the third is based on Shannon cofactoring. *o- The fist systematic study on decomposition was done by Ashenhurst. He characterized the existence of a simple disjoint decomposition of a function. This work could not be used for functions with more than 10-15 inputs, since it required a construction of a decomposition chart, a modified form of the truth table for a function. Ashenhurst gave necessary and sufficient condition for existence of a simple decomposition of a function f of n variables: The simple disjoint decomposition exists if and only if the corresponding decomposition chart has at most two distinct column patterns. Not every function has a simple disjoint decomposition. Roth and Karp extended the decomposition theory of Ashenhurst by characterizing a general disjoint decomposition, which is of the following form: f(xı,..,xs,yı,..,yn.s)=g(aı(xı,..,xs), a2(xı,..,xs),..., at(xı,..,xs), yi,..,y".s). IX The functional decomposition can be implemented efficiently using BDDs and Karnaugh map. Cube-packing as a method of decomposition for LUT architectures was first suggested in chortle-crf. The basic idea is to approximate the problem of decomposing a function as that of decomposing an SOP (sum of products) of the function. Cube- packing uses bin-packing. The objective is to pack all items using a minimum number of bins without violating the capacity of any bin. Here, the used capacity of a bin is the sum of the weights of the items in the bin. If each cube in the SOP of the function f is treated as an item with weight equal to its literal count, and an LUT as a bin with capacity m, the problem of decomposing the SOP of f into a minimum number of m- LUTs can be seen as a bin-packing problem. Cofactoring is a special case of functional decomposition. Cofactoring a function f(xi,..,xn) with respect to x; can also be done using a functional decomposition on f with the input partition ({xı,...,Xi.ı,Xi+ı,..,x"}|{xi}). After decomposition, m-feasible functions are obtained, which can be implemented straightaway and the LUTs by mapping each node to an LUT. This strategy, however, yields sub-optimal results. There is a transformation that reduce the number of feasible nodes called covering. The covering problem can be stated as: given m-feasible Boolean network N, iteratively collapse nodes into their fanouts such that the resulting network N is m-feasible and the number of nodes in N is minimum. One way to solve this problem is given in the following. l)Enumerate all possible m-feasible supernodes ( A supernode corresponding to a node n of the network N is a directed subgraph S of N, with a single root n such that S does not contain any primary input node of N). 2) Select a minimum subset of these supernodes that covers the network. In this thesis, one algorithm is offered for enumerating all possible m-feasible supernodes and one way is offered for selecting a minimum subset of these supernodes that covers the network. Chortle-crf maps a Boolean function into a circuit of m-LUTs. Its objective is to minimize the number of LUTs in the final circuit. The original network is first partitioned into a forest of trees and then each tree is separately mapped into a subcircuit of m-LUTs. The final circuit is then assembled from the subcircuits implementing the trees. The major innovation of Chortle-crf is that it simultaneously addresses the decomposition and covering problem using a bin-packing approximation algorithm. The way bin-packing is implemented in Chortle-crf is that it treats the same input going to two gates as different. Better results may be obtained if it is recognized that the same input feeds the two gates. However, placing cubes that share variables in the same LUT may not always yield the optimum solution. Furthermore, there may be several product terms that share variables, but considered together may not fit in the same LUT. So the question of which groups of product terms should be merged into a single LUT arises. Since the bin-packing algorithm is very fast, the solution chosen by chortle-crf for these problems is to run the algorithm exhaustively on all possible cases, i.e. no merging and all possible mergings of cubes with shared variables. The combination that leads to fewest LUTs is selected. The VISMAP technology mapper focuses only on the block count minimization problem. It assumes that appropriate decomposition has been performed and an m- feasible network is available. The approach is similar to that of mis-fpga. The main difference is that VISMAP does not generate all the supernodes, but a subset. The basic idea is to label each edge of the network as either visible or invisible. An edge is called visible if it will be implemented as a wire between two LUTs in the circuit. An invisible edge is absorbed within an LUT. Note that a visible edge corresponds to an input to a supernode in the final mapped circuit and an invisible edge to an edge absorbed within a supernode in the mapped circuit. If the network has a large number of edges, it is computationally expensive to go through all possible labelings. The network is then partitioned into sub-networks. The problem is solved optimally on these sub-networks and the solutions are glued together. The TechMap uses a AND-OR decomposition. In the covering phase, a compatibility graph G(N) is formed whose vertices correspond to the fanins of N. Two fanins i and j ofN are said to be compatible if |a(i)ua(j)|

Modern VLSI technology has made it possible to implement powerful digital circuits at reasonably low costs. The traditional VLSI design techniques, however, require extensive manufacturing efforts taking several months from beginning to end. This results in a high cost for each unit, unless large volumes are produced, due to the high overhead to begin production of such chips. On the other hand, in the electronic industry, it is vital to get new products to the market in the shortest time. Therefore, reduction of the design, development, production and testing times is quite crucial. Furthermore, it is important that the financial risk incurred in the development of a new product be limited so that more new ideas can be prototyped. As a solution of these requirement, programmable logic arrays have been introduced, among which Field Programmable Gate Arrays ( FPGA's) are the most promising. Like an MPGA, an FPGA consists of an array of programmable logic blocks (PLB's) that can be interconnected in a general way. Like a PAL, the interconnections between these blocks are also user-programmable. FPGA's were introduced in 1985 by the Xilinx Company. Since then, many different FPGA' s have been developed by a number of companies. A typical FPGA consists of a two-dimensional array of logic blocks that can be connected by general interconnection resources. The interconnect comprises segment of wire, where the segment may be of various lengths. Present in the interconnect are programmable switches that serve to connect the logic blocks to the wire segments or one segment to another. Logic circuits are implemented in the FPGA by partitioning the logic into individual logic blocks and then interconnecting the blocks as required via the switches. FPGA's can be used effectively in a wide variety of applications. In comparison with MPGA' s, they have two significant advantages: FPGA's have lower prototype costs and shorter production times. The two main disadvantages of FPGA's, compared to MPGA's, are their relatively low speed of operation and lower logic density ( the amount of logic that can be implemented in a single chip). The propagation delay of an FPGA is adversely affected by the inclusion of programmable switches, which have significant resistance and capacitance, in the connections between logic blocks. Logic density is decreased because the programmable switches and associated programming circuitry require a great deal of chip area compared to the metal connections in an MPGA. Over the last few years, several companies have introduced a number of different types of FPGA's. While each product has unique features, they can all be classified into one of four categories: symmetrical array, row-based, hierarchical PLD and sea- of gates. There are a number of different programming technology of implementing a programming element. Programming technologies that are currently in use commercial products are: static RAM cells, anti-fuses, EPROM transistors and EEPROM transistors. The static RAM programming technology is used in FPGAs which are produced by several companies: Algotronix, Concurrent Logic, Plessey Semiconductors and Xilinx. In these FPGAs, programmable connections are made using pass-transistors, transmission gates or multiplexers that are controlled by SRAM cells. Since the SRAM is volatile, these FPGAs must be configured each time the power is applied to the chip. This implies that a system that includes such chips must have some sort of permanent storage mechanism for the RAM cell bits, such as a ROM or a disk. The major advantage of this technology is that it provides an FPGA that can be re-configured (in-circuit) very quickly. Anti-fuse programming technology is used in FPGAs offered by Actel Corp., QuickLogic and Crosspoint Solutions. An anti-fuse normally resides in a high-impedance state but can be "fused" into a low- impedance state when programmed by a high voltage. EPROM programming technology is used in the FPGAs manufactured by Altera Corp. and Plus Logic. This technology is the same as that used in EPROM memories. One advantage of EPROM transistors is that they are re-programmable but do not require external storage. However, unlike SRAM, EPROM transistors can not be re-programmed in-circuit. The EEPROM approach which is used in the FPGAs offered by Advanced Micro Devices is similar to the EPROM technology except that EEPROM transistors can be re-programmed in-circuit. While each of these technologies are quite different, the programming elements all share the property of being configurable in one of two states: ON or OFF. Depending on the application in which the FPGA is to be used, it may also be desirable for the programming element to possess other features. For example, a programming element that is non-volatile might be attractive, as well as an element that is re-programmable. Re-programmable elements make it possible to re-configure the FPGA, perhaps without even removing it from circuit board. There are two popular categories of FPGA block structures, namely Look-up Table (LUT) based and multiplexor based; the resulting architectures are called LUT- based and MUX-based architectures, respectively. The basic block of an LUT architecture is a look-up table that can implement any Boolean function of up to m inputs. For a given LUT architecture, m is a fixed number. In commercial architectures, m is typically between 3 and 6. An m-LUT is typically implemented by static random access memory (SRAM) that has m address lines and 1 data line. In commercial LUT-based architectures, each basic block has vni one or more LUTs, along possibly with other logic elements (such as flip-flops, fast carry logic, etc.). In the MUX-based architectures, the basic block is a configuration of multiplexors, with possibly a few additional logic gates such as ANDs and ORs. An important problem in synthesis by using FPGAs is to minimize the cost of a design, where the cost measured by the number of LUTs needed. Minimizing the total number of LUTs allows the implementation of larger logic networks with the fixed number of m-LUT in a given FPGA. In order to solve this problem, first m-infeasible functions should be made m-feasible. Each node function in the resulting network can then be realized by m-LUT. However, it may be possible to reduce the number of m-LUTs if the nodes in the resulting network are small, i.e. more than one can be implemented within the same m-LUT. This called block count minimization (BCM). There exist a number of LUT technology mappers. All these programs realise a Boolean function by the use of m-LUTs, attempting to minimize either the total number of LUTs or the number of levels of LUTs in the final circuit. Minimizing the total number of LUTs allows the implementation of large number Boolean function with the fixed number of look-up tables available in a given FPGA, and minimizing the number of levels improves the speed-performance of circuits. In this thesis, LUT technology mappers that are "mis-fpga", "chortle-crf ', "VISMAP" and "TechMap" are studied. The mis-fpga technology mapper minimizes the number of m-LUTs required to implement a Boolean function in two phases. The first phase decomposes the original function to ensure that every node can be implemented by a single m-LUT and the second phases uses heuristic solution to a covering problem to reduce the number of LUTs in the final circuit. In the first phase, three approaches are used to decompose functions that cannot be implemented by a single LUT. The first approach is functional decomposition that is based on Roth-Karp decomposition, the second adapts the bin-packing approach and the third is based on Shannon cofactoring. *o- The fist systematic study on decomposition was done by Ashenhurst. He characterized the existence of a simple disjoint decomposition of a function. This work could not be used for functions with more than 10-15 inputs, since it required a construction of a decomposition chart, a modified form of the truth table for a function. Ashenhurst gave necessary and sufficient condition for existence of a simple decomposition of a function f of n variables: The simple disjoint decomposition exists if and only if the corresponding decomposition chart has at most two distinct column patterns. Not every function has a simple disjoint decomposition. Roth and Karp extended the decomposition theory of Ashenhurst by characterizing a general disjoint decomposition, which is of the following form: f(xı,..,xs,yı,..,yn.s)=g(aı(xı,..,xs), a2(xı,..,xs),..., at(xı,..,xs), yi,..,y".s). IX The functional decomposition can be implemented efficiently using BDDs and Karnaugh map. Cube-packing as a method of decomposition for LUT architectures was first suggested in chortle-crf. The basic idea is to approximate the problem of decomposing a function as that of decomposing an SOP (sum of products) of the function. Cube- packing uses bin-packing. The objective is to pack all items using a minimum number of bins without violating the capacity of any bin. Here, the used capacity of a bin is the sum of the weights of the items in the bin. If each cube in the SOP of the function f is treated as an item with weight equal to its literal count, and an LUT as a bin with capacity m, the problem of decomposing the SOP of f into a minimum number of m- LUTs can be seen as a bin-packing problem. Cofactoring is a special case of functional decomposition. Cofactoring a function f(xi,..,xn) with respect to x; can also be done using a functional decomposition on f with the input partition ({xı,...,Xi.ı,Xi+ı,..,x"}|{xi}). After decomposition, m-feasible functions are obtained, which can be implemented straightaway and the LUTs by mapping each node to an LUT. This strategy, however, yields sub-optimal results. There is a transformation that reduce the number of feasible nodes called covering. The covering problem can be stated as: given m-feasible Boolean network N, iteratively collapse nodes into their fanouts such that the resulting network N is m-feasible and the number of nodes in N is minimum. One way to solve this problem is given in the following. l)Enumerate all possible m-feasible supernodes ( A supernode corresponding to a node n of the network N is a directed subgraph S of N, with a single root n such that S does not contain any primary input node of N). 2) Select a minimum subset of these supernodes that covers the network. In this thesis, one algorithm is offered for enumerating all possible m-feasible supernodes and one way is offered for selecting a minimum subset of these supernodes that covers the network. Chortle-crf maps a Boolean function into a circuit of m-LUTs. Its objective is to minimize the number of LUTs in the final circuit. The original network is first partitioned into a forest of trees and then each tree is separately mapped into a subcircuit of m-LUTs. The final circuit is then assembled from the subcircuits implementing the trees. The major innovation of Chortle-crf is that it simultaneously addresses the decomposition and covering problem using a bin-packing approximation algorithm. The way bin-packing is implemented in Chortle-crf is that it treats the same input going to two gates as different. Better results may be obtained if it is recognized that the same input feeds the two gates. However, placing cubes that share variables in the same LUT may not always yield the optimum solution. Furthermore, there may be several product terms that share variables, but considered together may not fit in the same LUT. So the question of which groups of product terms should be merged into a single LUT arises. Since the bin-packing algorithm is very fast, the solution chosen by chortle-crf for these problems is to run the algorithm exhaustively on all possible cases, i.e. no merging and all possible mergings of cubes with shared variables. The combination that leads to fewest LUTs is selected. The VISMAP technology mapper focuses only on the block count minimization problem. It assumes that appropriate decomposition has been performed and an m- feasible network is available. The approach is similar to that of mis-fpga. The main difference is that VISMAP does not generate all the supernodes, but a subset. The basic idea is to label each edge of the network as either visible or invisible. An edge is called visible if it will be implemented as a wire between two LUTs in the circuit. An invisible edge is absorbed within an LUT. Note that a visible edge corresponds to an input to a supernode in the final mapped circuit and an invisible edge to an edge absorbed within a supernode in the mapped circuit. If the network has a large number of edges, it is computationally expensive to go through all possible labelings. The network is then partitioned into sub-networks. The problem is solved optimally on these sub-networks and the solutions are glued together. The TechMap uses a AND-OR decomposition. In the covering phase, a compatibility graph G(N) is formed whose vertices correspond to the fanins of N. Two fanins i and j ofN are said to be compatible if |a(i)ua(j)|

##### Açıklama

Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Sosyal Bilimler Enstitüsü, 1996

##### Anahtar kelimeler

Lojik devre,
Logic circuit