Implementation of a Software-Defined Antenna on GPU (Gpu Üzerinde Yazılım Tabanlı Anten Gerçeklenmesi)
Date
2015-06-18
Authors
Bakırtaş, Abdullah
Publisher
Institute of Science and Technology (Fen Bilimleri Enstitüsü)
Abstract
This thesis proposes a general-purpose, programmable, software-defined antenna (SDA) system for wireless communication, satellite communication, mobile communication, RADAR applications and next-generation communication systems. The SDA system consists of reciprocal, omnidirectional antennas, down-converting or up-converting RF elements, a digitizer, and a GPU that generates software-controllable beam coefficients and shapes the signals. The antenna elements, the RF block and the digitizer units are assumed to be ideal, and no work has been done on these components. The software module of the SDA system presented in the thesis runs on GPU hardware and is evaluated in comparison with a CPU-based SDA system. Simulations are carried out with synthetic baseband signals generated in MATLAB. The SDA system is designed by combining the discrete beamforming method many times with different coefficient sets. The combination is done entirely in software. The programmability of the SDA makes it possible to implement adaptive and smart algorithms and to build flexible, effective systems. Because it can channelize to several radiation directions at the same time, the SDA system is able to process the data of all channels in real time. To meet the computational load of the SDA and the requirements of applications that must run in real time, solutions based on GPUs and CPUs are examined in this thesis. A serial SDA algorithm was designed for the CPU and a parallelized SDA algorithm for the GPU. Three durations are defined for the processing time of the SDA algorithm on the GPU: the time the CPU takes to upload the data to the GPU (tu), the time the GPU takes to process the data (te), and the time taken to send the processed data back to the CPU (td).
The total time is defined as the sum of the data upload, data processing and data download times, which are performed one after another. For the CPU, the total time consists of the data processing time only. Although the SDA algorithm on the GPU processes data quickly, it is slowed down by uploading and downloading the data, because the transfers are large and the bandwidth between the devices is limited. Nevertheless, the simulation results show that the total time mostly stays below the real-time threshold. An asynchronous scenario is proposed as a way to reduce the total time, but it was not simulated within this thesis. The algorithm running on the GPU was observed to be faster and therefore able to process even wideband data below the real-time threshold. The CPU algorithm, in contrast, was slow and could not process the data in real time despite consuming more resources. GPUs make it possible to process large volumes of data quickly. The data volume is also a factor that affects the performance of the SDA algorithm on the GPU. For this reason, the amount of data the SDA algorithm processes at once is defined as a frame size. The order of magnitude of the frame size was first determined with a coarse approach, and the response to fine changes in frame size around that value was then measured. The performance of the GPU algorithm improved with increasing frame size up to a point and changed little afterwards. Even in most of the cases where its performance was poor, it was able to process the data in real time. This indicates that integrating and adapting the GPU-based SDA system into other systems should be fast and successful. In conclusion, the SDA algorithm running on the GPU was observed to reduce the total processing time significantly compared with the algorithm running on the CPU.
Because the SDA system on the GPU is scalable, and considering the pace of GPU development and the cost-effectiveness of the solution, it is expected to find widespread use in next-generation communication systems and radar applications.
In recent years, digital circuits have become more popular than analog circuits because they make it possible to implement complex systems at lower cost and with less effort. Field upgradability, scalability and energy efficiency are the most attractive advantages of digital solutions for signal processing. However, complex digital systems require more advanced signal processing capability. Beamforming in telecommunication systems was traditionally implemented with analog systems. Today it is usually implemented on a variety of platforms, including specialized DSPs, commodity CPUs and FPGAs. Running beamforming on GPUs is a new trend in this area, since the parallel processing capability of GPUs allows advanced beamforming algorithms to be implemented. A software-defined antenna (SDA) system digitally constructs multiple beam responses with different orientations. The beams are constructed in software, which gives the flexibility to implement alternative algorithms, enables field upgrades and reduces the dependency on hardware. However, digital beamforming comes at the price of high computational complexity. The main contribution of this thesis is an SDA system implemented on GPUs. To demonstrate the performance improvement provided by a parallel algorithm optimized for GPUs, the multiple-beam structure was also implemented as a serial algorithm for high-clock-frequency CPUs. An SDA is commonly known as a smart antenna or adaptive antenna array, because it allows smart algorithms to be implemented. Smart antenna and adaptive antenna array technology targets wireless communication, mobile communication, satellite communication, electronic warfare applications with an emphasis on RADAR, and new-generation communication systems. Smart antennas offer higher dynamic range, tolerance to multipath variation, lower interference, increased capacity and data rate, and smart jamming.
In mobile communication systems, smart antennas have been suggested as a way to solve the interference problem between users and to increase coverage and capacity. Smart antennas have also been suggested for new-generation communication systems that require space-division multiple access (SDMA), because an antenna array is able to divide space. In satellite communication, a directed beam provides gain for the signals between a satellite and a ground station. An adaptive directed beam can also be used to track satellites that move asynchronously relative to the ground. Moreover, adaptive antenna arrays provide electronic scanning for civil and military RADAR applications, and using an antenna array for electronic scanning gives a fast and efficient solution. Smart antennas and adaptive antenna arrays benefit from the parallel processing capability of GPUs, and it should be emphasized that GPUs are a cost-effective platform for SDA. The rapid evolution of GPUs, originally fueled by gaming enthusiasts, attracted attention from scientific fields with heavy computational problems because of the highly parallel architecture. As a result, GPUs are now used extensively in computational finance, defense and intelligence, machine learning, manufacturing, media and entertainment, and safety and security. Today there are GPUs designed specifically for computational problems, with architectures similar to those of gaming GPUs. In telecommunications, some digital signal processing algorithms inherently require a parallel structure, which maps naturally onto GPUs. One example is a receiver antenna model for the Global Positioning System (GPS), which has to steer beams toward four satellites at the same time. Its performance depends on the resolution of the directed beams and on the reduction of interference. Researchers from Stanford University implemented and simulated this antenna model and showed that it performs efficiently on GPUs [5]. This is a good illustration of the case for GPUs in SDA applications.
Another example is the new-generation base station, which has to divide space in order to increase capacity and data rate. Some studies show that an SDA system on a GPU allows the radiated power of base stations to be reduced. There is a misconception that GPUs are simply cheaper alternatives to FPGAs because of their highly parallel architecture. However, FPGAs are better suited to hard real-time problems with no tolerance for latency or jitter in processing times. They also give very low-level control over hardware behavior, which enables many-core soft processor designs, GPIO capabilities and standalone operation. GPUs, on the other hand, work as accelerators for CPUs and are far more efficient than CPUs at parallelizable tasks. FPGAs can also act as accelerators for CPUs, but because GPUs are both accelerators and gaming devices, they offer easier programming paradigms and connection interfaces. They are also cost-effective because of economies of scale. Graphics cards have a high computational capacity. To use this capacity, code blocks called "kernels" are launched by the host CPU. Input data is loaded through high-speed memory transfers from the host to the GPU, and the results are fetched back from the GPU to the host. This cooperation between GPU and CPU is called hybrid programming in the literature. The time spent on a GPU computation therefore consists of the duration of the data transfer from CPU to GPU (tu), the kernel execution time (te), and the duration of the transfer of the results from GPU to CPU (td). These three times are the main parameters observed when measuring the performance of a specific algorithm on a GPU. For IO-bound algorithms that copy a large amount of data between the host CPU and the GPU, tu and td may be dominant.
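The timing model described above can be written out as a short Python sketch. This is an illustration only: the class and function names are hypothetical, the numbers are made up rather than taken from the thesis, and the real-time criterion used here (a frame must be processed in less time than it takes to acquire it, i.e. frame size divided by sampling rate) is an assumption about how the real-time threshold might be defined.

```python
from dataclasses import dataclass


@dataclass
class GpuTiming:
    """Per-frame GPU timing; all values in seconds (hypothetical container)."""
    t_u: float  # host-to-device upload time (tu)
    t_e: float  # kernel execution time (te)
    t_d: float  # device-to-host download time (td)

    @property
    def total(self) -> float:
        # Synchronous case: the three phases run one after another.
        return self.t_u + self.t_e + self.t_d

    @property
    def transfer_fraction(self) -> float:
        # Share of the total spent on transfers; close to 1 means IO-bound.
        return (self.t_u + self.t_d) / self.total


def frame_duration(frame_size: int, sample_rate_hz: float) -> float:
    """Time needed to acquire one frame of samples."""
    return frame_size / sample_rate_hz


def meets_real_time(timing: GpuTiming, frame_size: int, sample_rate_hz: float) -> bool:
    """Assumed criterion: processing a frame must take less time than acquiring it."""
    return timing.total < frame_duration(frame_size, sample_rate_hz)


# Illustrative numbers only, not measurements from the thesis.
timing = GpuTiming(t_u=0.4e-3, t_e=0.1e-3, t_d=0.3e-3)
print(f"total = {timing.total * 1e3:.2f} ms, "
      f"transfer share = {timing.transfer_fraction:.1%}")
print("real time at 10 MS/s, 16384-sample frame:",
      meets_real_time(timing, 16384, 10e6))
```

With these made-up values the kernel itself accounts for only a small part of the total, which is the IO-bound situation in which tu and td dominate.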
To use the memory bandwidth of the GPU efficiently, there are asynchronous copy mechanisms that rely on hardware-accelerated DMA transfers running while the GPU is computing. These mechanisms are not implemented in the simulations, however, and the performance analysis is given as the GPU processing-only duration (te) and the GPU total duration (tu + te + td). GPUs have several memory types, each with different characteristics. Global memory has high capacity and exchanges data with the host over the PCI Express bus, so its performance affects tu and td. Constant memory and texture memory are further device-wide memories; they are read-only for kernels and limited in size. Programmers must pay attention to memory types when developing GPU-based algorithms. Another memory type is shared memory, also called block memory, which any thread in the same block can access. The last memory type is the register, which is private to a thread: a register can be accessed only by its own thread. Registers have the lowest latency of all the memory types. There are two mainstream programming languages for GPUs: the NVIDIA-specific CUDA C (CUDA) and the hardware-independent OpenCL. Since CUDA gives developers lower-level access to GPU resources, it is selected as the primary programming language in this thesis. The proposed SDA is derived from a receiver scenario, which could easily be extended to a transceiver. The basic components of the SDA are a reciprocal, omnidirectional antenna array, a down-converter, RF elements, a set of analog-to-digital converters (ADCs) and a software-defined beamformer module. The design and analysis of the RF components are outside the scope of the thesis. Only the beamformer module is designed and implemented, on both CPUs and GPUs. It is assumed that there is no reflection, refraction, scattering or noise in free space or in the SDA components.
The antenna array is modeled as a linear array; however, other antenna models can be introduced as well. Beam responses are calculated using phase steering coefficients. Synthetic baseband signals were generated in MATLAB to observe the simulation results. In the scope of this thesis, a baseband signal at the receiver side and an SDA receiver system are modeled, and the beam responses are derived from the baseband signal model. Using these models, the SDA is steered to a particular angle and a very sharp beam is created. To create multiple beams, multiple sets of beam coefficients are generated. Another parameter that affects the beam created by the SDA is the number of antenna elements. The thesis elaborates on the effect of these parameters in its later sections. With a careful selection of beam coefficients and powerful computing devices such as GPUs, an SDA can provide multiple sharp beams from the same set of antennas. The computational complexity of digital beamforming depends on the sampling frequency, the number of antenna elements and the sampling resolution. According to the analytical expression, the computational complexity of the SDA system is that of single-beam digital beamforming multiplied by the number of beams created. Two algorithms were designed to compute the SDA response: a GPU algorithm and a CPU algorithm. The GPU algorithm has one loop fewer than the CPU algorithm, and the loop it removes is the one with the largest iteration count. This shows that the GPU algorithm is well suited to parallel processing. However, every digital circuit has limited resources, so the simulation results cannot be expected to show the duration of this loop eliminated entirely. In this work, two types of simulation were used to run the CPU and GPU algorithms on their respective devices. One reports the processing time only (te), and the other reports the total processing time (tu + te + td) for the GPU against the CPU. According to the simulation results, the GPU responds much faster than the CPU. Details are given in Section 4.
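As a minimal sketch of phase-steered multi-beam forming for a linear array, the serial Python version below uses one coefficient set per beam. It is not the thesis code: the function names, the half-wavelength element spacing, the normalization by the number of elements and the test signal are all assumptions made for illustration. The innermost loop over samples is the kind of loop a GPU kernel would typically distribute across threads.

```python
import cmath
import math


def steering_weights(num_elements, angle_deg, spacing_wavelengths=0.5):
    """Phase-steering coefficients for a uniform linear array (assumed model)."""
    phase_step = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * phase_step * m) / num_elements
            for m in range(num_elements)]


def beamform(samples, beam_angles_deg, spacing_wavelengths=0.5):
    """Serial multi-beam former.

    samples[m][n] is baseband sample n of antenna element m.
    Returns beams[b][n], one output stream per requested beam angle.
    """
    num_elements = len(samples)
    frame_size = len(samples[0])
    beams = []
    for angle in beam_angles_deg:                      # loop over beams
        w = steering_weights(num_elements, angle, spacing_wavelengths)
        out = []
        for n in range(frame_size):                    # loop over samples (largest count)
            out.append(sum(w[m] * samples[m][n] for m in range(num_elements)))
        beams.append(out)
    return beams


def plane_wave(signal, num_elements, arrival_deg, spacing_wavelengths=0.5):
    """Ideal noise-free plane wave arriving from arrival_deg (synthetic input)."""
    phase_step = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(arrival_deg))
    return [[s * cmath.exp(1j * phase_step * m) for s in signal]
            for m in range(num_elements)]


def mac_count(num_beams, num_elements, frame_size):
    """Complex multiply-accumulates per frame: grows linearly with the beam count."""
    return num_beams * num_elements * frame_size


# Synthetic example: an 8-element array, a source at 20 degrees, two beams.
signal = [cmath.exp(1j * 0.3 * n) for n in range(16)]
x = plane_wave(signal, num_elements=8, arrival_deg=20.0)
beams = beamform(x, beam_angles_deg=[20.0, -40.0])
print("beam steered at the source:", abs(beams[0][0]))
print("beam steered elsewhere:    ", abs(beams[1][0]))
print("MACs per frame:", mac_count(2, 8, len(signal)))
```

The beam steered at the source recovers the signal at full amplitude, while the beam steered away from it is strongly attenuated. `mac_count` reflects the statement that forming several beams multiplies the cost of a single beam by the number of beams.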
The GPU-based algorithm computes one frame at a time. To improve its performance, the frame size must be analyzed to find the optimum value. First, coarse frame sizes are chosen as powers of two (2^n) and the GPU algorithm is simulated. According to the simulation results, increasing the frame size improves performance up to a saturation point, after which the frame size no longer affects performance. The fine frame size is then examined in a second simulation, in which the frame size is changed linearly in one-sample steps, starting from the best coarse frame length. This second simulation showed no significant changes. Together, the two simulations show the relation between frame size and processing time. The flexible frame size of the algorithm offers further integration possibilities but comes with trade-offs. A larger frame size gives higher performance (for example a sharper beam or a larger number of beams) and higher energy efficiency (the same amount of data and the same number of clock cycles require the same energy, yet the system forms better beams), but it introduces latency, because more data has to be collected over a longer time before it can be processed. Small frame lengths, on the other hand, give small delays but lower performance and therefore lower energy efficiency. In Section 6, software for the simulation work is implemented to run on a Windows-based PC. The software has an interface for visualizing the SDA pattern, and the required simulation data can be calculated from this interface. The antenna array data must follow a fixed file name and format (see Figure 6.2). The software processes the calculated data within a small time frame, short enough for real-time operation, for the required duration. Before the software is used, the hardware must be checked for suitability, for example vendor, GPU compute capability and other CUDA properties.
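The two-stage frame-size search can be sketched as follows. This is a hedged illustration rather than the procedure used in the thesis: the helper names are hypothetical, the 5% tolerance is arbitrary, and `modeled_time` is an assumed cost model (a fixed per-frame overhead plus a per-sample cost) standing in for real measurements, chosen only because it reproduces the saturating behaviour described above.

```python
def coarse_frame_sizes(min_exp, max_exp):
    """Candidate frame sizes as powers of two, 2**min_exp .. 2**max_exp."""
    return [2 ** n for n in range(min_exp, max_exp + 1)]


def fine_frame_sizes(center, half_width):
    """Candidate frame sizes in one-sample steps around a coarse choice."""
    return list(range(max(1, center - half_width), center + half_width + 1))


def time_per_sample(measure, frame_size):
    """Normalize a per-frame time so that different frame sizes are comparable."""
    return measure(frame_size) / frame_size


def pick_saturation_point(measure, sizes, tolerance=0.05):
    """Smallest frame size whose per-sample time is within `tolerance`
    of the best per-sample time seen in the sweep."""
    costs = {n: time_per_sample(measure, n) for n in sizes}
    best = min(costs.values())
    return min(n for n, c in costs.items() if c <= best * (1.0 + tolerance))


def modeled_time(frame_size, overhead_s=50e-6, per_sample_s=10e-9):
    """Assumed cost model, NOT measured data: fixed overhead + linear cost."""
    return overhead_s + per_sample_s * frame_size


# Stage 1: coarse sweep over powers of two.
coarse_best = pick_saturation_point(modeled_time, coarse_frame_sizes(8, 20))

# Stage 2: fine sweep in one-sample steps around the coarse choice.
fine_costs = [time_per_sample(modeled_time, n)
              for n in fine_frame_sizes(coarse_best, half_width=8)]
fine_spread = max(fine_costs) / min(fine_costs) - 1.0

print("coarse choice:", coarse_best)
print(f"relative spread in the fine sweep: {fine_spread:.2e}")
```

In a real experiment `modeled_time` would be replaced by a function that runs the beamformer on a frame of the given size and returns the measured time. Under the assumed model the fine sweep changes the per-sample time only marginally, which is consistent with the observation that one-sample steps made no significant difference.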
As a result, the SDA on the GPU significantly reduces the total processing time compared with the SDA on the CPU. Moreover, the scalable structure of the algorithm should provide further performance gains with future GPUs, leading to easily upgradable and cost-effective solutions. In conclusion, the SDA performs well on the GPU: the GPU algorithm that processes the antenna data is much faster than the CPU algorithm. GPUs also make it possible to implement more complex algorithms than CPUs do, and the evolution of GPUs is expected to bring higher performance for scalable algorithms.
Description
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2015
Keywords
Software-Defined Antenna,
Smart Antenna,
Adaptive Antenna Array,
Parallel Processing,
GPU Programming,
Digital Beamforming,
Multi-beam Pattern