Hücresel Yapay Sinir Ağı İşlemcisi Tasarımı Ve FPGA Gerçeklemesi (Cellular Neural Network Processor Design and FPGA Implementation)
Date
2016-07-13
Authors
Meriç, Volkan
Publisher
Fen Bilimleri Enstitüsü
Institute of Science and Technology
Abstract
With advances in CMOS technology, cameras have become an increasingly common part of daily life. CMOS cameras are now commercially available off the shelf, easy to obtain, easy to use, inexpensive, and capable of producing high-quality images. The desire to make better use of these sensors is steadily increasing the importance of image processing. Image processing is the task of transforming an input image into a desired form or extracting desired information from it. The need for a control system to respond quickly, or for the processed image to be available alongside the main task, creates a demand for real-time image processing. Living beings can process incoming data in parallel thanks to their neural networks. Although neurons operate at low frequencies, their parallel processing ability gives them high computational capability and lets them make decisions very quickly. Cellular Neural Networks (CNN), inspired by the neural networks of living beings, are well suited to image processing, and the fact that the task they perform can be changed with a few simple parameters makes them highly effective in this field. Because CNNs demand intensive computation and hardware can compute in parallel, hardware solutions are better suited to them than software. CNNs were first implemented in analog form, but the shortcomings of those implementations have steered researchers toward digital hardware solutions. FPGAs are widely used today for digital hardware design because they allow prototyping, are easy to design with, and are reprogrammable. This work presents a new Cellular Neural Network (CNN) processor architecture that acts as a discrete-time CNN emulator. The architecture is named the Cellular Neural Network Emulator Processor (CNN-EP). An application-specific processor with its own instruction encoding is a practical solution that both meets the speed requirement and provides flexibility. The CNN is designed as an arithmetic function of a processor.
A simple processor that can call this function has been designed and implemented. The instruction applies the CNN function with a 3x3 template to the input n times. The output of an instruction is stored in a separate memory segment for use by the next instruction. The input and the initial condition can be selected from the stored DDR2 segments. The architecture combines the processor core with peripheral controllers. CNN-EP has also been implemented on a Spartan 6 XC6SLX45 FPGA. It was tested with two sample programs, and the sample programs and test outputs are given. The system consumes few logic resources. For a certain iteration count, the CNN core can process 1600x900 video at up to 15 fps. Its performance decreases as the iteration count increases. The system operates at a processing rate of 16.6 iterations/s.
Thanks to developments in CMOS technology, cameras are increasingly present in our daily lives. CMOS cameras have become ready to use, easy to buy, and cheap, with good picture quality. The desire to benefit more from these sensors is increasing the importance of image processing. Image processing is used to change an image into a desired form or to extract information from it. Demands for rapid response and online processing increase the need for real-time image processing. To meet the computational demands of real-time image processing, very fast or parallel data processing units are required. Living beings process data in parallel with their neural networks. Even though neurons work at low frequencies, they can process a large amount of data in a short period of time thanks to their parallel data processing capabilities. Since its invention in 1993, the Cellular Neural Network (CNN) has become widely accepted in visual processing. The connections in the eye and the neighbourhood approach in CNN form the biological background for the use of CNN. Cellular Neural Networks, a bio-inspired data processing method, are suitable for image processing applications, and their function can be changed with a few parameters. This makes CNN more powerful in the image processing domain. Apart from their use as sensory data processing or information processing elements, they can also be used for solving partial differential equations or for information generation. In this study, cellular neural networks were researched first. CNNs are two-dimensional, space-invariant, non-linear processing arrays. The architecture of a CNN consists of a rectangular array of cells with m x n dimensions. The minimal unit of the CNN is called a "cell". Each cell consists of a dynamic part, a sum block, and an activation function. The main advantage of CNN is that many image processing algorithms can be implemented on the same architecture. The function of a CNN is determined by a template, which consists of two matrices and one constant.
Templates are an essential element for a CNN to process information correctly. A large collection of templates is available for 2D linear and non-linear signal processing operations. CNNs have been implemented in hardware (analog or digital) and in software (on a computer or DSP). The parallel computation capability of hardware gives it superior computational power over software. CNNs were first implemented as analog arrays. These implementations are very fast and have low power consumption. However, analog design has certain drawbacks, such as low resolution, a limited number of cells, high sensitivity to heat and noise, and complex and expensive design. These drawbacks redirected researchers to digital design. Field Programmable Gate Arrays (FPGAs) are programmable silicon chips whose functionality is determined by a hardware description language (HDL). FPGAs have been used successfully in digital hardware design because of their prototyping capabilities, ease of design, and reprogrammability. In this work, we propose a new Cellular Neural Network (CNN) processing core architecture for digital emulation of discrete-time CNN. This architecture is called the Cellular Neural Network Emulator Processor (CNN-EP). An application-specific processor with its own instruction encoding gives a practical solution for speed requirements and flexibility. The CNN is designed and implemented as a function of the arithmetic logic unit. The proposed system is capable of executing a CNN instruction. A simple processor that can run the CNN function is also designed and implemented. The system stores the input image captured from the camera in external memory for processing. After processing the inputs, it writes the results to external memory. The evolution of the state variables in each iteration is also monitored through the DVI controller in the system. The instruction applies a 3x3 template operation to the input n times. The designed architecture can use different initial conditions, inputs, and iteration counts.
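The idea of applying a 3x3 template n times to an input image, starting from a chosen initial condition, can be illustrated with a minimal software sketch. It is not the thesis's hardware design. It assumes a forward-Euler discretization with step size h, the conventional names A (feedback matrix), B (control matrix) and z (constant) for the two matrices and one constant of the template, a piecewise-linear output function clamped to [-1, 1], and zero-valued neighbours outside the image border. None of these details is stated in the abstract.

```python
from typing import List

Image = List[List[float]]


def saturate(x: float) -> float:
    """Assumed activation function: piecewise-linear clamp to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def weighted_sum(img: Image, kernel: Image, r: int, c: int) -> float:
    """3x3 weighted sum centred on (r, c); neighbours outside the image count as 0 (assumption)."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += kernel[dr + 1][dc + 1] * img[rr][cc]
    return total


def cnn(u: Image, x0: Image, a: Image, b: Image, z: float, n: int, h: float = 1.0) -> Image:
    """Run n forward-Euler iterations from state x0 with input u; return the output image."""
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in x0]
    # The input contribution B*u + z is the same in every iteration, so compute it once.
    bu = [[weighted_sum(u, b, r, c) + z for c in range(cols)] for r in range(rows)]
    for _ in range(n):
        y = [[saturate(v) for v in row] for row in x]
        x = [
            [x[r][c] + h * (-x[r][c] + weighted_sum(y, a, r, c) + bu[r][c]) for c in range(cols)]
            for r in range(rows)
        ]
    return [[saturate(v) for v in row] for row in x]
```

With h = 1 the update reduces to x(k+1) = A*y(k) + B*u + z, so each iteration costs one 3x3 weighted sum over the previous output plus the precomputed input term.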
The results of instructions can be stored in different memory segments to be used by subsequent instructions. Input and initial images can be selected from the stored DDR2 segments. In this way, a set of templates applied in a particular order can process information that cannot be processed with a single template operation. The data transfer mechanism between the DDR memory and the CNN core is designed for maximum throughput. The CNN Emulator Processor's internal controller allows multiple templates to be applied to the same or different images stored in external memory. Hence CNN-EP executes the instruction OUT=CNN(U,X0,Temp,n), where U, X0, OUT, and Temp are the addresses of the input image, initial image, result image, and template, respectively, and n is the iteration count. Several templates are stored in ROM on the FPGA, so the CNN Emulator Processor can perform several instructions. CNN-EP also has an instruction memory for performing CNN instructions sequentially. The CNN-EP architecture consists of a CNN processor unit, a CNN control unit, and an instruction decoder. The FPGA also contains two ROMs, one for storing instructions and one for storing the different templates. The CNN processor unit calculates the CNN outputs. The instruction decoder decodes instructions arbitrarily and executes them. The CNN controller controls the CNN arithmetic unit and is responsible for data synchronization between the block RAMs, ROM, and DDR2 memory. The architecture combines the core unit with peripheral control units. These peripheral control units are the DDR2 memory controller, the camera controller, and the DVI monitor controller. The camera controller transfers images from the camera to DDR2 memory on command from the instruction decoder. The DDR2 memory controller uses specified ports to transfer data between the FPGA and DDR2 memory on commands from the CNN controller. The DVI controller takes the output image and drives an LCD monitor at HD+ (1600x900) resolution with a 60 Hz refresh rate. Furthermore, the FPGA implementation is also given. CNN-EP is implemented on an Atlys board, which contains a Spartan 6 XC6SLX45 FPGA and 128 MB of DDR2 SDRAM.
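The way a sequence of OUT=CNN(U,X0,Temp,n) instructions chains results through memory segments can be sketched in software as follows. This is only an illustrative model. The class name CnnInstruction, the function run_program, the dictionary used for memory segments, the list used for the template ROM, and the stand-in toy_core are all hypothetical and do not reflect the thesis's actual instruction encoding or controller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Image = List[List[float]]


@dataclass(frozen=True)
class CnnInstruction:
    out: int   # address of the memory segment that receives the result image
    u: int     # address of the input image segment
    x0: int    # address of the initial image segment
    temp: int  # address of the template in the template ROM
    n: int     # iteration count


def run_program(
    program: List[CnnInstruction],
    memory: Dict[int, Image],
    template_rom: List[Any],
    cnn_core: Callable[[Image, Image, Any, int], Image],
) -> Dict[int, Image]:
    """Execute instructions in order; each result is written back so later instructions can read it."""
    for ins in program:
        memory[ins.out] = cnn_core(memory[ins.u], memory[ins.x0], template_rom[ins.temp], ins.n)
    return memory


def toy_core(u: Image, x0: Image, template: float, n: int) -> Image:
    """Stand-in for the CNN core (NOT a real CNN): adds template * n to every input pixel."""
    return [[px + template * n for px in row] for row in u]


if __name__ == "__main__":
    mem = {0: [[1.0]]}
    prog = [
        CnnInstruction(out=1, u=0, x0=0, temp=0, n=2),
        CnnInstruction(out=2, u=1, x0=0, temp=1, n=1),  # reads the previous result
    ]
    print(run_program(prog, mem, [0.5, 2.0], toy_core))
```

The second instruction takes segment 1, written by the first, as its input. This models how templates applied in a particular order can build on each other's results.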
A VmodCAM is used as the camera. First, the CNN processing core was implemented and tested with the ISim simulator. A test result comparing the FPGA output with MATLAB output is given. This core was then integrated into the real-time system. The system was tested with two different sample programs, and the test results are given. The system is designed to consume minimal logic resources. The CNN processor can process 1600x900 video at up to 15 fps with a certain iteration count. Debugging and timing analysis of the system were done with the ChipScope software and a platform cable from Xilinx. One CNN iteration, including reading and writing the image, takes 5 clock cycles. A 108 MHz clock frequency is used for the processing unit. System performance decreases with iteration count. The system works at exactly 16.6 iterations/s. A comparison with other CNN emulator architectures is given and the contribution of this work is explained. Among existing CNN emulators, the RTCNNP and CASTLE architectures do not use external memory, so they lack the capability of taking different initial conditions and inputs at run time. Their iteration count also cannot change without reconfiguration and is limited by FPGA resources. The implemented system is shown to be efficient in terms of system resources and data transfer rates. The experimental results, generated by different instructions, show the capabilities of the system. Finally, the hardware performance of the system is given.
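The relation between clock frequency, image size, and iteration count can be made concrete with a rough back-of-the-envelope model. The abstract does not say what unit of work the 5 clock cycles refer to. The sketch below assumes 5 cycles per pixel per iteration, which is an interpretation rather than a stated fact, and the function names are hypothetical.

```python
def iterations_per_second(width: int, height: int, clock_hz: int, cycles_per_pixel: int) -> float:
    """Full-image CNN iterations per second under the assumed per-pixel cycle cost."""
    return clock_hz / (width * height * cycles_per_pixel)


def frames_per_second(
    width: int, height: int, clock_hz: int, cycles_per_pixel: int, iterations_per_frame: int
) -> float:
    """Frame rate when every frame needs the given number of CNN iterations."""
    return iterations_per_second(width, height, clock_hz, cycles_per_pixel) / iterations_per_frame


if __name__ == "__main__":
    # 1600x900 image, 108 MHz processing clock, assumed 5 cycles per pixel.
    print(iterations_per_second(1600, 900, 108_000_000, 5))
    print(frames_per_second(1600, 900, 108_000_000, 5, 3))
```

Under this assumption a single iteration per frame gives 15 frames per second, in line with the 15 fps figure, and the frame rate falls in inverse proportion to the iteration count. The 16.6 iterations/s figure is not derived by this simple model.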
Description
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2016
Keywords
Cellular Artificial Neural Networks,
Digital Image Processing