Implementation of the YOLOv8 convolutional neural network block on fpga
dc.contributor.advisor | Akgül, Tankut | |
dc.contributor.author | İlhan, Celilşamil | |
dc.contributor.authorID | 504211237 | |
dc.contributor.department | Electronics Engineering | |
dc.date.accessioned | 2025-07-07T08:07:13Z | |
dc.date.available | 2025-07-07T08:07:13Z | |
dc.date.issued | 2025-01-07 | |
dc.description | Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025 | |
dc.description.abstract | Convolutional Neural Networks (CNNs) are among the most extensively studied topics in machine learning, with applications in areas such as computer vision, object detection, and image processing. CNN operation is divided into two main phases: training and inference. Training requires intensive computation, including forward and backward propagation. Forward propagation estimates the output using the current weights, while backward propagation propagates the error between the predicted and actual outputs back through the network, updating the weights iteratively via gradient descent. Given the computational intensity of CNNs, which often require millions of operations for a single output, hardware acceleration studies focus on improving either training or inference. Hardware platforms for CNNs include CPUs, GPUs, FPGAs, and ASICs. While ASICs offer high speed, their cost and single-purpose nature limit their use. CPUs lack the flexibility needed for custom operations, and GPUs, although well suited to high-density workloads, show performance limitations on low-density ones. FPGAs fill this gap by offering a reconfigurable architecture, parallelism, and pipelining, making them versatile for both simple and complex CNN layers. In this study, the FPGA-based CNN implementations achieved a maximum frequency of 275 MHz, with most IP blocks operating at 250 MHz. Hard-wired structures were used for data storage and outperformed LUT-based approaches in both efficiency and timing. BRAM-based FIFO structures with depths ranging from 16 to 512 entries provided substantial improvements, although the benefits plateaued beyond 512 entries. Cascading also minimized the delays caused by repeated memory fetches. Parallelization was one of the main drivers of the performance increase: for example, with 8 parallel operations the best execution time was 0.14 seconds, which could drop to around 0.07 seconds with 16 parallel operations. This highlights the importance of extensive parallelization, particularly in deeper CNN layers with more channels. Optimal performance requires segmented designs with minimal LUT and flip-flop utilization to maximize frequency and avoid internal resource bottlenecks. In conclusion, the study emphasizes parallelization, correct use of hard-wired structures, and cascading as critical strategies for enhancing CNN performance on FPGAs while ensuring resource efficiency and scalability. | |
dc.description.degree | M.Sc. | |
dc.identifier.uri | http://hdl.handle.net/11527/27492 | |
dc.language.iso | en_US | |
dc.publisher | Graduate School | |
dc.sdg.type | Goal 9: Industry, Innovation and Infrastructure | |
dc.subject | convolutional neural network | |
dc.subject | evrişimsel sinir ağı | |
dc.subject | FPGA | |
dc.title | Implementation of the YOLOv8 convolutional neural network block on fpga | |
dc.title.alternative | YOLOv8 evrişimsel sinir ağı bloğunun fpga üzerinde gerçeklenmesi | |
dc.type | Master Thesis |