Implementation of the YOLOv8 convolutional neural network block on fpga
dc.contributor.advisor | Akgül, Tankut | |
dc.contributor.author | İlhan, Celilşamil | |
dc.contributor.authorID | 504211237 | |
dc.contributor.department | Electronics Engineering | |
dc.date.accessioned | 2025-07-07T08:07:13Z | |
dc.date.available | 2025-07-07T08:07:13Z | |
dc.date.issued | 2025-01-07 | |
dc.description | Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025 | |
dc.description.abstract | Convolutional Neural Networks (CNNs) are among the most extensively studied topics in machine learning, with applications in areas such as computer vision, object detection, and image processing. CNN operation is divided into two main phases: training and inference. Training requires intensive computation, including forward and backward propagation. Forward propagation estimates the output using the current weights, while backward propagation propagates the error between the predicted and actual outputs back through the network, updating the weights iteratively via gradient descent. Given the computational intensity of CNNs, which often require millions of operations for a single output, hardware acceleration studies focus on improving either training or inference. Hardware platforms for CNNs include CPUs, GPUs, FPGAs, and ASICs. While ASICs offer high speed, their cost and single-purpose nature limit their use. CPUs lack the flexibility needed for custom operations, and GPUs, although well suited to high-density workloads, show performance limitations on low-density ones. FPGAs fill this gap by offering a reconfigurable architecture, parallelism, and pipelining, making them versatile for both simple and complex CNN layers. In this study, the FPGA-based CNN implementations achieved a maximum frequency of 275 MHz, with most IP blocks operating at 250 MHz. Hard-wired structures were used for data storage and outperformed LUT-based approaches in both efficiency and timing. BRAM-based FIFO structures with depths ranging from 16 to 512 entries provided substantial improvements, although the benefits plateaued beyond 512 entries. Cascading also minimized the delays caused by repeated memory fetches. Parallelization was one of the main drivers of the performance increase: for example, with 8 parallel operations the best execution time was 0.14 seconds, which could drop to around 0.07 seconds with 16 parallel operations. This highlights the importance of extensive parallelization, particularly in deeper CNN layers with more channels. Optimal performance requires segmented designs with minimal LUT and flip-flop utilization to maximize frequency and avoid internal resource bottlenecks. In conclusion, the study emphasizes parallelization, correct use of hard-wired structures, and cascading as critical strategies for enhancing CNN performance on FPGAs while ensuring resource efficiency and scalability. | |
dc.description.degree | M.Sc. | |
dc.identifier.uri | http://hdl.handle.net/11527/27492 | |
dc.language.iso | en_US | |
dc.publisher | Graduate School | |
dc.sdg.type | Goal 9: Industry, Innovation and Infrastructure | |
dc.subject | convolutional neural network | |
dc.subject | evrişimsel sinir ağı | |
dc.subject | FPGA | |
dc.title | Implementation of the YOLOv8 convolutional neural network block on fpga | |
dc.title.alternative | YOLOv8 evrişimsel sinir ağı bloğunun fpga üzerinde gerçeklenmesi | |
dc.type | Master Thesis |