LEE - Electronics Engineering - Master's Degree
Recent Submissions
Showing 1 - 5 of 48 items
-
Model-based design and implementation of schedulers in ARINC-664 end system as a system on chip (Graduate School, 2022)
ARINC-664 is an Ethernet-based deterministic network protocol that provides bounded delay and jitter using redundant communication among avionics applications. Achieving the end-to-end bounded delay objectives requires that incoming Ethernet frames be regulated according to the ARINC-664 standard. In ARINC-664, each rate-constrained flow, i.e., Virtual Link (VL), is regulated by End Systems (ESs) using the Bandwidth Allocation Gap (BAG). Only one regulated VL can be served at a time, so a scheduling mechanism is needed when more than one queue is ready to be served. The ARINC-664 standard does not specify the details of the scheduling algorithm; however, several algorithms have been proposed in the literature for ARINC-664 scheduling. The Field Programmable Gate Array (FPGA) is one of the most preferred implementation platforms for ARINC-664 due to its low power consumption, low-latency data transfer, and security advantages. Traditional FPGA development requires building the design and verification with Hardware Description Languages (HDLs). Instead of this time-consuming flow, model-based hardware design enables a faster prototyping and testing environment. In this thesis, a Single Queue model is first designed and developed in Simulink to provide a basic queueing infrastructure for the ARINC-664 ES. Then, the ARINC-664 ES model is developed on top of the Single Queue model. The scheduling algorithms in the ARINC-664 ES are designed and developed using HDL-convertible components. The Smallest BAG (SB), Smallest Size (SS), Longest Queue (LQ), and First-In-First-Out (FIFO) ARINC-664 ES scheduling algorithms are implemented. This implementation allows collecting the mean, standard deviation, and maximum of the jitter performance of each scheduling algorithm. In addition, an ARINC-664 ES Dynamic Scheduler model whose components can be converted to HDLs and C/C++ is built. This model contains all the scheduling algorithms, and the user can switch among them while the model is operating.
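The BAG regulation and scheduler selection described in this abstract can be illustrated with a short behavioral sketch. The Python below is purely illustrative and is not the thesis's Simulink/HDL implementation; the class, function, and field names (VirtualLink, pick_vl, arrival_ms, etc.) are hypothetical, chosen only to show how BAG regulation gates each VL and how the SB, SS, LQ, and FIFO policies differ in which ready queue they select.

```python
# Behavioral sketch of ARINC-664 ES BAG regulation and scheduling (illustrative only).
from collections import deque

class VirtualLink:
    def __init__(self, vl_id, bag_ms):
        self.vl_id = vl_id
        self.bag_ms = bag_ms        # Bandwidth Allocation Gap of this VL
        self.queue = deque()        # frames waiting to be transmitted
        self.next_eligible = 0.0    # earliest time the next frame may leave (BAG regulation)

    def ready(self, now_ms):
        # A VL is ready when it holds a frame and its BAG interval has elapsed.
        return bool(self.queue) and now_ms >= self.next_eligible

def pick_vl(ready_vls, policy):
    """Select one ready VL according to the ES scheduling policy."""
    if policy == "SB":   # Smallest BAG first
        return min(ready_vls, key=lambda vl: vl.bag_ms)
    if policy == "SS":   # Smallest head-of-line frame first
        return min(ready_vls, key=lambda vl: vl.queue[0]["size"])
    if policy == "LQ":   # Longest queue first
        return max(ready_vls, key=lambda vl: len(vl.queue))
    # FIFO: the VL whose head frame arrived earliest
    return min(ready_vls, key=lambda vl: vl.queue[0]["arrival_ms"])

def serve(vls, now_ms, policy="SB"):
    """Serve at most one frame per call, enforcing one frame per BAG window."""
    ready = [vl for vl in vls if vl.ready(now_ms)]
    if not ready:
        return None
    vl = pick_vl(ready, policy)
    frame = vl.queue.popleft()
    vl.next_eligible = now_ms + vl.bag_ms
    return vl.vl_id, frame
```

Jitter for a VL can then be measured as the deviation between a frame's actual departure time and its ideal BAG-spaced departure, which is how the mean, standard deviation, and maximum figures mentioned above would be collected.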
-
Model-based AI accelerator design on FPGA with in-depth evaluation of design parameters (Graduate School, 2025-02-11)
Artificial intelligence (AI) has advanced considerably in recent years. However, as models become increasingly complex, traditional hardware such as GPUs and CPUs faces limitations in meeting demands for power efficiency, low latency, and energy optimization. FPGAs, with their parallel processing capabilities, low latency, reconfigurable architecture, and reduced power consumption, have gained traction as an alternative platform for deploying AI models. Consequently, research on AI model deployment on FPGAs has grown significantly in recent years. To facilitate AI deployment on FPGAs, frameworks such as Vitis AI have been introduced. This study used the Vitis AI framework to implement three different AI accelerators on the Kria KV260 Vision AI Starter Kit, a system-on-chip (SoC) platform. The first design involved vehicle color recognition using the ResNet-18 CNN model in PyTorch, fine-tuned on the Vehicle Color Recognition (VCoR) dataset. The model was quantized using the Vitis AI quantizer, compiled and optimized for deployment using the Vitis AI compiler, and then integrated onto the FPGA. The Vitis TRD flow, rather than the Vivado TRD, was followed to create the hardware design, simplifying the process and eliminating the need for PetaLinux, which is often more complex to use. Deployment on the FPGA was performed using the PYNQ framework, enabling Python-based model integration without requiring any hardware description languages. The dataset was pre-processed before inference, and real-time performance was achieved by integrating the FPGA with a camera. In the second accelerator, the ResNet-18 model was fine-tuned for pneumonia diagnosis using chest X-ray images. This design reused the existing hardware, demonstrating the flexibility and adaptability of the model-based design approach for different classification tasks without additional hardware modifications. The third design implemented object detection using the YOLOv3 CNN model, pre-trained on the COCO dataset and obtained in pre-quantized form from the Vitis AI Model Zoo. The model was compiled to fit the existing hardware configuration and deployed on the FPGA. Both pre-processing and post-processing steps were integrated using PYNQ, allowing bounding-box generation and visualization of detected objects. Real-time object detection was achieved by connecting a live camera feed to the system. A key contribution of this work is the adoption of a model-based design approach, which simplifies FPGA deployment by avoiding the need for PetaLinux or low-level hardware design. By following the Vitis TRD flow, a more accessible and user-friendly alternative to the Vivado TRD, this study developed a detailed guide for deploying AI models on FPGAs. Furthermore, the study conducted a comprehensive analysis of DPU configuration parameters and frequency settings, filling a significant gap in the literature. The results provided new insights into performance, resource utilization, power consumption, and energy efficiency, clarifying gaps and addressing potentially misleading conclusions in previous studies. This work contributes to the field by presenting practical, user-friendly methods for designing and analyzing AI accelerators for both classification and object detection tasks. The study highlights the advantages of Vitis AI and model-based FPGA design, providing a guide for future research and development in high-performance, energy-efficient AI deployment on FPGAs.
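As a concrete illustration of the PYNQ-based deployment step described above, the sketch below shows how a Vitis AI compiled model is typically loaded and executed through the DPU-PYNQ (pynq_dpu) runner on a board such as the KV260. It is a minimal sketch, not the thesis code: the file names "dpu.bit" and "resnet18_vcor.xmodel" are placeholders, and pre-processing of the input image is assumed to be done elsewhere.

```python
# Minimal sketch of DPU inference through PYNQ (assumes the pynq_dpu package).
import numpy as np
from pynq_dpu import DpuOverlay

overlay = DpuOverlay("dpu.bit")               # placeholder DPU hardware design
overlay.load_model("resnet18_vcor.xmodel")    # placeholder Vitis AI compiled model

dpu = overlay.runner
in_dims = tuple(dpu.get_input_tensors()[0].dims)    # e.g. (1, 224, 224, 3)
out_dims = tuple(dpu.get_output_tensors()[0].dims)  # e.g. (1, number_of_classes)

def classify(image):
    """Run one pre-processed image through the DPU and return raw class scores."""
    in_buf = [np.empty(in_dims, dtype=np.float32, order="C")]
    out_buf = [np.empty(out_dims, dtype=np.float32, order="C")]
    in_buf[0][...] = image                    # image already resized and normalized
    job = dpu.execute_async(in_buf, out_buf)  # launch inference on the DPU
    dpu.wait(job)
    return out_buf[0].reshape(-1)
```

The same pattern extends to the other two accelerators: the pneumonia classifier reuses the hardware with a different .xmodel, while the YOLOv3 design adds PYNQ-side post-processing to turn the DPU outputs into bounding boxes.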
-
System-on-chip design with open-source FPGA IP (Graduate School, 2025-03-07)
In recent years, the demand for computing power has increased due to the growing number of Internet of Things (IoT) devices and artificial intelligence applications. Field programmable gate arrays (FPGAs) are frequently used to meet this demand because of their parallel computation, reconfigurability, and high bandwidth. However, integrating FPGAs into a system is challenging. Since they use multiple voltage levels and consume considerable power, producing a suitable printed circuit board (PCB) takes time and increases design costs. At the same time, FPGA packages are large and the per-unit price is higher than that of many integrated circuits, so the procurement costs of products containing FPGAs are also high. Embedded FPGAs (eFPGAs) aim to solve these problems by integrating FPGA fabrics into systems-on-chip (SoCs). eFPGA vendors can produce FPGA fabrics with fewer look-up tables (LUTs), i.e., fabrics with less area and power consumption, to satisfy customer requirements. There are two design methods for embedded FPGAs: hard and soft. Hard eFPGAs are designed at the transistor level, similar to discrete FPGAs, and are specific to a semiconductor manufacturing technology. Soft eFPGAs are generated as RTL code; since they are independent of the manufacturing technology, their architectures can be easily fine-tuned and manufactured using different technologies. In this study, a system-on-chip with a soft eFPGA intellectual property (IP) block is designed. There are similar studies on this topic, but the aim here is to show that it is possible to design such an SoC with open-source tools and designs. In addition to the embedded FPGA IP, the SoC contains a processor; a memory that stores the program data for the processor and the bitstream files for the FPGA; and two UART elements, one used by the processor and one for loading the contents of the memory element. The Advanced eXtensible Interface 4 (AXI4) protocol of the Advanced Microcontroller Bus Architecture (AMBA) standard provides the on-chip communication. The system is mostly prepared with open-source design tools or taken from open-source projects. The processor is CVA6, previously developed in the PULP Platform group at ETH Zürich and now maintained by the OpenHW Group. It is a 64-bit processor with the open-source RISC-V architecture and supports the I, M, C, and A extensions. It is a parametric core; optimum performance can be obtained by changing the parameters that define the core. The embedded FPGA IP is generated with an open-source FPGA fabric generator called OpenFPGA. OpenFPGA can produce the FPGA RTL code in Verilog, a verification environment, Synopsys timing constraints, and bitstream files suitable for the desired architecture, using the Yosys open-source RTL synthesizer and the Versatile Place-and-Route (VPR) FPGA placement-and-routing tool. The FPGA prepared in this study includes 1960 six-input LUTs, 1960 flip-flops, 50 input-output cells, and a register interface. The logic blocks in the FPGA consist of ten LUTs, ten flip-flops, and local routing multiplexers; this configuration gives the lowest area-delay product. The local routing multiplexers are reduced by 50%, which reduces the delays in the logic block. Thus, the critical paths and the area covered by multiplexers are reduced without harming the logic block functionality.
The switch block uses the Wilton style, which is the best in terms of routability and area, with a flexibility of 3. Multiplexers are selected for the switches in the FPGA because they cover a smaller area than tri-state buffers and can be optimized in digital implementation tools. Due to the small size of the routing architecture, L4 segments were used; longer ones were not preferred. Input-output (IO) blocks are implemented with the standard IO cells of the chosen production technology, and vertical and horizontal IO blocks use separate cells to remain compatible with the technology grid. The register interface provides the AXI interface through which the processor and other blocks communicate with the design inside the FPGA. The interface consists of two control and status registers and six 64-bit data registers, so that the status of the eFPGA can be read and a total of 384 bits of data can be transferred to the FPGA simultaneously. For programming the embedded FPGA IP, the memory-bank protocol was selected from the five programming protocols available in OpenFPGA; it takes up less space than the other protocols because it allows a latch-based structure for the programmable memory. Up to three bitstream files can be stored in the system and read back to reprogram the fabric at runtime. A configuration circuit is designed to program the IP according to the selected protocol, and each configurable element is controlled through bit line (BL) and word line (WL) signals.
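To make the register-interface programming path more concrete, the sketch below outlines how a bitstream could be streamed into the eFPGA through the six 64-bit data registers, 384 bits at a time. It is only a sketch under assumptions: the register offsets, bit fields, and the write_reg/read_reg helpers are hypothetical and stand in for AXI4 accesses issued by the CVA6 core; the actual register map and handshake are defined by the generated OpenFPGA memory-bank configuration circuit, which drives the BL/WL signals internally.

```python
# Hypothetical driver sketch for the eFPGA configuration register interface.
CTRL_REG   = 0x00                                # hypothetical control register offset
STATUS_REG = 0x08                                # hypothetical status register offset
DATA_REGS  = [0x10 + 8 * i for i in range(6)]    # six 64-bit data registers = 384 bits

def program_efpga(write_reg, read_reg, bitstream_words):
    """Push 64-bit bitstream words into the configuration circuit in 384-bit chunks.

    write_reg(addr, value) / read_reg(addr) represent AXI4 register accesses;
    the configuration circuit translates each chunk into BL/WL write cycles.
    """
    for start in range(0, len(bitstream_words), len(DATA_REGS)):
        chunk = bitstream_words[start:start + len(DATA_REGS)]
        for reg, word in zip(DATA_REGS, chunk):
            write_reg(reg, word)                 # fill (part of) the 384-bit data window
        write_reg(CTRL_REG, 0x1)                 # hypothetical "load chunk" command
        while read_reg(STATUS_REG) & 0x1:        # poll a hypothetical busy flag
            pass
```

Storing up to three bitstreams in the on-chip memory and replaying them through this path is what allows the fabric to be reprogrammed at runtime, as described above.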
-
Implementation of the YOLOv8 convolutional neural network block on FPGA (Graduate School, 2025-01-07)
Convolutional Neural Networks (CNNs) are among the most extensively studied topics, with applications in areas such as computer vision, object detection, and image processing. CNN operations are divided into two main phases: training and inference. Training requires intensive computation, including forward and backward propagation. Forward propagation estimates the output using the current weights, while backward propagation calculates the error between predicted and actual outputs and updates the weights iteratively using gradient descent. Hardware acceleration studies focus on improving either training or inference, given the computational intensity of CNNs, which often require millions of operations for a single output. Hardware platforms for CNNs include CPUs, GPUs, FPGAs, and ASICs. While ASICs offer high speed, their cost and single-purpose nature limit their use. CPUs lack flexibility for custom operations, and GPUs are best suited for compute-dense workloads but show performance limitations on low-density workloads. FPGAs fill this gap by offering a reconfigurable architecture, parallelism, and pipelining, making them versatile for both simple and complex CNN layers. In this study, FPGA-based CNN implementations achieved a maximum frequency of 275 MHz, with most IP blocks operating at 250 MHz. Hard-wired structures were utilized for data storage, outperforming LUT-based approaches in efficiency and temporal performance. BRAM-based FIFO structures with depths ranging from 16 to 512 entries provided substantial improvements, although the benefits plateaued beyond 512 entries. Cascading also minimized delays caused by repeated memory fetches. Parallelization is one of the main factors behind the performance increase: with 8 parallel operations, the best execution time was 0.14 seconds, which could drop to around 0.07 seconds with 16 parallel operations. This highlights the importance of extensive parallelization, particularly in deeper CNN layers with more channels. Optimal performance requires segmented designs with minimal LUT and flip-flop utilization to maximize frequency and avoid internal resource bottlenecks. In conclusion, the study emphasizes parallelization, correct use of hard-wired structures, and cascading as critical strategies to enhance CNN performance on FPGAs while ensuring resource efficiency and scalability.
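The parallelization numbers quoted above follow the usual first-order scaling of convolution workloads, which the sketch below makes explicit. It is a rough model under stated assumptions (one MAC per lane per cycle, a 250 MHz clock, no memory stalls or pipeline-fill effects), and the example layer dimensions are illustrative rather than taken from the YOLOv8 design; the 0.14 s and ~0.07 s figures come from the actual FPGA measurements reported in the abstract.

```python
# First-order estimate of how parallel MAC lanes scale a convolution layer's latency.
def conv_macs(out_h, out_w, out_ch, in_ch, k):
    """Multiply-accumulate count of one standard convolution layer."""
    return out_h * out_w * out_ch * in_ch * k * k

def est_latency_s(total_macs, lanes, f_hz=250e6, macs_per_lane_per_cycle=1):
    # Under ideal parallelization, latency scales inversely with the lane count.
    return total_macs / (lanes * macs_per_lane_per_cycle * f_hz)

macs = conv_macs(out_h=80, out_w=80, out_ch=128, in_ch=64, k=3)   # illustrative layer
for lanes in (8, 16):
    print(f"{lanes} lanes -> {est_latency_s(macs, lanes) * 1e3:.1f} ms")
# Doubling the lanes roughly halves the estimated latency, mirroring the measured
# 0.14 s -> ~0.07 s trend; in practice FIFO depth and memory bandwidth set the limit.
```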
-
Design of a highly efficient and linear driver power amplifier for 5G applications (Graduate School, 2025-01-28)
Wireless communication systems enable data transmission from one point to another through non-physical connections, primarily using radio frequency signals. Transmitters use power amplifiers (PAs) to ensure the transmitted signal reaches the receiver with adequate strength. Power amplifiers are the components with the highest power consumption in a transmitter system, making their performance critical. Key performance parameters include linearity, gain, efficiency, output power, and bandwidth. Power amplifiers require sufficient input power for efficient operation, often necessitating a multi-stage configuration with a driver stage followed by an output stage. Modern wireless systems such as 5G employ complex modulation techniques involving amplitude and phase variations, resulting in signals with high peak-to-average power ratios (PAPR) ranging from 6 to 15 dB. Conventional PA designs are optimized for efficiency at peak power, but efficiency decreases significantly at back-off levels when driven by high-PAPR signals. Addressing this issue necessitates innovative techniques, often focused on the output stage. However, the entire system's efficiency is also influenced by the driver stage, which traditionally uses high-linearity designs with moderate efficiency. Enhancing driver-stage efficiency while maintaining adequate linearity can improve overall system performance. This thesis proposes a novel approach to designing a highly efficient and linear driver power amplifier by operating in the nonlinear deep Class AB region. While traditional driver power amplifiers prioritize linearity, this design integrates supply modulation to optimize the drain bias voltage, improving back-off efficiency without significantly compromising linearity. The driver power amplifier was designed and simulated in Advanced Design System software, employing a CGH40006P GaN HEMT transistor on an RO4350B substrate. The PA operates at 3.5 GHz with a -2.85 V gate and 20 V drain bias, achieving deep Class AB operation. The design includes sub-circuit blocks for biasing and stability, optimized for linearity and efficiency. Load/source-pull simulations determined the impedances for maximum power-added efficiency (PAE), which were used in the input and output matching networks (IMN and OMN). Simulations indicate that the driver power amplifier achieves a PAE of 63.36%, an output power of 35.4 dBm, and a small-signal gain of 16.2 dB at 3.5 GHz. Moreover, the output phase change is only 16° up to the 3 dB compression point. For a modulated 5G NR signal with an 8.5 dB PAPR and 40 MHz bandwidth, the driver power amplifier delivers an adjacent channel power ratio (ACPR) of -31.62 dBc and an error vector magnitude (EVM) of 5.3%. Supply modulation across 10–20 V improves back-off efficiency by approximately 15% and 20% at 12 dB and 9 dB back-off, respectively. These results demonstrate the feasibility of integrating nonlinear, high-efficiency driver power amplifiers into modern communication systems.
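As a quick sanity check on how the reported figures relate, the sketch below converts the quoted output power and gain from dBm/dB to watts and applies the PAE definition, PAE = (Pout - Pin) / Pdc, to infer the implied DC power. This is only a back-of-the-envelope estimate: it assumes (not stated in the abstract) that the gain at 35.4 dBm output equals the 16.2 dB small-signal gain, whereas the real gain near compression is somewhat lower, so the true Pin and Pdc differ slightly.

```python
# Back-of-the-envelope check of the reported PAE figure (illustrative assumptions).
def dbm_to_w(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000.0

p_out_dbm, gain_db, pae = 35.4, 16.2, 0.6336   # values quoted in the abstract
p_out = dbm_to_w(p_out_dbm)                    # ~3.47 W delivered to the load
p_in  = dbm_to_w(p_out_dbm - gain_db)          # ~0.08 W, assuming small-signal gain
p_dc  = (p_out - p_in) / pae                   # implied DC power, roughly 5.3 W
print(f"Pout = {p_out:.2f} W, Pin = {p_in:.2f} W, implied Pdc = {p_dc:.2f} W")
```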