Privacy and security enhancements of federated learning
Date
2024-07-12
Authors
Erdal, Şükrü
Publisher
Graduate School
Abstract
Federated Learning (FL) has emerged as a revolutionary approach in the field of machine learning, addressing significant concerns related to data privacy and security. Traditional centralized machine learning models require data aggregation on central servers, posing substantial risks of data breaches and privacy violations. FL, by contrast, distributes the model training process across multiple decentralized edge devices, keeping the raw data localized and mitigating the privacy risks associated with centralized data storage and processing.

The motivation for this thesis stems from the growing need to enhance privacy and security in FL applications. As data privacy regulations become more stringent and public awareness of data security increases, there is a pressing demand for robust FL frameworks that can protect sensitive information while maintaining high model performance. FL's ability to leverage the computational power of edge devices, such as smartphones and IoT devices, makes it a promising solution for various domains including healthcare, finance, and the Internet of Things.

The primary objectives of this thesis are threefold:
1. To provide a comprehensive survey of existing research on privacy-enhanced FL, synthesizing key concepts, methodologies, and findings.
2. To identify gaps, limitations, and open research questions in the current literature on privacy-enhanced FL.
3. To evaluate and compare the privacy-enhancing techniques and methodologies used in FL, assessing their effectiveness, scalability, and trade-offs.

FL inherently mitigates several privacy risks by keeping data local to clients. However, it introduces new challenges, particularly inference attacks and model update poisoning. Inference attacks exploit model updates to extract sensitive information, while model update poisoning involves malicious clients injecting false updates to corrupt the global model. These challenges necessitate robust solutions to ensure the integrity and privacy of the FL process. Non-IID data and communication overheads further complicate FL deployment: when data distributions vary across clients, model convergence and performance suffer, and the frequent, substantial exchange of model updates between clients and the server strains network resources.

Several strategies have been developed to address these privacy and security challenges. Differential privacy introduces calibrated noise into model updates so that individual contributions remain confidential. Protocols that incorporate cryptographic signatures and secure multiparty computation techniques further protect model updates and ensure data integrity. Co-utility frameworks, which promote mutual benefit between servers and clients, and robust aggregation methods also play vital roles in safeguarding FL systems. Methodologies such as Flamingo and SafeFL leverage advanced cryptographic techniques to provide secure aggregation and strengthen privacy preservation. Collectively, these solutions improve the robustness, efficiency, and security of FL frameworks, enabling their application in real-world scenarios.

FL has been applied successfully in various domains, demonstrating its versatility and effectiveness. In wireless communication, FL enhances vehicular communication, localization, and semantic communication by enabling collaborative model training without data centralization. In the IoT sector, FL improves privacy and reduces data transfer costs, with significant applications in smart homes and industrial IoT. Healthcare is another critical area: by allowing institutions to collaboratively train models on medical imaging and predictive analytics without sharing patient data, FL satisfies stringent privacy regulations while improving model accuracy and generalizability, and studies have shown that it can maintain high diagnostic accuracy and support personalized medicine. In the financial sector, FL addresses privacy and regulatory challenges by enabling collaborative credit risk assessment and fraud detection; by leveraging data from multiple institutions without centralizing it, FL-based models achieve higher accuracy and adaptability, improving the detection of fraudulent activities and the quality of credit scoring models.

Surveys play an indispensable role within the FL domain. They serve as comprehensive repositories of existing research, giving newcomers a foundational understanding while guiding experienced researchers toward unexplored frontiers. By scrutinizing and synthesizing a broad body of literature, surveys identify emerging trends, highlight successful applications, and outline future research directions.

Federated Learning presents a transformative approach to machine learning by enabling decentralized data processing, which addresses critical privacy and security concerns inherent in traditional centralized models. This thesis explored various facets of FL, focusing on the challenges and solutions related to privacy and security as well as its diverse applications across different sectors. Emerging trends in FL research, including advances in cryptographic techniques, FL frameworks, and regulatory compliance mechanisms, underscore the need for continuous innovation and interdisciplinary collaboration. As FL continues to evolve, it holds the potential to revolutionize secure communication systems and foster a culture of security awareness and privacy by design in machine learning technologies.
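The federated-averaging and differential-privacy mechanisms summarized in the abstract can be illustrated with a minimal sketch. The Python example below simulates a few clients that each compute an update on purely local data, clip it to bound their individual influence, and send only the update to a server, which averages the updates and adds Gaussian noise before applying them. The quadratic toy objective, the number of clients, the clipping norm, and the noise scale are illustrative assumptions made here for the sketch, not parameters or methods taken from the thesis.

# Minimal sketch of federated averaging with Gaussian differential-privacy noise.
# Assumptions (not from the thesis): a quadratic toy objective per client,
# three clients, a fixed clipping norm, and an arbitrary noise scale.
import numpy as np

rng = np.random.default_rng(0)

def client_update(global_w, local_data, lr=0.1, clip_norm=1.0):
    # One local gradient step on a least-squares objective; raw data stays on the client.
    X, y = local_data
    grad = X.T @ (X @ global_w - y) / len(y)
    update = -lr * grad
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        # Clip so that any single client's contribution is bounded.
        update = update * (clip_norm / norm)
    return update

def dp_federated_round(global_w, clients, sigma=0.05, clip_norm=1.0):
    # Server averages the clipped updates and adds calibrated Gaussian noise.
    updates = [client_update(global_w, data, clip_norm=clip_norm) for data in clients]
    mean_update = np.mean(updates, axis=0)
    noise = rng.normal(0.0, sigma * clip_norm / len(clients), size=global_w.shape)
    return global_w + mean_update + noise

# Toy run: three clients, each holding its own (X, y) data that never leaves the device.
dim = 5
true_w = rng.normal(size=dim)
clients = []
for _ in range(3):
    X = rng.normal(size=(50, dim))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(dim)
for t in range(200):
    w = dp_federated_round(w, clients, sigma=0.05)
print("distance to true weights:", np.linalg.norm(w - true_w))

The residual distance printed at the end reflects the privacy-utility trade-off noted above: larger noise scales give stronger confidentiality for individual contributions but leave the global model further from the optimum.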
Description
Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2024
Keywords
Privacy,
Security,
Machine learning