Federated Learning for Predictive Healthcare Analytics: From Theory to Real-World Applications

In the contemporary landscape, machine learning has a pervasive impact across virtually all industries. However, the success of these systems hinges on the accessibility of training data. Today, nearly every device generates data that can serve as the building blocks for future technologies. Conventional machine learning methods rely on centralized data for training, but the availability of sufficient and valid data is often hindered by privacy concerns. Data privacy is a primary concern when developing a healthcare system. One technique that allows decentralized learning is Federated Learning. Researchers have been actively applying this approach in various domains and have received a positive response. This paper underscores the significance of employing Federated Learning in the healthcare sector, emphasizing the wealth of data present in hospitals and electronic health records that could be used to train medical systems.


Introduction
Machine Learning (ML), a branch of artificial intelligence, is the study of developing algorithms and models that allow computers to learn and make predictions or judgments without being explicitly programmed. In general, ML enables computers to learn from data, find patterns and links, and make informed predictions or decisions, ultimately improving their capacity to carry out challenging tasks and offer insightful information [2].
ML-based automation in healthcare has the potential to revolutionize numerous elements of healthcare delivery, including administration, diagnostics, monitoring, and treatment. ML algorithms can analyze large volumes of healthcare data, spot problems based on the patterns they detect, and generate predictions or suggestions that improve efficiency, accuracy, and outcomes. Examples of ML-based automation in the healthcare industry include medical imaging, disease diagnosis and prognosis, drug discovery and development, clinical decision support systems, remote monitoring and telemedicine, and various other healthcare operations and administrative tasks.
Traditional ML techniques face several challenges, such as privacy preservation, data security, collaborative learning, and resource efficiency. The Federated Learning (FL) technique addresses these challenges. It allows ML models to be trained across numerous servers or devices while limiting the transfer of raw data. In 2017, H. Brendan McMahan, in collaboration with fellow researchers, put forward the idea of FL in the paper [3]. This paper emerged from Google research and outlined the foundational principles of FL. The formal definition of FL [4] is: "Federated Learning is a ML setting where multiple entities (clients) collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client's raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective." Numerous FL aggregation algorithms are available, and the selection of an appropriate one depends on the specific problem at hand. Examples include Federated Averaging (FedAvg) [6], Federated Stochastic Gradient Descent (FedSGD) [7], FedMA [8], MHAT (Model-Heterogeneous Aggregation Training) [9], FedADAGRAD, FedADAM, FedYOGI [6], Federated Mediation (FedMed) [10], and the Faster adaptive FL algorithm (FAFED) [11]. Descriptions of various FL aggregation algorithms are also available in [5]. The main challenge in building an ML model is the reliance on centralized algorithms that depend on a single data source; FL instead offers a decentralized optimization framework for cooperative learning without explicitly exchanging raw data.
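To make the aggregation idea concrete, the following is a minimal sketch of Federated Averaging on a toy one-dimensional linear model. It is an illustration under stated assumptions, not the procedure of any cited paper: the function names (local_update, fedavg, run_rounds), the learning rate, and the two synthetic "hospital" datasets are hypothetical, and full-batch gradient descent stands in for the local SGD used in practice. Only model weights and sample counts reach the aggregation step; the raw (x, y) pairs stay inside each client's local update.

```python
from typing import List, Tuple

Weights = List[float]                # [w, b] for the toy model y = w * x + b
Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one client


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Client side: a few full-batch gradient steps on the client's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Server side: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_rounds(clients: List[Dataset], rounds: int = 200) -> Weights:
    """Repeat: broadcast global weights, train locally, aggregate."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = fedavg(updates, sizes)
    return global_weights


# Hypothetical data: two clients whose samples all follow y = 2x + 1.
clients: List[Dataset] = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

if __name__ == "__main__":
    w, b = run_rounds(clients)
    print(f"learned w={w:.3f}, b={b:.3f}")
```

Weighting by sample count means a client with more records has proportionally more influence on the global model, which is the usual FedAvg choice; other algorithms listed above change how this aggregation step is performed.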

LITERATURE SURVEY
This article discusses the integration of ML and deep learning (DL) techniques within the framework of FL for the identification of diverse health conditions. It also highlights suggestions for future work.

1 Research using supervised learning algorithms with FL
The research [12] focuses on predicting patients' length of stay (LOS), which is crucial for hospitals to manage resources effectively and provide quality treatment. To address this need, the study proposes a federated ML-based model for forecasting patients' LOS. The model aggregates the results from multiple hospital clients, each of which has trained ML regression models on its own administrative data. The study [13] presents an FL-based technique to extract information from the Electronic Health Records (EHRs) of tumor patients at different hospitals. The method extends Recursive Feature Elimination based on Support Vector Machines (SVM) and Deep Learning Important FeaTures (DeepLIFT) to the FL context. An FL-based melanoma detection system is proposed in [14]. This FL model combines skin lesion images with clinical information and safeguards the confidentiality of individuals during the training process. The results demonstrate an improvement of 0.40% in F1-score and an increase of approximately 0.70% in accuracy compared to centralized learning approaches. The paper [15] employs SVM in FL settings for heart disease classification. The performance of this model was evaluated using merged cardiovascular disease data, leading to a 1.5% improvement in prediction accuracy. The system presented in [16] aims to identify instances of facial paralysis using an SVM model in an FL environment. The dataset includes distinct sets of facial images from individuals affected by facial paralysis and from those without the condition. The achieved accuracy is approximately 91%. The model in [17] targets Autism Spectrum Disorder (ASD) detection. An FL scheme is used to screen patients at local screening centers; the collected data is used to train the model locally, and the updated models are aggregated at a central point. The dataset consists of behavioral and facial images. The system uses four ML models: Logistic Regression for feature extraction, Neural Networks for classification, and Decision Trees and K-Nearest Neighbors for classification of unlabelled data. Evaluation of this system shows an accuracy of 63%. The authors of [18] use Diabetes Mellitus risk prediction as a case study, employing various algorithms such as XGBoost, LightGBM, Neural Networks, and Logistic Regression. The newly introduced 'Seceum' FL platform supports collaborative data modeling between different organizations. The results underscore the advantages of employing FL models. By leveraging patient data from different organizations, the approach yields more reliable predictions of Diabetes Mellitus risk.

2 Research using unsupervised learning algorithms with FL
"AnoFed" is a novel framework that unifies federated transformer-based autoencoders and variational autoencoders with support vector data description (SVDD) for anomaly detection in ECG [19]. It showed an improvement of approximately 5% in accuracy compared to existing systems. The study [20] proposes a sparse autoencoder network to extract crucial image attributes from medical photographs of the skin; the model is trained in a decentralized manner. In the model of [21], feature extraction and segmentation of vertebral bodies are carried out using DAF-U-Net. U-Net was chosen for its notable effectiveness in segmenting medical images. The framework is named "Federated Learning-based Vertebral Body Segment Framework (FLVBSF)". The use of U-Net-based Directed Acyclic Graphs (DAGs) yields an accuracy of approximately 98%. The study [22] presents a game-theory-based security model for FL. The suggested model, known as NVAS, is an FL aggregation system that offers a thorough plan for developing a COVID-19 detection and prevention system integrating game theory, wireless communication, and AI. The paper [23] presents a novel framework called "Federated Learning and Reinforcement Learning Strategy (FLRLS)" that uses lab urine data to detect urinary tract infections (UTIs). The model is based on reinforcement learning applied in FL settings. By combining FL, reinforcement learning, and combinatorial optimization, the framework achieves a balance between high accuracy and minimal detection delay in UTI identification.

3 Research using semi-supervised learning algorithms with FL
A semi-supervised learning-based model, "FedCy", is proposed in the study [24]. This method makes use of a decentralized dataset that includes both labelled and unlabelled data to validate its usefulness. The results demonstrated significant improvements in automatically recognizing surgical phases. The COVID-19 detection study [25] proposes a semi-supervised learning approach in an FL setting. The dataset used for this model comprises 1706 CT scans of patients with COVID-19.

4 Research using deep learning algorithms with FL
In the study [26], the researchers use clinical natural language processing to estimate the likelihood of inpatient violence. The model uses a neural network for decentralized training on EHR data, and the results indicate that FL is a viable approach for healthcare systems. The research [27] focuses on the detection and diagnosis of brain tumors (BTs) from magnetic resonance imaging (MRI) scans, combining DL techniques with a distributed FL algorithm. The model was evaluated using cross-validation on two established datasets, BT-small2c and BT-large-3c, and achieved classification accuracies of approximately 82% and 96%, respectively. A lightweight "CoviFL CNN" model is proposed in [28] and was used to train AIoMT edge devices on local datasets. These AIoMT devices can detect COVID-19 from cough audio, with 93.01% accuracy. Early detection of pneumonia is crucial, which drives the adoption of advanced ML techniques, but data-sharing constraints restrict third-party access. The study [29] addresses this problem using ML models such as AlexNet, DenseNet, ResNet-50, Inception, and VGG19. Preliminary outcomes indicate that ResNet-50 performs particularly well in FL, achieving approximately 90% accuracy on the testing dataset.

COMPARISON OF FEDERATED LEARNING MODELS VS. CENTRALIZED LEARNING MODELS
Vast patient datasets within hospitals hold the potential to serve as the foundation for numerous ML models. FL facilitates decentralized learning and delivers effective outcomes while preserving the privacy of the data, as indicated by the findings presented in Table 1. Several studies, such as brain tumor detection [41] using supervised learning, COVID-19 detection from CT scans [42], COVID-19 detection from X-ray images [43], and skin disease detection [44] using unsupervised learning techniques, show good results with centralized datasets; these models could also be implemented with FL.

FUTURE WORK
Numerous studies are under way in this field. A few open areas of research in FL are outlined below.

Promote the integration of unsupervised learning methods in FL
While many researchers predominantly utilize supervised learning methods within the framework of FL, the efficacy of unsupervised learning techniques has been demonstrated to surpass that of supervised approaches in numerous healthcare systems. These findings suggest that incorporating unsupervised learning methods within FL environments is a favorable choice, particularly in scenarios involving unlabelled data [45].

1 Utilise IoT device data
Wearables and smartwatches, which are common IoT devices, constantly produce vast amounts of healthcare data that can be used to train various ML-based healthcare models [46]. It is crucial to use this data only for research purposes, without forwarding or disclosing it to anyone else [47].

2 Resolve the heterogeneity issue
Heterogeneity presents a notable challenge for FL, especially regarding the varying characteristics of client devices. A recent survey [48] introduces a potential solution in which the pre-identified diversity of devices allows mobile devices to be categorized based on their heterogeneity [49]. Each categorized group is then assigned a dedicated local central server [50]. A promising avenue for future research is multi-center FL [51], which could effectively tackle the complexities brought about by this heterogeneity [52].
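The grouping idea can be sketched as a two-level aggregation: clients are bucketed by a device category, each bucket is averaged by its own local server, and the group models are then combined centrally. This is only an illustrative sketch under assumptions; the category labels, the function names (weighted_average, hierarchical_aggregate), and the choice of sample-count weighting are hypothetical and are not taken from the surveyed work.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Weights = List[float]
# One client update: (device category, model weights, number of local samples)
ClientUpdate = Tuple[str, Weights, int]


def weighted_average(models: List[Weights], sizes: List[int]) -> Weights:
    """Average model weights, weighting each model by its sample count."""
    total = sum(sizes)
    return [
        sum(m[i] * n for m, n in zip(models, sizes)) / total
        for i in range(len(models[0]))
    ]


def hierarchical_aggregate(
    updates: List[ClientUpdate],
) -> Tuple[Dict[str, Weights], Weights]:
    """Aggregate per device category first, then across categories."""
    groups: Dict[str, List[Tuple[Weights, int]]] = defaultdict(list)
    for category, weights, n_samples in updates:
        groups[category].append((weights, n_samples))

    group_models: Dict[str, Weights] = {}
    group_sizes: Dict[str, int] = {}
    for category, members in groups.items():
        # Each category is handled by its own (hypothetical) local server.
        group_models[category] = weighted_average(
            [w for w, _ in members], [n for _, n in members]
        )
        group_sizes[category] = sum(n for _, n in members)

    categories = list(group_models)
    global_model = weighted_average(
        [group_models[c] for c in categories],
        [group_sizes[c] for c in categories],
    )
    return group_models, global_model


# Hypothetical updates from two device categories.
updates: List[ClientUpdate] = [
    ("wearable", [1.0, 2.0], 10),
    ("wearable", [3.0, 4.0], 30),
    ("phone", [5.0, 6.0], 60),
]

if __name__ == "__main__":
    per_group, combined = hierarchical_aggregate(updates)
    print(per_group, combined)
```

With plain sample-count weighting, the combined model equals a flat weighted average over all clients; the benefit of the grouped structure is that each local server can keep and serve its own group model, or apply a different schedule, for devices with similar characteristics.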

A few other decentralized learning techniques
Split learning [53] and TinyML [54][55] are also showing promising results with decentralized datasets. Split learning has been combined with FL to implement an efficient model, SplitFed [56]. Researchers can therefore adopt these learning techniques in their work [57].
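As a rough illustration of how split learning divides a model, the sketch below cuts a toy two-parameter model y_hat = w2 * (w1 * x) between a client and a server. The client sends only the intermediate activation, and the server returns the gradient with respect to that activation. This is a simplified, assumption-laden example: the class names (ClientPart, ServerPart), the scalar model, and the label-sharing setup (labels are sent to the server) are hypothetical choices, and it does not reproduce SplitFed.

```python
from typing import List, Tuple


class ClientPart:
    """Holds the first layer; the raw input x never leaves this object."""

    def __init__(self, w1: float = 1.0) -> None:
        self.w1 = w1
        self._x = 0.0

    def forward(self, x: float) -> float:
        self._x = x              # cached for the backward pass
        return self.w1 * x       # activation sent to the server

    def backward(self, grad_h: float, lr: float) -> None:
        # Chain rule: dL/dw1 = dL/dh * dh/dw1 = grad_h * x
        self.w1 -= lr * grad_h * self._x


class ServerPart:
    """Holds the second layer and computes the loss."""

    def __init__(self, w2: float = 1.0) -> None:
        self.w2 = w2

    def train_step(self, h: float, y: float, lr: float) -> float:
        y_hat = self.w2 * h
        grad_out = 2.0 * (y_hat - y)     # derivative of the squared error
        grad_h = grad_out * self.w2      # gradient returned to the client
        self.w2 -= lr * grad_out * h     # server updates its own layer
        return grad_h


def train_split(data: List[Tuple[float, float]],
                epochs: int = 200, lr: float = 0.02):
    client, server = ClientPart(), ServerPart()
    for _ in range(epochs):
        for x, y in data:
            h = client.forward(x)                 # client to server: activation
            grad_h = server.train_step(h, y, lr)  # server to client: gradient
            client.backward(grad_h, lr)
    return client, server


# Hypothetical data following y = 6x.
data = [(0.5, 3.0), (1.0, 6.0)]

if __name__ == "__main__":
    client, server = train_split(data)
    print(client.w1 * server.w2)
```

Only the activation and its gradient cross the client-server boundary, which is why this family of methods suits resource-constrained devices that cannot hold or train a full model.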

CONCLUSION
The central objective of this study is to provide a comprehensive overview of the application of various types of ML and DL algorithms within an FL context to streamline healthcare systems. The materials reviewed were drawn from reputable research databases such as Elsevier, Springer, IEEE, and PubMed, among others. Recent investigations underscore the adoption of SVM, autoencoders, Convolutional Neural Networks (CNN), Graph Generative Adversarial Network (GAN), and transfer learning algorithms. The findings of this analysis reveal a scarcity of research focusing on unsupervised and reinforcement learning within the domain of FL in healthcare. A considerable proportion of FL research leans towards transfer learning and CNN. Consequently, there is an avenue for future research in FL, particularly in exploring the potential of unsupervised learning and reinforcement learning algorithms.

BIO Web of Conferences 86, 01003 (2024), RTBS-2023, https://doi.org/10.1051/bioconf/20248601003

Figure 1 illustrates the training of an ML model within FL settings. The FL process [5] is described in Algorithm 1.

Table 1. Comparison of FL models vs. centralized learning models