
Implementing other Scikit Learn models for Federated Learning¶

In this tutorial, you will learn how to define and run Scikit-Learn supervised and unsupervised models, as well as data reduction methods, in Fed-BioMed.

1. Introduction¶

As in the previous PyTorch tutorials, you can implement custom Scikit-Learn models in Fed-BioMed. This tutorial summarizes all the steps needed to set up a Scikit-Learn model in Fed-BioMed.

Current Scikit-Learn Methods implemented in Fed-BioMed¶


  • Classifiers:
    • SGDClassifier
    • Perceptron

  • Regressor:
    • SGDRegressor

  • Clustering:
    • Coming Soon!

Check out our User Guide for further information about Scikit Learn models available in Fed-BioMed.

2. Scikit-Learn training plan¶

As you may have seen in the previous Scikit-Learn tutorials, you need to define a "Scikit-Learn training plan". We provide here a template for creating a TrainingPlan for Scikit-Learn. As with PyTorch training plans, every Scikit-Learn training plan class must inherit from one of the "FedPerceptron", "FedSGDRegressor", or "FedSGDClassifier" classes.

Training Plan for supervised Learning (Regressor and Classifier)¶


A template of a supervised learning algorithm for Scikit-Learn models. Each supported Scikit-Learn model can be imported from the module fedbiomed.common.training_plans. Fed-BioMed currently supports the following Scikit-Learn models: "FedPerceptron", "FedSGDRegressor", "FedSGDClassifier".

from fedbiomed.common.data import DataManager
from fedbiomed.common.training_plans import FedSGDRegressor, FedPerceptron, FedSGDClassifier

# Pick the base class that matches the model you want to train
SelectedTrainingPlan = FedPerceptron


class SkLearnTrainingPlan(SelectedTrainingPlan):
    def init_dependencies(self):
        # Declare the dependencies that are used throughout this training plan.
        # E.g., `import numpy as np` must be added to the dependency list if numpy is used in the training_data method.
        deps = ["import numpy as np",
                "import pandas as pd"]
        return deps

    def training_data(self):
        # Define here how the data are loaded, handled and/or shuffled.
        # First you need to instantiate the dataset. This will typically be something like:
        # raw_dataset = pd.read_csv(self.dataset_path)
        # X = raw_dataset[feature_columns]
        # y = raw_dataset[target_column(s)]

        return DataManager(dataset=X.values, target=y.values, shuffle=True, drop_last=False)

Training a Scikit-Learn model is very similar to training a PyTorch model. The only difference is the selection of model hyperparameters (contained in model_args) and training parameters (in training_args). Initializing the Experiment class allows the Researcher to search for active nodes tagged with the defined tags.

from fedbiomed.researcher.aggregators.fedavg import FedAverage
from fedbiomed.researcher.federated_workflows import Experiment

tags = ['#MNIST', '#dataset']
rounds = 5

# model_args and training_args are the dictionaries described in section 3.1
# select the nodes participating in this experiment
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=SkLearnTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

3.1 Arguments for model definition and model training:¶

  • tags: a list of tags used to find the nodes holding the matching datasets. Same as for PyTorch models.

  • model_args: a Python dictionary containing all arguments related to the model (i.e. all Scikit-Learn model parameters). In addition, it MUST include the following fields:

    • n_features: the number of features in the dataset.
    • n_classes: the number of classes (for classification or clustering algorithms only; ignored if a regression algorithm is used).
  • training_plan_class: the Scikit-Learn training plan class. Same as for PyTorch models.

  • training_args: a dictionary containing the training parameters. For the moment, it contains the following entry:

    • epochs: the number of epochs to be performed locally (i.e. on each node).
  • round_limit: the number of rounds (i.e. global aggregations) to be performed. Same as for PyTorch models.

  • aggregator: the aggregation strategy, here Federated Average. More information is available in User Guide/Aggregators. Same as for PyTorch models.

  • node_selection_strategy: how to select/sample nodes among all available nodes. Same as for PyTorch models.
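
As an illustration of the two dictionaries described above, the sketch below builds model_args and training_args for a Perceptron trained on MNIST-like data. The values (784 features, 10 classes, max_iter, tol, 5 epochs) are assumptions chosen for illustration only, and the helper check_model_args is hypothetical, not part of Fed-BioMed.

```python
# Illustrative values only: adapt them to your own dataset and model.
model_args = {
    "n_features": 28 * 28,  # required: number of input features (flattened 28x28 image)
    "n_classes": 10,        # required for classifiers; ignored by regressors
    "max_iter": 1000,       # regular scikit-learn Perceptron parameter
    "tol": 1e-3,            # regular scikit-learn Perceptron parameter
}

training_args = {
    "epochs": 5,  # number of local epochs performed on each node
}


def check_model_args(args, classifier=True):
    """Hypothetical helper: return the required fields missing from args."""
    required = ["n_features"] + (["n_classes"] if classifier else [])
    return [name for name in required if name not in args]


missing = check_model_args(model_args)
print("missing fields:", missing)
```

Running such a check before creating the Experiment is one way to catch a forgotten n_features or n_classes entry early, on the researcher side.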


3.2 Training the model¶

Calling the run method from Experiment will train the Federated Model.

exp.run()
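
Conceptually, each round consists of local training on every selected node, followed by a global aggregation of the parameters the nodes send back. The sketch below illustrates the weighted averaging idea behind Federated Average on plain Python lists. The function name federated_average and the weighting by sample count are assumptions made for illustration; this is not Fed-BioMed's actual implementation.

```python
def federated_average(node_params, node_sizes):
    """Average parameter vectors, weighting each node by its sample count."""
    total = sum(node_sizes)
    averaged = [0.0] * len(node_params[0])
    for params, size in zip(node_params, node_sizes):
        weight = size / total
        for i, value in enumerate(params):
            averaged[i] += weight * value
    return averaged


# Two hypothetical nodes returning a 2-parameter model after local training.
result = federated_average([[1.0, 2.0], [3.0, 6.0]], node_sizes=[100, 300])
print(result)
```

In this sketch, the node with three times as many samples pulls the average three times as strongly toward its own parameters.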

Save the trained model to a file:

exp.training_plan().export_model('./trained_model')

3.3 Retrieve model weights for each Federated round.¶

The history of each round is accessed via the aggregated_params() method of the Experiment class. It returns a dictionary in which each key corresponds to a specific round and maps to the aggregated model obtained at the end of that round.

To extract the whole history, run:

exp.aggregated_params()
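
Since the returned history is a dictionary keyed by round, the final model is the entry with the highest round number. The sketch below uses a hand-written stand-in for that dictionary. The inner 'params' key, the parameter names and the values are assumptions about the layout, so inspect the real return value of exp.aggregated_params() before relying on them.

```python
# Stand-in for exp.aggregated_params(); layout and values are assumed for illustration.
history = {
    0: {"params": {"coef_": [0.1, 0.2], "intercept_": [0.0]}},
    1: {"params": {"coef_": [0.3, 0.1], "intercept_": [0.05]}},
}


def last_round_params(history):
    """Return (round_index, params) for the most recent round."""
    last = max(history)
    return last, history[last]["params"]


round_index, params = last_round_params(history)
print(f"round {round_index}: coef_ = {params['coef_']}")
```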


More algorithms from Scikit-Learn are coming soon! Stay tuned!

Address:

2004 Rte des Lucioles, 06902 Sophia Antipolis

E-mail:

fedbiomed _at_ inria _dot_ fr

Fed-BioMed © 2022