Implementing other Scikit-Learn models for Federated Learning

In this tutorial, you will learn how to define and run Scikit-Learn supervised and unsupervised models, as well as data reduction methods, in Fed-BioMed.

1. Introduction

As in the previous tutorials with PyTorch, you can implement custom Scikit-Learn models in Fed-BioMed. This tutorial summarizes all the steps needed to set up a Scikit-Learn model in Fed-BioMed.

Current Scikit-Learn methods implemented in Fed-BioMed

Classifiers:
- SGDClassifier
- Perceptron

Regressors:
- SGDRegressor

Clustering:
- Coming Soon!

Check out our User Guide for further information about Scikit Learn models available in Fed-BioMed.

2. Scikit-Learn training plan

As you may have seen in the previous Scikit-Learn tutorials, you need to define a "Scikit-Learn training plan". We provide here a template for creating a training plan for Scikit-Learn. As with a PyTorch training plan, every Scikit-Learn training plan class should inherit from one of the "FedPerceptron", "FedSGDRegressor" or "FedSGDClassifier" classes.

Training Plan for Supervised Learning (Regressor and Classifier)

Below is a template of a supervised learning training plan for Scikit-Learn models. Each supported Scikit-Learn model can be imported from the module fedbiomed.common.training_plans. Currently, Fed-BioMed supports the following Scikit-Learn models: "FedPerceptron", "FedSGDRegressor" and "FedSGDClassifier".

from fedbiomed.common.data import DataManager
from fedbiomed.common.training_plans import FedSGDRegressor, FedPerceptron, FedSGDClassifier

# Pick the Fed-BioMed base class that matches the Scikit-Learn model to train.
SelectedTrainingPlan = FedPerceptron


class SkLearnTrainingPlan(SelectedTrainingPlan):
    def init_dependencies(self):
        # Declare the dependencies that are used throughout this training plan.
        # E.g., `import numpy as np` must be added to the dependency list if numpy is used in `training_data`.
        deps = ["import numpy as np",
                "import pandas as pd"]
        return deps

    def training_data(self):
        # Define here how data are loaded and/or shuffled.
        # First, instantiate the dataset. This will typically look like:
        # raw_dataset = pd.read_csv(self.dataset_path)
        # X = raw_dataset[feature_columns]
        # y = raw_dataset[target_column(s)]

        return DataManager(dataset=X.values, target=y.values, shuffle=True, drop_last=False)
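The commented steps in training_data (load a table, then split it into features X and target y) can be sketched in isolation. This is only an illustration under stated assumptions: it uses the standard library csv module instead of pandas so that it runs without extra packages, and the CSV content and column names are made up.

```python
import csv
import io

# Made-up CSV content standing in for the file located at self.dataset_path.
raw_csv = io.StringIO(
    "age,weight,label\n"
    "50,70.5,1\n"
    "35,82.0,0\n"
)

feature_columns = ["age", "weight"]  # hypothetical feature column names
target_column = "label"              # hypothetical target column name

rows = list(csv.DictReader(raw_csv))

# X holds one list of feature values per sample, y holds one target per sample.
X = [[float(row[col]) for col in feature_columns] for row in rows]
y = [int(row[target_column]) for row in rows]

print(X)
print(y)
```

In an actual training plan, the resulting features and targets would be passed to DataManager as in the template above.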

Training a Scikit-Learn model is very similar to training a PyTorch model. The only difference lies in the choice of model hyperparameters (contained in model_args) and training parameters (contained in training_args). Initializing the Experiment class allows the Researcher to search for active nodes that match the given tags.

from fedbiomed.researcher.aggregators.fedavg import FedAverage
from fedbiomed.researcher.federated_workflows import Experiment

tags = ['#MNIST', '#dataset']
rounds = 5

# `model_args` and `training_args` must be defined beforehand (see section 3.1).
# Select the nodes participating in this experiment.
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=SkLearnTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

3.1 Arguments for model definition and model training

tags: a list of tags used to search for nodes with matching datasets. Same as for PyTorch models.
model_args: a Python dictionary containing all arguments related to the model (i.e. all Scikit-Learn model parameters). In addition, it MUST include the following fields:
- n_features: the number of features in the dataset.
- n_classes: the number of classes (for classification or clustering algorithms only; ignored if a regression algorithm is used).
training_plan_class: the Scikit-Learn training plan class. Same as for PyTorch models.
training_args: a dictionary containing the training parameters. For the moment, it contains the following entry:
- epochs: the number of epochs to be performed locally (i.e. on each node).
round_limit: the number of rounds (i.e. global aggregations) to be performed. Same as for PyTorch models.
aggregator: the aggregation strategy, here Federated Average. More information is available in User Guide/Aggregators. Same as for PyTorch models.
node_selection_strategy: how to select/sample nodes among all available nodes. Same as for PyTorch models.
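As a minimal sketch of the two dictionaries described above: the values below assume an MNIST-like dataset (28 x 28 images, 10 classes), as suggested by the `#MNIST` tag, and are illustrative only. The `check_model_args` helper is hypothetical and not part of Fed-BioMed; it simply shows which fields are mandatory.

```python
# Illustrative values only: 784 features and 10 classes assume MNIST-like data.
model_args = {
    "n_features": 28 * 28,  # required: number of input features
    "n_classes": 10,        # required for classifiers, ignored for regressors
    "max_iter": 1000,       # regular Scikit-Learn model parameter
    "tol": 1e-3,            # regular Scikit-Learn model parameter
}

training_args = {
    "epochs": 5,  # number of epochs performed locally on each node
}


def check_model_args(args, classification=True):
    """Hypothetical helper: return the required keys that are missing from args."""
    required = ["n_features"]
    if classification:
        required.append("n_classes")
    return [key for key in required if key not in args]


missing = check_model_args(model_args)
print(missing)  # [] because both mandatory fields are present
```

Running such a check before creating the Experiment can catch a forgotten n_features or n_classes early, before any request is sent to the nodes.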

3.2 Training the model

Calling the run method from Experiment will train the Federated Model.

exp.run()
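Conceptually, each round combines the models trained locally on the nodes into one global model. The following is a toy sketch of federated averaging, not Fed-BioMed's implementation: it assumes that each node's parameters are a flat list of floats and that nodes are weighted by their number of samples. The function name and the sample data are made up for illustration.

```python
def federated_average(node_params, node_sample_counts):
    """Return the sample-weighted average of per-node parameter vectors."""
    total = sum(node_sample_counts)
    n_params = len(node_params[0])
    averaged = [0.0] * n_params
    for params, count in zip(node_params, node_sample_counts):
        weight = count / total  # share of the overall data held by this node
        for i, value in enumerate(params):
            averaged[i] += weight * value
    return averaged


# Two hypothetical nodes: the second one holds three times as much data.
local_coefs = [[1.0, 2.0], [5.0, 6.0]]
sample_counts = [100, 300]

global_coefs = federated_average(local_coefs, sample_counts)
print(global_coefs)  # [4.0, 5.0]
```

The global coefficients end up closer to those of the second node because it contributes 75% of the samples.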

Save the trained model to a file:

exp.training_plan().export_model('./trained_model')

3.3 Retrieve model weights for each federated round

The history of each round is accessed via the aggregated_params() method of the Experiment class. The aggregated models are stored in a dictionary in which each key corresponds to a specific round and maps to the aggregated model obtained in that round.

To extract the whole history, enter:

exp.aggregated_params()
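To illustrate how such a per-round dictionary can be explored, here is a small sketch on mock data. The round keys follow the description above, but the inner layout (a "params" entry holding "coef_" and "intercept_") and all numeric values are assumptions made for the example; inspect the actual return value of exp.aggregated_params() before relying on them.

```python
# Mock history: one entry per round. The inner layout is an assumption.
aggregated_history = {
    0: {"params": {"coef_": [0.10, -0.20], "intercept_": [0.00]}},
    1: {"params": {"coef_": [0.15, -0.25], "intercept_": [0.01]}},
    2: {"params": {"coef_": [0.18, -0.27], "intercept_": [0.02]}},
}

# The model from the last round is stored under the largest round key.
last_round = max(aggregated_history)
final_params = aggregated_history[last_round]["params"]
print(last_round, final_params["coef_"])

# Follow how the first coefficient evolves from round to round.
first_coef_by_round = [
    aggregated_history[r]["params"]["coef_"][0] for r in sorted(aggregated_history)
]
print(first_coef_by_round)
```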

More algorithms from Scikit-Learn are coming soon! Stay tuned!