Advanced optimizers in Fed-BioMed¶
Difficulty level: advanced
Introduction¶
This tutorial presents how to deal with heterogeneous datasets by changing the Optimizer. In Fed-BioMed, one can specify two sorts of Optimizers:
- an Optimizer on the Node side, defined in the Training Plan;
- an Optimizer on the Researcher side, configured in the Experiment.
Advanced Optimizers are backed by the declearn package, a Python package focused on optimization for Federated Learning. They can be used regardless of the machine learning framework (meaning they are compatible with both sklearn and PyTorch).
In this tutorial you will learn:
- how to use and chain one or several Optimizers on the Node and Researcher side;
- how to use FedOpt;
- how to use Optimizers that exchange auxiliary variables, such as Scaffold.
For further details you can refer to the Optimizer section in the User Guide as well as the declearn documentation on Optimizers.
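As a quick preview, here is a minimal sketch (not a complete Training Plan or Experiment; the full examples are detailed in the sections below) showing that both sides rely on the same fedbiomed.common.optimizers.optimizer.Optimizer class; the learning rates used here are illustrative only:
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import YogiModule

# Node-side Optimizer: returned by the Training Plan's init_optimizer method (section 2)
node_optimizer = Optimizer(lr=0.001)

# Researcher-side Optimizer: passed to Experiment.set_agg_optimizer (section 3),
# here wrapping the declearn Yogi module for a FedOpt-style setup
researcher_optimizer = Optimizer(lr=0.8, modules=[YogiModule()])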
1. Configuring Nodes¶
Before starting, we need to configure several Nodes and add the MedNIST dataset to them. Node configuration steps require the fedbiomed-node conda environment. Please make sure that you have the necessary conda environment: this is explained in the installation tutorial.
Please open a terminal, cd to the base directory of the cloned fedbiomed project and follow the steps below.
Configuration Steps:
- Run fedbiomed node dataset add in the terminal (the full commands are given below).
- It will ask you to select the data type that you want to add. The third option has been configured to add the MedNIST dataset. Please type 3 and continue.
- Please use the default tags, which are #MEDNIST and #dataset.
- For the next step, please select the directory into which you want to download the MedNIST dataset.
- After the download is completed, you will see the details of the MedNIST dataset on the screen.
Please run the commands below in the same terminal to add the MedNIST dataset to the Node (here using the node directory my-node) and start it:
$ fedbiomed node --path my-node dataset add
$ fedbiomed node --path my-node start
In another terminal, you may proceed by launching a second Node. Please repeat the above configuration steps, but specify another node directory (for instance my-second-node):
$ fedbiomed node --path my-second-node dataset add
$ fedbiomed node --path my-second-node start
2. Defining an Optimizer on Node side¶
Optimizers are defined through the init_optimizer method of the Training Plan. They must be set using the Fed-BioMed Optimizer object (i.e. fedbiomed.common.optimizers.optimizer.Optimizer).
2.1 With PyTorch framework¶
So far, we have showcased the use of a PyTorch model with native PyTorch optimizers such as torch.optim.SGD. In the present tutorial, we will see how to use declearn's cross-framework Optimizers.
PyTorch Training Plan
Below is a simple implementation of a declearn SGD Optimizer on a PyTorch model. It is equivalent to a Training Plan using the native PyTorch optimizer torch.optim.SGD, whose init_optimizer method would be:
class MyTrainingPlan(TorchTrainingPlan):
    ...
    def init_optimizer(self, optimizer_args):
        return torch.optim.SGD(self.model().parameters(), lr=optimizer_args['lr'])
The declearn-based equivalent Training Plan is the following:
import torch
import torch.nn as nn
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms
from torchvision.models import densenet121
from fedbiomed.common.optimizers.optimizer import Optimizer


# Here we define the model to be used.
# We will use the densenet121 model.
class MyTrainingPlan(TorchTrainingPlan):
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "from torchvision.models import densenet121",
                "from fedbiomed.common.optimizers.optimizer import Optimizer"]
        return deps

    def init_model(self):
        self.loss_function = torch.nn.CrossEntropyLoss()
        model = densenet121(pretrained=True)
        model.classifier = nn.Sequential(nn.Linear(1024, 512), nn.Softmax())
        return model

    def init_optimizer(self, optimizer_args):
        # Defines and returns a declearn Optimizer
        # equivalent: Optimizer(lr=optimizer_args['lr'], modules=[], regularizers=[])
        return Optimizer(lr=optimizer_args['lr'])

    def training_data(self, batch_size=48):
        preprocess = transforms.Compose([transforms.ToTensor(),
                                         transforms.Normalize(
                                             mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                                         )])
        train_data = datasets.ImageFolder(self.dataset_path, transform=preprocess)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=train_data, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = self.loss_function(output, target)
        return loss
2.2 Sklearn Training Plan¶
For another machine learning framework such as sklearn, the syntax of the init_optimizer method is the same:
from fedbiomed.common.training_plans import FedSGDClassifier
from fedbiomed.common.data import DataManager
from fedbiomed.common.optimizers.optimizer import Optimizer
from torchvision import datasets, transforms
import torch


class MyTrainingPlan(FedSGDClassifier):
    # Declares and returns dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "from fedbiomed.common.optimizers.optimizer import Optimizer",
                "import torch"]
        return deps

    def training_data(self, batch_size):
        # Compared with the PyTorch Training Plan, preprocessing involves additional steps
        # so that the data can be used with the sklearn SGDClassifier, which expects
        # vectors instead of arrays: here we grayscale and reshape the images.
        squeezer = lambda x: torch.squeeze(x)  # removes extra dimensions
        preprocess = transforms.Compose([transforms.ToTensor(),
                                         transforms.Normalize(
                                             mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                                         ),
                                         transforms.Grayscale(1),
                                         transforms.Resize((64*64, 1)),
                                         transforms.Lambda(squeezer)])
        train_data = datasets.ImageFolder(self.dataset_path, transform=preprocess)
        return DataManager(dataset=train_data, batch_size=batch_size)

    # Defines and returns a declearn Optimizer
    def init_optimizer(self, optimizer_args):
        return Optimizer(lr=optimizer_args['lr'])
2.3 Using a more advanced Optimizer with Regularizer¶
An Optimizer from fedbiomed.common.optimizers.optimizer with a learning rate equal to 0.1 can be written as Optimizer(lr=.1, decay=0., modules=[], regularizers=[]), where:
- decay is the weight decay;
- modules is a Python list containing zero, one or several declearn OptiModules;
- regularizers is a Python list containing zero, one or several declearn Regularizers.
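To make this concrete, here is a minimal sketch of such a chained Optimizer, combining Adam updates with Ridge (L2) regularization; the default constructor arguments of AdamModule and RidgeRegularizer are assumed, and the learning rate value is illustrative only:
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer

# Chained Optimizer: Adam adaptive updates plus a Ridge (L2) penalty on the gradients
opt = Optimizer(
    lr=0.001,
    decay=0.0,
    modules=[AdamModule()],
    regularizers=[RidgeRegularizer()],
)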
We will re-use the PyTorch Training Plan already defined above, and show how to use an Adam Optimizer with Ridge as the Regularizer. For that, we need to import the declearn versions of Adam and Ridge (AdamModule and RidgeRegularizer).
The Training Plan can then be defined as follows (for PyTorch):
import torch
import torch.nn as nn
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms
from torchvision.models import densenet121
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer


# Here we define the model to be used.
# We will use the densenet121 model.
class MyTrainingPlan(TorchTrainingPlan):
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "from torchvision.models import densenet121",
                "from fedbiomed.common.optimizers.optimizer import Optimizer",
                "from fedbiomed.common.optimizers.declearn import AdamModule",
                "from fedbiomed.common.optimizers.declearn import RidgeRegularizer"]
        return deps

    def init_model(self):
        self.loss_function = torch.nn.CrossEntropyLoss()
        model = densenet121(pretrained=True)
        model.classifier = nn.Sequential(nn.Linear(1024, 512), nn.Softmax())
        return model

    def init_optimizer(self, optimizer_args):
        # Defines and returns a declearn Optimizer chaining Adam with a Ridge regularizer
        return Optimizer(lr=optimizer_args['lr'], modules=[AdamModule()], regularizers=[RidgeRegularizer()])

    def training_data(self, batch_size=48):
        preprocess = transforms.Compose([transforms.ToTensor(),
                                         transforms.Normalize(
                                             mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                                         )])
        train_data = datasets.ImageFolder(self.dataset_path, transform=preprocess)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=train_data, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = self.loss_function(output, target)
        return loss
2.4. Create the Experiment¶
Once the Training Plan has been created with a specific framework model, the definition of the Experiment is the same as in PyTorch or scikit-learn, as shown below.
Note: a few additional parameters have to be configured in model_args for scikit-learn; they have been added here but will be ignored for a PyTorch model.
lr = 1e-3
model_args = {'n_features': 64*64,
              'n_classes': 6,
              'eta0': lr}

training_args = {
    'loader_args': {
        'batch_size': 8,
    },
    'optimizer_args': {
        "lr": lr
    },
    'dry_run': False,
    'num_updates': 50
}
tags = ['#dataset', '#MEDNIST']
rounds = 2
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
exp = Experiment()
exp.set_training_plan_class(training_plan_class=MyTrainingPlan)
exp.set_model_args(model_args=model_args)
exp.set_training_args(training_args=training_args)
exp.set_tags(tags = tags)
exp.set_aggregator(aggregator=FedAverage())
exp.set_round_limit(rounds)
exp.set_training_data(training_data=None, from_tags=True)
exp.set_strategy(node_selection_strategy=DefaultStrategy())
exp.run(increase=True)
Save trained model to file
exp.training_plan().export_model('./trained_model')
To get and display all the OptiModules (respectively, the Regularizers) available and compatible with Fed-BioMed, one can use list_optim_modules (resp. list_optim_regularizers), as shown below:
from fedbiomed.common.optimizers.declearn import list_optim_modules, list_optim_regularizers
list_optim_modules(), list_optim_regularizers()
3. Defining an Optimizer on Researcher side: FedOpt¶
In some cases, you may want to use Adaptive Federated Optimization, also called FedOpt: the idea behind FedOpt is to also optimize the global model on the Researcher side, in addition to the Nodes' local models, mainly to tackle data heterogeneity. Optimization on the Researcher side is done by computing a pseudo-gradient, which is the difference between the global model weights of two successive Rounds.
Adaptive Federated Optimization can be done in Fed-BioMed with declearn modules through the use of the Experiment.set_agg_optimizer method.
Important: Please note that it is not possible to use native framework optimizers on the Researcher side (such as torch.optim.Optimizer for instance). Only the Fed-BioMed/declearn Optimizer can be used.
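To illustrate the idea of the pseudo-gradient, here is a small, self-contained sketch (plain NumPy, not Fed-BioMed internals; all values are made up):
import numpy as np

# Global model weights sent to the Nodes at the start of a Round,
# and the aggregated weights received back at the end of it.
global_before = np.array([0.50, -0.20, 0.10])
aggregated_after = np.array([0.45, -0.15, 0.12])

# The pseudo-gradient is the difference between these two successive states.
pseudo_gradient = global_before - aggregated_after

# The Researcher-side Optimizer then treats the pseudo-gradient as a regular
# gradient to produce the next global model (a plain SGD step is shown here;
# FedYogi or FedAdam would apply an adaptive update instead).
server_lr = 0.8
new_global = global_before - server_lr * pseudo_gradient
print(new_global)  # -> approximately [0.46, -0.16, 0.116]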
For instance, if one wants to use FedYogi with the first Training Plan (the one based on the SGD Optimizer), the Experiment will be written as:
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import YogiModule as FedYogi
exp = Experiment()
exp.set_training_plan_class(training_plan_class=MyTrainingPlan)
exp.set_model_args(model_args=model_args)
exp.set_training_args(training_args=training_args)
exp.set_tags(tags = tags)
exp.set_aggregator(aggregator=FedAverage())
exp.set_round_limit(rounds)
exp.set_training_data(training_data=None, from_tags=True)
exp.set_strategy(node_selection_strategy=DefaultStrategy())
# here we are adding an Optimizer on Researcher side (FedYogi)
fed_opt = Optimizer(lr=.8, modules=[FedYogi()])
exp.set_agg_optimizer(fed_opt)
exp.run(increase=True)
Save trained model to file
exp.training_plan().export_model('./trained_model')
4. Defining Scaffold through Optimizer¶
In the following subsection, we will present Scaffold. The purpose of Scaffold is to limit the so-called client drift that may happen when dealing with heterogeneous datasets across Nodes. For that, Scaffold involves the exchange of additional parameters between Nodes and Researcher, called correction states, which quantify how much each client has drifted (drift can be seen as the difference between a client's local optimum and the global optimum). In Fed-BioMed, additional parameters required by Optimizers are called auxiliary variables: Scaffold's correction states are one example of them.
declearn comes with Scaffold as two OptiModules:
- a ScaffoldClientModule on the Node side;
- a ScaffoldServerModule on the Researcher side.
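Before looking at the Training Plan, here is a small conceptual sketch of what the correction states do during a local update (plain NumPy, not Fed-BioMed internals; all values are illustrative):
import numpy as np

# Local gradient computed on Node i, plus the two correction states exchanged
# as auxiliary variables: the server's (c_global) and the Node's own (c_i).
grad_i = np.array([0.30, -0.10])
c_global = np.array([0.05, 0.02])
c_i = np.array([0.08, -0.01])

# Scaffold de-biases the local step so that the Node drifts less from the
# global model: the corrected direction is grad_i - c_i + c_global.
corrected_update = grad_i - c_i + c_global

# The local parameters are then updated with this corrected direction.
lr = 0.001
params = np.array([1.0, 1.0])
params = params - lr * corrected_update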
For plain Scaffold, the Training Plan would look as follows (for a PyTorch model).
Important: the FedAvg Aggregator in Fed-BioMed refers to the way model weights are aggregated, and should not be confused with the FedAvg algorithm, which is basically an SGD optimizer performed on the Node side combined with the FedAvg Aggregator.
import torch
import torch.nn as nn
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms
from torchvision.models import densenet121
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldClientModule


# Here we define the model to be used.
# We will use the densenet121 model.
class MyTrainingPlan(TorchTrainingPlan):
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "from torchvision.models import densenet121",
                "from fedbiomed.common.optimizers.optimizer import Optimizer",
                "from fedbiomed.common.optimizers.declearn import ScaffoldClientModule"]
        return deps

    def init_model(self):
        self.loss_function = torch.nn.CrossEntropyLoss()
        model = densenet121(pretrained=True)
        model.classifier = nn.Sequential(nn.Linear(1024, 512), nn.Softmax())
        return model

    def init_optimizer(self, optimizer_args):
        # Defines and returns a declearn Optimizer using the Scaffold client module
        return Optimizer(lr=optimizer_args['lr'], modules=[ScaffoldClientModule()])

    def training_data(self, batch_size=48):
        preprocess = transforms.Compose([transforms.ToTensor(),
                                         transforms.Normalize(
                                             mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                                         )])
        train_data = datasets.ImageFolder(self.dataset_path, transform=preprocess)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=train_data, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = self.loss_function(output, target)
        return loss
The Experiment will then be defined as follows, with an Optimizer configured with a ScaffoldServerModule:
lr = 1e-3
model_args = {}

training_args = {
    'loader_args': {
        'batch_size': 8,
    },
    'optimizer_args': {
        "lr": lr
    },
    'dry_run': False,
    'num_updates': 50
}
tags = ['#dataset', '#MEDNIST']
rounds = 2
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldServerModule
exp = Experiment()
exp.set_training_plan_class(training_plan_class=MyTrainingPlan)
exp.set_model_args(model_args=model_args)
exp.set_training_args(training_args=training_args)
exp.set_tags(tags = tags)
exp.set_aggregator(aggregator=FedAverage())
exp.set_round_limit(rounds)
exp.set_training_data(training_data=None, from_tags=True)
exp.set_strategy(node_selection_strategy=DefaultStrategy())
# here we are adding an Optimizer on the Researcher side (Scaffold server module)
fed_opt = Optimizer(lr=.8, modules=[ScaffoldServerModule()])
exp.set_agg_optimizer(fed_opt)
exp.run(increase=True)
Save trained model to file
exp.training_plan().export_model('./trained_model')
exp.run(rounds=1, increase=True)
Save trained model to file
exp.training_plan().export_model('./trained_model')
5. Explore advanced Optimizer features through declearn and the Fed-BioMed User Guide¶
Congrats!
In this tutorial, you learned how to conduct your Experiment using the advanced cross-framework Optimizers provided by declearn. declearn modules offer the possibility to chain Optimizers and Regularizers, making it possible to extensively customize your federated Experiment. The declearn modules compatible with Fed-BioMed are provided in fedbiomed.common.optimizers.declearn.
For a more in-depth analysis of the declearn Optimizer, please refer to the Optimizer section in the User Guide.
Please also check the declearn documentation for further details regarding the declearn package.