Advanced Optimization in Fed-BioMed
Advanced optimization can be done in Fed-BioMed through declearn, a Python package that provides gradient-based Optimizers. declearn is cross-framework, meaning that it can be used with most machine learning frameworks (scikit-learn, PyTorch, TensorFlow, JAX, ...).
The following chapter explores in depth how to use declearn's optimization features in Fed-BioMed. For an example, please refer to the Advanced Optimizer tutorial.
1. Introduction to declearn-based Optimizers: a cross-framework Optimizer library
1.1. What is the declearn package?
The declearn package is another modular and composable Federated Learning framework, providing state-of-the-art gradient-based Optimizer algorithms. In Fed-BioMed, we only use its optimization facilities, leaving aside all of declearn's other components.
References: For further details about declearn, you may visit:
1.2. declearn interface in Fed-BioMed: the Optimizer object
In Fed-BioMed, we provide an Optimizer object that works as an interface with declearn and was designed for using declearn's Optimizers (see declearn's OptiModules and declearn's Regularizers below).
from fedbiomed.common.optimizers import Optimizer
Optimizer(lr=.1, decay=.0, modules=[], regularizers=[])
with the following arguments:

- lr: the learning rate;
- decay: the weight decay (see the sketch just below);
- modules: a list of declearn's OptiModules (or a list of OptiModules' names);
- regularizers: a list of declearn's Regularizers (or a list of Regularizers' names).
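As a minimal sketch of these arguments (values are purely illustrative), a learning rate, a weight decay and an OptiModule can be combined in a single Optimizer:

from fedbiomed.common.optimizers import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule

# Adam with an illustrative learning rate and weight decay
optimizer = Optimizer(lr=.001, decay=.005, modules=[AdamModule()])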
1.3. declearn's OptiModules
declearn OptiModules are modules that implement Optimizers, whose purpose is to minimize a loss function (which can be written using a PyTorch loss function or defined in a scikit-learn model) in order to optimize a model. The declearn OptiModules compatible with the Fed-BioMed framework are defined in the fedbiomed.common.optimizers.declearn module. They should be imported from fedbiomed.common.optimizers.declearn, as shown in the examples below. You can also import them directly from declearn's declearn.optimizer.modules, but there is no guarantee that they will be compatible with Fed-BioMed. The recommended method is therefore to import modules through fedbiomed.common.optimizers.declearn.
Usage:
- For basic SGD (Stochastic Gradient Descent), there is no need to specify a declearn OptiModule or Regularizer:

from fedbiomed.common.optimizers.optimizer import Optimizer
lr = .01
Optimizer(lr=lr)

- For a specific Optimizer such as Adam, we need to import AdamModule from fedbiomed.common.optimizers.declearn. Hence, it yields:

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule
lr = .01
Optimizer(lr=lr, modules=[AdamModule()])

- It is possible to chain an Optimizer with several OptiModules, that is, to combine several Optimizers. Some chains of OptiModules may be nonsensical, so use them at your own risk! Below we showcase the use of Adam with Momentum:

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, MomentumModule
lr = .01
Optimizer(lr=lr, modules=[AdamModule(), MomentumModule()])

- To get all compatible OptiModules in Fed-BioMed, one can run [list_optim_modules][fedbiomed.common.optimizers.declearn.list_optim_modules]:

from fedbiomed.common.optimizers.declearn import list_optim_modules
list_optim_modules()
For further information on declearn OptiModule, please visit declearn OptiModule and declearn's Optimizers documentation.
List of available Optimizers provided by declearn
To get a list of all available Optimizers in declearn, please enter the following (after activating the Fed-BioMed conda environment):
from declearn.optimizer import list_optim_modules
list_optim_modules()
1.4. declearn's Regularizers
declearn's Regularizers are objects that add a regularization term to the loss function being optimized. Regularization mainly helps to obtain a more generalizable model and prevents overfitting.
The optimization update then writes as:

\[\theta_{t+1} = \theta_t - \eta \left( \nabla f_{x,y}(\theta_t) + \alpha \, \nabla R(\theta_t) \right)\]

with

\(\theta_t : \textrm{model weights at update } t\)

\(\eta : \textrm{learning rate}\)

\(\alpha : \textrm{regularization coefficient}\)

\(R : \textrm{regularization term added to the loss}\)

\(f_{x,y}: \textrm{loss function used for optimizing the model}\)
Regularizers should be used in combination with an Optimizer: for instance, SGD with Ridge regularization, or Adam with Lasso regularization. FedProx is also handled as a Regularizer (a sketch follows the usage examples below).
Optimizer without OptiModules
When no OptiModules are specified in the modules argument of Optimizer, the plain SGD algorithm is used by default for the optimization.
Usage:
- For example, SGD with Ridge regularization will be written as:

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import RidgeRegularizer
lr = .01
Optimizer(lr=lr, regularizers=[RidgeRegularizer()])

- Adam with Lasso regularization:

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, LassoRegularizer
lr = .01
Optimizer(lr=lr, modules=[AdamModule()], regularizers=[LassoRegularizer()])

- Chaining several Regularizers: an example with Ridge and Lasso regularizers, and Adam with Momentum as the Optimizer:

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, MomentumModule, LassoRegularizer, RidgeRegularizer
lr = .01
Optimizer(lr=lr, modules=[AdamModule(), MomentumModule()], regularizers=[LassoRegularizer(), RidgeRegularizer()])
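FedProx (mentioned above) follows the same pattern; a minimal sketch, assuming FedProxRegularizer is exposed in fedbiomed.common.optimizers.declearn like the other regularizers (it also appears in the table further below):

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import FedProxRegularizer

# plain SGD on the Node side, with a FedProx proximal term added to the loss
lr = .01
Optimizer(lr=lr, regularizers=[FedProxRegularizer()])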
For further information on declearn Regularizers, please visit the declearn Regularizers documentation webpage.
1.5. Chaining Optimizers and Regularizers with declearn modules
It is possible in declearn to chain several OptiModules and Regularizers in an Optimizer. Generally speaking, an Optimizer update in declearn can be written as:

\[\theta_{t+1} = \theta_t - \eta \, Opt\big(Reg(\nabla f_{x,y}(\theta_t))\big) - \tau \theta_t\]

with

\(Opt : \textrm{an OptiModule}\)

\(Reg : \textrm{a Regularizer}\)

\(\theta : \textrm{model weights}\)

\(\eta : \textrm{learning rate}\)

\(\tau : \textrm{weight decay}\)

\(f_{x,y}: \textrm{loss function used for optimizing the model}\)
The above holds for a single Regularizer and OptiModule. When using (i.e. chaining) several OptiModules and Regularizers in an Optimizer, the optimization equation becomes:

\[\theta_{t+1} = \theta_t - \eta \, \big(Opt_n \circ \dots \circ Opt_1\big)\Big(\big(Reg_m \circ \dots \circ Reg_1\big)\big(\nabla f_{x,y}(\theta_t)\big)\Big) - \tau \theta_t\]
where
\(Opt_{1\le i \le n}: \textrm{ OptiModules, with } n \textrm{ the total number of OptiModules used}\)
\(Reg_{1\le i \le m}: \textrm{ Regularizers, with } m \textrm{ the total number of Regularizers used}\)
Example: let's write an Optimizer using RMSProp and Momentum OptiModules, and both Lasso and Ridge Regularizers.
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import RMSPropModule, MomentumModule, LassoRegularizer, RidgeRegularizer
lr = .01
Optimizer(lr=lr,
modules=[RMSPropModule(), MomentumModule()],
regularizers=[LassoRegularizer(), RidgeRegularizer()])
Using a list of strings instead of a list of modules
In declearn, it is possible to use module names instead of loading the actual modules. In the script below, we build an Optimizer by specifying module and regularizer names rather than instances. A convenient way to discover the valid names is to use the list_optim_modules and list_optim_regularizers functions, which map module names to their respective classes (a lookup sketch follows the example below).
from fedbiomed.common.optimizers.optimizer import Optimizer
lr = .01
Optimizer(lr=lr,
modules=['adam', 'momentum'],
regularizers=['lasso', 'ridge'])
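To check which names are valid, one can inspect the mappings returned by the helper functions mentioned above (a quick sketch; the exact content depends on the installed declearn version):

from fedbiomed.common.optimizers.declearn import list_optim_modules, list_optim_regularizers

# both helpers return a mapping from registration names to classes
print(sorted(list_optim_modules()))       # names usable in the `modules` argument
print(sorted(list_optim_regularizers()))  # names usable in the `regularizers` argument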
To learn more about the specifics of all of declearn's modules, please visit the declearn webpage.
How to use well-known Federated-Learning algorithms with declearn in Fed-BioMed?
Please refer to the following section of this page.
2. declearn optimizer on Node side
In order to use declearn to optimize a Node's local model, you will have to edit the init_optimizer method of the TrainingPlan. Below we showcase how to use it with the PyTorch framework (using Adam and the Ridge regularizer for the optimization).
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer
...
class MyTrainingPlan(TorchTrainingPlan):
...
def init_dependencies(self):
deps = [
"from fedbiomed.common.optimizers.optimizer import Optimizer",
"from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer"
]
return deps
def init_optimizer(self):
return Optimizer(lr=.01, modules=[AdamModule()], regularizers=[RidgeRegularizer()])
Important
You should specify the imported OptiModules both in the imports at the beginning of the Training Plan and in the dependencies (in the init_dependencies method within the Training Plan). The same holds for declearn's Regularizers.
The syntax is the same for scikit-learn, as shown below using the same Optimizer:
from fedbiomed.common.training_plans import FedSGDClassifier
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer
...
class MyTrainingPlan(FedSGDClassifier):
...
def init_dependencies(self):
deps = [
"from fedbiomed.common.optimizers.optimizer import Optimizer",
"from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer"
]
return deps
def init_optimizer(self):
return Optimizer(lr=.01, modules=[AdamModule()], regularizers=[RidgeRegularizer()])
3. declearn optimizer on Researcher side (FedOpt)
Fed-BioMed provides a way to use Adaptive Federated Optimization, introduced as FedOpt in this paper. In the paper, the authors consider the difference of the global model weights between two successive Rounds as a pseudo-gradient, paving the way to having Optimizers on the Researcher side that optimize the updates of the global model. To do so, fedbiomed.researcher.federated_workflows.Experiment has a method to set the Researcher Optimizer: Experiment.set_agg_optimizer
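Schematically, writing \(\theta^{t}\) for the global model at Round \(t\) and \(\bar{\theta}^{t+1}\) for the aggregated local models, the Researcher Optimizer treats their difference as a pseudo-gradient (notation ours, following the FedOpt formulation):

\[\Delta^{t} = \theta^{t} - \bar{\theta}^{t+1}, \qquad \theta^{t+1} = \theta^{t} - \eta_{s}\, Opt_{s}\big(\Delta^{t}\big)\]

where \(\eta_s\) is the Researcher-side learning rate and \(Opt_s\) the Researcher-side OptiModule (e.g. Yogi); with no module and \(\eta_s = 1\), this reduces to plain FedAvg.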
Below is an example using set_agg_optimizer with FedYogi:
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import YogiModule as FedYogi
tags = ['#my-data']
exp = Experiment()
exp.set_training_plan_class(training_plan_class=MyTrainingPlan)
exp.set_tags(tags = tags)
exp.set_aggregator(aggregator=FedAverage())
exp.set_round_limit(2)
exp.set_training_data(training_data=None, from_tags=True)
exp.set_strategy(node_selection_strategy=DefaultStrategy())
# here we are adding an Optimizer on Researcher side (FedYogi)
fed_opt = Optimizer(lr=.8, modules=[FedYogi()])
exp.set_agg_optimizer(fed_opt)
exp.run(increase=True)
Important
You may have noticed that we are using FedAverage in the Experiment configuration while using YogiModule as an Optimizer. In fact, the FedAverage Aggregator in Fed-BioMed refers to the way model weights are aggregated before optimization, and should not be confused with the whole FedAvg algorithm, which is essentially an SGD optimizer run on the Node side combined with the FedAverage Aggregator on the Researcher side.
One can also pass the agg_optimizer directly to the Experiment constructor:
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import YogiModule as FedYogi
tags = ['#my-data']
fed_opt = Optimizer(lr=.8, modules=[FedYogi()])
exp = Experiment(tags=tags,
training_plan_class=MyTrainingPlan,
round_limit=2,
agg_optimizer=fed_opt,
aggregator=FedAverage(),
node_selection_strategy=None)
exp.run(increase=True)
4. declearn auxiliary variables based Optimizers
In this subsection, we will take a look at some specific Optimizers that are built around auxiliary variables.
4.1. What is an auxiliary variable?
An auxiliary variable is a parameter that an Optimizer requires to be exchanged between Nodes and the Researcher, in addition to the model parameters. Scaffold is an example of such an Optimizer, because it is built upon correction states exchanged between Nodes and the Researcher.
These Optimizers may come with a specific Researcher version (for Scaffold, ScaffoldServerModule) and a Node version (respectively, ScaffoldClientModule). They may work in a synchronous fashion: the Researcher optimizer may expect auxiliary variables from the Node optimizer, and the other way around (the Node optimizer expecting auxiliary variable input from the Researcher optimizer).
Optimizers using auxiliary variables
Currently only Scaffold (i.e. ScaffoldClientModule and ScaffoldServerModule) uses auxiliary variables.
4.2. An example using Optimizer with auxiliary variables: Scaffold with declearn
In the last sub-section, we introduced Scaffold. Let's now see how to use it in the Fed-BioMed framework.
About native Scaffold implementation in Fed-BioMed
Fed-BioMed provides its own implementation of a Scaffold Aggregator, which is only available for the PyTorch framework. It only works with native PyTorch optimizers (torch.optim.Optimizer) as the Node Optimizer.
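For contrast, a rough sketch of the native approach (assuming the Scaffold Aggregator is importable from fedbiomed.researcher.aggregators and that a plain torch.optim optimizer is returned from init_optimizer); the declearn route described below remains the recommended one:

import torch
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.researcher.aggregators import Scaffold

class MyNativeScaffoldTrainingPlan(TorchTrainingPlan):
    ...
    def init_optimizer(self):
        # a native PyTorch optimizer, no declearn Optimizer involved
        return torch.optim.SGD(self.model().parameters(), lr=.01)

# on the Researcher side, the native Scaffold Aggregator replaces FedAverage:
# exp.set_aggregator(aggregator=Scaffold())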
Training Plan design
In this subsection, we showcase how to edit your Training Plan for PyTorch in order to use Scaffold:
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldClientModule
...
class MyTrainingPlan(TorchTrainingPlan):
...
def init_dependencies(self):
deps = [
"from fedbiomed.common.optimizers.optimizer import Optimizer",
"from fedbiomed.common.optimizers.declearn import ScaffoldClientModule",
]
return deps
def init_optimizer(self):
return Optimizer(lr=.01, modules=[ScaffoldClientModule()])
Experiment design
This is how the Experiment can be designed on the Researcher side:
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldServerModule
tags = ['#my-data']
fed_opt = Optimizer(lr=.8, modules=[ScaffoldServerModule()])
exp = Experiment(tags=tags,
training_plan_class=MyTrainingPlan,
round_limit=2,
agg_optimizer=fed_opt,
aggregator=FedAverage(),
node_selection_strategy=None)
exp.run(increase=True)
Important
You may have noticed that we are using FedAverage in the Experiment configuration while using ScaffoldServerModule / ScaffoldClientModule as an Optimizer. In fact, the FedAverage Aggregator in Fed-BioMed refers to the way model weights are aggregated before optimization, and should not be confused with the whole FedAvg algorithm, which is essentially an SGD optimizer run on the Node side combined with the FedAverage Aggregator.
Using auxiliary variables with SecAgg
In the latest releases of Fed-BioMed, declearn optimizers based on auxiliary variables (such as the Scaffold optimizer) now have their auxiliary variables encrypted with SecAgg! Please note that this is not the case with the native Scaffold algorithm. For security reasons, you may therefore prefer the declearn optimizers over the native ones.
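As an illustration, secure aggregation is enabled at the Experiment level; a sketch assuming the secagg flag of the Experiment constructor (see the Secure Aggregation documentation for the exact setup):

from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldServerModule

fed_opt = Optimizer(lr=.8, modules=[ScaffoldServerModule()])
exp = Experiment(tags=['#my-data'],
                 training_plan_class=MyTrainingPlan,
                 round_limit=2,
                 agg_optimizer=fed_opt,
                 aggregator=FedAverage(),
                 secagg=True)  # Scaffold auxiliary variables are then encrypted along with model parameters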
You can find more examples in the Advanced Optimizers tutorial.
Table of common Federated Learning algorithms with declearn in Fed-BioMed
Below we have gathered some of the most well-known Federated Learning algorithms in the following table (as a reminder, the Node Optimizer must be defined in the TrainingPlan, whereas the Researcher Optimizer is defined in the Experiment object); an example follows the table:
| Federated Learning Algorithm | Node Optimizer | Researcher Optimizer | Aggregator |
|---|---|---|---|
| AdaAlter (distributed AdaGrad) | AdaGrad Optimizer(lr=xx, modules=[AdaGradModule()]) | None | FedAverage |
| FedAdagrad | SGD Optimizer(lr=xx) | AdaGrad Optimizer(lr=xx, modules=[AdaGradModule()]) | FedAverage |
| FedAdam | SGD Optimizer(lr=xx) | Adam Optimizer(lr=xx, modules=[AdamModule()]) | FedAverage |
| FedAvg | SGD Optimizer(lr=xx) | None | FedAverage |
| FedProx | SGD Optimizer(lr=xx, regularizers=[FedProxRegularizer()]) | None | FedAverage |
| FedYogi | SGD Optimizer(lr=xx) | Yogi Optimizer(lr=xx, modules=[YogiModule()]) | FedAverage |
| Scaffold | SGD Optimizer(lr=xx, modules=[ScaffoldClientModule()]) | SGD Optimizer(lr=xx, modules=[ScaffoldServerModule()]) | FedAverage |
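For instance, the FedAdam row of the table translates into plain SGD on the Node side and Adam on the Researcher side; a minimal sketch (learning rates are illustrative):

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule

# Node side: returned by init_optimizer in the Training Plan
node_optimizer = Optimizer(lr=.01)  # plain SGD

# Researcher side: passed to Experiment.set_agg_optimizer (or as agg_optimizer in the constructor)
researcher_optimizer = Optimizer(lr=.1, modules=[AdamModule()])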
5. Common Pitfalls using declearn Optimizers in Fed-BioMed
Below, we summarize common pitfalls that may occur when using the declearn package in Fed-BioMed:

- Optimization on the Researcher side is only possible through declearn Optimizers (and not through native Optimizers such as PyTorch Optimizers);
- Some Optimizers may require synchronization: this is the case for the Scaffold-related modules, i.e. ScaffoldClientModule and ScaffoldServerModule;
- In earlier releases, declearn Optimizers that use auxiliary variables (such as Scaffold) could not be protected with Secure Aggregation; this is now supported (see the note on SecAgg above);
- For the moment, declearn's Optimizer only comes with a single learning rate (unlike multiple-learning-rate Optimizers: for example, PyTorch optimizers torch.optim.Optimizer can handle one learning rate per model layer);
- When chaining declearn's OptiModules, only a single learning rate can be used; it is shared by all OptiModules and does not change during a Round;
- Check for inconsistent Optimizers! Using a Regularizer on the Researcher side may be nonsensical, even though it is doable within declearn;
- The Fed-BioMed Scaffold Aggregator must not be used together with ScaffoldServerModule and ScaffoldClientModule. This Aggregator is an alternative to the declearn Scaffold, and you have to choose between the Fed-BioMed native version of Scaffold and declearn's one. Please note that the Fed-BioMed Scaffold Aggregator is deprecated, hence the use of ScaffoldServerModule and ScaffoldClientModule is highly encouraged.
Conclusion
We have seen how to use declearn Optimizers in Fed-BioMed. In Fed-BioMed, it is possible to set an Optimizer on both the Node and the Researcher side:
- On the Node side, such an Optimizer is defined in the Training Plan and is used to optimize the Nodes' local models;
- On the Researcher side, the Optimizer is defined in the Experiment and is made for optimizing the global model.
When used with the declearn package, the Fed-BioMed Aggregator is used for aggregating weights before any potential optimization: FedAverage does the weighted sum of all local models sent back by the Nodes.
declearn comes with the possibility of chaining Optimizers, by passing a list of OptiModules and Regularizers, making it possible to try out more complex optimization processes.
Check the tutorial related to the use of declearn's Optimizers