Advanced Optimization in Fed-BioMed
Advanced optimization can be done in Fed-BioMed through the use of declearn, a Python package that provides gradient-based Optimizers. declearn is cross-framework, meaning that it can be used with most machine learning frameworks (scikit-learn, PyTorch, TensorFlow, JAX, ...). The following chapter explores in depth how to use declearn's optimization features in Fed-BioMed. For an example, please refer to the Advanced Optimizer tutorial.
1. Introduction to declearn-based Optimizers: a cross-framework Optimizer library
1.1. What is the declearn package?
declearn is another Federated Learning framework, modular and composable, that provides state-of-the-art gradient-based Optimizer algorithms. In Fed-BioMed, we only use its optimization facilities, leaving aside all the other components of declearn.
References: for further details about declearn, you may visit the declearn documentation.
1.2. The declearn interface in Fed-BioMed: the `Optimizer` object
In Fed-BioMed, we provide an `Optimizer` object that acts as an interface with declearn and was designed to make use of declearn's Optimizers (see declearn's `OptiModules` and `Regularizers` below).
```python
from fedbiomed.common.optimizers import Optimizer

Optimizer(lr=.1, decay=.0, modules=[], regularizers=[])
```
with the following arguments:

- `lr`: the learning rate;
- `decay`: the weight decay;
- `modules`: a list of declearn's `OptiModules` (or a list of `OptiModule` names);
- `regularizers`: a list of declearn's `Regularizers` (or a list of `Regularizer` names).
1.3. declearn's OptiModules
declearn's `OptiModules` are modules that implement Optimizers, whose purpose is to minimize a loss function (which can be written using a PyTorch loss function or defined in a scikit-learn model) in order to optimize a model. The declearn `OptiModules` compatible with the Fed-BioMed framework are defined in the `fedbiomed.common.optimizers.declearn` module. They should be imported from `fedbiomed.common.optimizers.declearn`, as shown in the examples below. You can also import them directly from declearn's `declearn.optimizer.modules`, but there is no guarantee that they will be compatible with Fed-BioMed; the recommended method is to import modules through `fedbiomed.common.optimizers.declearn`.
Usage:
- For basic SGD (Stochastic Gradient Descent), there is no need to specify a declearn `OptiModule` or `Regularizer`:

  ```python
  from fedbiomed.common.optimizers.optimizer import Optimizer

  lr = .01
  Optimizer(lr=lr)
  ```
- For a specific Optimizer like Adam, we need to import `AdamModule` from declearn. Hence, it yields:

  ```python
  from fedbiomed.common.optimizers.optimizer import Optimizer
  from fedbiomed.common.optimizers.declearn import AdamModule

  lr = .01
  Optimizer(lr=lr, modules=[AdamModule()])
  ```
- It is possible to chain several `OptiModules` in an `Optimizer`, meaning that several Optimizers are combined. Some chains of `OptiModules` may be nonsensical, so use them at your own risk! Below we showcase the use of Adam with Momentum:

  ```python
  from fedbiomed.common.optimizers.optimizer import Optimizer
  from fedbiomed.common.optimizers.declearn import AdamModule, MomentumModule

  lr = .01
  Optimizer(lr=lr, modules=[AdamModule(), MomentumModule()])
  ```
- To get all the `OptiModules` compatible with Fed-BioMed, one can run the [`list_optim_modules`][fedbiomed.common.optimizers.declearn.list_optim_modules] function:

  ```python
  from fedbiomed.common.optimizers.declearn import list_optim_modules

  list_optim_modules()
  ```
For further information on declearn's `OptiModule`, please visit the declearn `OptiModule` and declearn Optimizers documentation.
List of available Optimizers provided by declearn

To get a list of all available Optimizers in declearn, please enter (after having loaded the Fed-BioMed conda environment):

```python
from declearn.optimizer import list_optim_modules

list_optim_modules()
```
1.4. declearn's Regularizers
declearn's `Regularizers` are objects that enable regularization, which adds an extra term to the loss function that the Optimizer minimizes. It mainly helps to obtain a more generalizable model and prevents overfitting.
The optimization update then writes:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \big( f_{x,y}(\theta_t) + \alpha \, Reg(\theta_t) \big)$$

with

\(\theta_t : \textrm{model weights at update } t\)
\(\eta : \textrm{learning rate}\)
\(\alpha : \textrm{regularization coefficient}\)
\(f_{x,y}: \textrm{loss function used for optimizing the model}\)
\(Reg : \textrm{regularization term added by the Regularizer}\)
`Regularizers` should be used in combination with an Optimizer: for instance, SGD with Ridge regularization, or Adam with Lasso regularization. FedProx is also provided as a regularization (an example is given after the usage list below).
Optimizer without OptiModules

When no `OptiModules` are specified in the `modules` argument of `Optimizer`, the plain SGD algorithm is used by default for the optimization.
Usage:
- For example, SGD with Ridge regularization will be written as:

  ```python
  from fedbiomed.common.optimizers.optimizer import Optimizer
  from fedbiomed.common.optimizers.declearn import RidgeRegularizer

  lr = .01
  Optimizer(lr=lr, regularizers=[RidgeRegularizer()])
  ```
- Adam with Lasso regularization:

  ```python
  from fedbiomed.common.optimizers.optimizer import Optimizer
  from fedbiomed.common.optimizers.declearn import AdamModule
  from fedbiomed.common.optimizers.declearn import LassoRegularizer

  lr = .01
  Optimizer(lr=lr, modules=[AdamModule()], regularizers=[LassoRegularizer()])
  ```
- Chaining several Regularizers: an example with the Ridge and Lasso regularizers, and Adam with Momentum as the Optimizer:

  ```python
  from fedbiomed.common.optimizers.optimizer import Optimizer
  from fedbiomed.common.optimizers.declearn import AdamModule, MomentumModule
  from fedbiomed.common.optimizers.declearn import LassoRegularizer, RidgeRegularizer

  lr = .01
  Optimizer(lr=lr,
            modules=[AdamModule(), MomentumModule()],
            regularizers=[LassoRegularizer(), RidgeRegularizer()])
  ```
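FedProx, mentioned above, follows the same pattern since it is provided as a Regularizer rather than an `OptiModule`. A minimal sketch, using the `FedProxRegularizer` name that also appears in the algorithm table further below (the learning rate is illustrative):

```python
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import FedProxRegularizer

# FedProx: plain SGD plus a proximal regularization term on the Node side.
lr = .01
Optimizer(lr=lr, regularizers=[FedProxRegularizer()])
```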
For further information on declearn `Regularizers`, please visit the declearn Regularizers documentation webpage.
1.5. Chaining Optimizers and Regularizers with declearn modules
It is possible in declearn to chain several `OptiModules` and `Regularizers` in an `Optimizer`. Generally speaking, an `Optimizer` in declearn can be written as:

$$\theta_{t+1} = \theta_t - \eta \, Opt\big(Reg(\nabla_\theta f_{x,y}(\theta_t))\big) - \tau \, \theta_t$$

with

\(Opt : \textrm{an OptiModule}\)
\(Reg : \textrm{a Regularizer}\)
\(\theta : \textrm{model weights}\)
\(\eta : \textrm{learning rate}\)
\(\tau : \textrm{weight decay}\)
\(f_{x,y}: \textrm{loss function used for optimizing the model}\)
The above holds for a single `Regularizer` and `OptiModule`. When using (i.e. chaining) several `OptiModules` and `Regularizers` in an `Optimizer`, the optimization equation becomes:

$$\theta_{t+1} = \theta_t - \eta \, Opt_n \circ \dots \circ Opt_1\big(Reg_m \circ \dots \circ Reg_1(\nabla_\theta f_{x,y}(\theta_t))\big) - \tau \, \theta_t$$

where

\(Opt_{1\le i \le n}: \textrm{ OptiModules, with } n \textrm{ the total number of OptiModules used}\)
\(Reg_{1\le i \le m}: \textrm{ Regularizers, with } m \textrm{ the total number of Regularizers used}\)
Example: let's write an `Optimizer` using the `RMSProp` and `Momentum` `OptiModules`, and both the Lasso and Ridge `Regularizers`:

```python
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import RMSPropModule, MomentumModule, LassoRegularizer, RidgeRegularizer

lr = .01
Optimizer(lr=lr,
          modules=[RMSPropModule(), MomentumModule()],
          regularizers=[LassoRegularizer(), RidgeRegularizer()])
```
Using a list of strings instead of a list of modules

In declearn, it is possible to use module names instead of loading the actual module classes. In the script below, we write an `Optimizer` (here Adam with Momentum, regularized with Lasso and Ridge) by specifying the module names. A convenient way to obtain these names is to use the `list_optim_modules` and `list_optim_regularizers` functions, which respectively map module names to their classes.

```python
from fedbiomed.common.optimizers.optimizer import Optimizer

lr = .01
Optimizer(lr=lr,
          modules=['adam', 'momentum'],
          regularizers=['lasso', 'ridge'])
```
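As a quick sketch of how to look these names up (assuming both helper functions are importable from `fedbiomed.common.optimizers.declearn`, as the `list_optim_modules` example above suggests; the exact content of the mappings depends on your declearn version):

```python
from fedbiomed.common.optimizers.declearn import list_optim_modules, list_optim_regularizers

# Each helper returns a mapping from registration name to the corresponding class,
# e.g. 'adam' -> AdamModule, 'ridge' -> RidgeRegularizer.
print(list_optim_modules())
print(list_optim_regularizers())
```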
To learn about the specificities of all declearn's modules, please visit the declearn webpage.
How to use well-known Federated Learning algorithms with declearn in Fed-BioMed?

Please refer to the following section of this page.
2. declearn optimizer on the Node side
In order to use declearn to optimize a Node's local model, you will have to edit the `init_optimizer` method in the `TrainingPlan`. Below we showcase how to use it with the PyTorch framework (using Adam and the Ridge regularizer for the optimization).
```python
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer
...

class MyTrainingPlan(TorchTrainingPlan):
    ...
    def init_dependencies(self):
        deps = [
            "from fedbiomed.common.optimizers.optimizer import Optimizer",
            "from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer"
        ]
        return deps

    def init_optimizer(self):
        return Optimizer(lr=.01, modules=[AdamModule()], regularizers=[RidgeRegularizer()])
```
Important

You should specify the imported `OptiModules` both in the imports at the beginning of the Training Plan and in the dependencies (in the `init_dependencies` method within the Training Plan). The same holds for declearn's `Regularizers`.
The syntax is the same for scikit-learn, as shown below using the same `Optimizer`:
```python
from fedbiomed.common.training_plans import FedSGDClassifier
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer
...

class MyTrainingPlan(FedSGDClassifier):
    ...
    def init_dependencies(self):
        deps = [
            "from fedbiomed.common.optimizers.optimizer import Optimizer",
            "from fedbiomed.common.optimizers.declearn import AdamModule, RidgeRegularizer"
        ]
        return deps

    def init_optimizer(self):
        return Optimizer(lr=.01, modules=[AdamModule()], regularizers=[RidgeRegularizer()])
```
3. declearn optimizer on the Researcher side (FedOpt)
Fed-BioMed provides a way to use Adaptive Federated Optimization, introduced as FedOpt in this paper. In the paper, the authors consider the difference between the global model weights of two successive Rounds as a pseudo-gradient, paving the way for Optimizers on the Researcher side that optimize the updates of the global model. To do so, `fedbiomed.researcher.federated_workflows.Experiment` has a method to set the Researcher `Optimizer`: `Experiment.set_agg_optimizer`. Below is an example using `set_agg_optimizer` with FedYogi:
```python
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import YogiModule as FedYogi

tags = ['#my-data']

exp = Experiment()
exp.set_training_plan_class(training_plan_class=MyTrainingPlan)
exp.set_tags(tags=tags)
exp.set_aggregator(aggregator=FedAverage())
exp.set_round_limit(2)
exp.set_training_data(training_data=None, from_tags=True)
exp.set_strategy(node_selection_strategy=DefaultStrategy())

# here we are adding an Optimizer on the Researcher side (FedYogi)
fed_opt = Optimizer(lr=.8, modules=[FedYogi()])
exp.set_agg_optimizer(fed_opt)

exp.run(increase=True)
```
Important

You may have noticed that we are using `FedAverage` in the `Experiment` configuration while using `YogiModule` as an `Optimizer`. In fact, the `FedAverage` `Aggregator` in Fed-BioMed refers to the way model weights are aggregated before optimization, and should not be confused with the whole FedAvg algorithm, which is basically an SGD optimizer performed on the Node side using the `FedAverage` `Aggregator` on the Researcher side.
One can also pass the `agg_optimizer` directly in the `Experiment` object constructor:
```python
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import YogiModule as FedYogi

tags = ['#my-data']
fed_opt = Optimizer(lr=.8, modules=[FedYogi()])

exp = Experiment(tags=tags,
                 training_plan_class=MyTrainingPlan,
                 round_limit=2,
                 agg_optimizer=fed_opt,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

exp.run(increase=True)
```
4. declearn auxiliary-variable-based Optimizers
In this subsection, we take a look at some specific Optimizers that are built around auxiliary variables.
4.1. What is an auxiliary variable?
An auxiliary variable is a parameter required by an Optimizer that needs to be exchanged between Nodes and the Researcher, in addition to the model parameters. Scaffold is an example of such an Optimizer, because it is built upon correction states exchanged between Nodes and the Researcher.
These Optimizers may come with a specific Researcher version (for Scaffold it is `ScaffoldServerModule`) and a Node version (respectively `ScaffoldClientModule`). They may work in a synchronous fashion: the Researcher optimizer version may expect auxiliary variables from the Node optimizer, and the other way around (the Node optimizer expecting auxiliary variable inputs from the Researcher optimizer version), as sketched below.
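A minimal sketch of that pairing, mirroring the imports used in the next subsection (the full Training Plan and Experiment setups follow there; the learning rates are illustrative):

```python
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldClientModule
from declearn.optimizer.modules import ScaffoldServerModule

# Node side: set in the Training Plan's init_optimizer method.
node_optimizer = Optimizer(lr=.01, modules=[ScaffoldClientModule()])

# Researcher side: passed to the Experiment as the agg_optimizer.
researcher_optimizer = Optimizer(lr=.8, modules=[ScaffoldServerModule()])
```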
4.2. An example of an Optimizer with auxiliary variables: Scaffold with declearn
In the last subsection, we introduced Scaffold. Let's now see how to use it in the Fed-BioMed framework.
About the native Scaffold implementation in Fed-BioMed

Fed-BioMed provides its own implementation of a Scaffold `Aggregator`, which is only available for the PyTorch framework. It only works with native PyTorch optimizers (`torch.optim.Optimizer`) as the Node `Optimizer`.
Training Plan design

In this subsection, we showcase how to edit your Training Plan for PyTorch in order to use Scaffold:
```python
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import ScaffoldClientModule
...

class MyTrainingPlan(TorchTrainingPlan):
    ...
    def init_dependencies(self):
        deps = [
            "from fedbiomed.common.optimizers.optimizer import Optimizer",
            "from fedbiomed.common.optimizers.declearn import ScaffoldClientModule",
        ]
        return deps

    def init_optimizer(self):
        return Optimizer(lr=.01, modules=[ScaffoldClientModule()])
```
Experiment design

This is how the `Experiment` can be designed (on the Researcher side):
```python
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
from fedbiomed.common.optimizers.optimizer import Optimizer
from declearn.optimizer.modules import ScaffoldServerModule

tags = ['#my-data']
fed_opt = Optimizer(lr=.8, modules=[ScaffoldServerModule()])

exp = Experiment(tags=tags,
                 training_plan_class=MyTrainingPlan,
                 round_limit=2,
                 agg_optimizer=fed_opt,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

exp.run(increase=True)
```
Important

You may have noticed that we are using `FedAverage` in the `Experiment` configuration while using `ScaffoldServerModule` / `ScaffoldClientModule` as an `Optimizer`. In fact, the `FedAverage` `Aggregator` in Fed-BioMed refers to the way model weights are aggregated before optimization, and should not be confused with the whole FedAvg algorithm, which is basically an SGD optimizer performed on the Node side using the `FedAverage` `Aggregator`.
Security issues using auxiliary variables with SecAgg

Currently, declearn optimizers based on auxiliary variables (like Scaffold) do not have their auxiliary variables protected by the SecAgg secure aggregation mechanism yet. This will change in future Fed-BioMed releases.
You can find more examples in the Advanced Optimizers tutorial.
Table: using common Federated Learning algorithms with declearn in Fed-BioMed

Below we gather some of the most well-known Federated Learning algorithms in the following table (as a reminder, the Node `Optimizer` must be defined in the `TrainingPlan`, whereas the Researcher `Optimizer` is defined in the `Experiment` object):
| Federated Learning Algorithm | Node Optimizer | Researcher Optimizer | Aggregator |
|---|---|---|---|
| AdaAlter (distributed AdaGrad) | AdaGrad: `Optimizer(lr=xx, modules=[AdaGradModule()])` | None | FedAverage |
| FedAdagrad | SGD: `Optimizer(lr=xx)` | AdaGrad: `Optimizer(lr=xx, modules=[AdaGradModule()])` | FedAverage |
| FedAdam | SGD: `Optimizer(lr=xx)` | Adam: `Optimizer(lr=xx, modules=[AdamModule()])` | FedAverage |
| FedAvg | SGD: `Optimizer(lr=xx)` | None | FedAverage |
| FedProx | SGD: `Optimizer(lr=xx, regularizers=[FedProxRegularizer()])` | None | FedAverage |
| FedYogi | SGD: `Optimizer(lr=xx)` | Yogi: `Optimizer(lr=xx, modules=[YogiModule()])` | FedAverage |
| Scaffold | SGD: `Optimizer(lr=xx, modules=[ScaffoldClientModule()])` | SGD: `Optimizer(lr=xx, modules=[ScaffoldServerModule()])` | FedAverage |
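For instance, the FedAdam row above translates into a plain SGD `Optimizer` in the Training Plan and an Adam-based `agg_optimizer` in the `Experiment`. A minimal sketch reusing the APIs shown in sections 2 and 3 (learning rates are illustrative):

```python
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule


class MyTrainingPlan(TorchTrainingPlan):
    def init_dependencies(self):
        return ["from fedbiomed.common.optimizers.optimizer import Optimizer"]

    def init_optimizer(self):
        # Node side: plain SGD, no OptiModule needed.
        return Optimizer(lr=.01)


# Researcher side: Adam applied to the aggregated pseudo-gradients.
fed_adam = Optimizer(lr=.8, modules=[AdamModule()])
# exp.set_agg_optimizer(fed_adam)  # with `exp` an Experiment configured as in section 3
```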
5. Common pitfalls when using declearn Optimizers in Fed-BioMed

Below, we summarize common pitfalls that may occur when using the declearn package in Fed-BioMed:
- Optimization on the Researcher side is only possible through declearn Optimizers (and not through native Optimizers such as PyTorch Optimizers);
- Some Optimizers may require synchronization: this is the case of the Scaffold-related modules, i.e. `ScaffoldClientModule` and `ScaffoldServerModule`;
- For the moment, declearn Optimizers that use auxiliary variables (such as Scaffold) cannot be protected with Secure Aggregation yet;
- For the moment, declearn's `Optimizer` only comes with a unique learning rate (whereas multiple-learning-rate Optimizers, for example PyTorch's `torch.optim.Optimizer`, can handle a learning rate per model layer);
- When chaining declearn's `OptiModules`, it is only possible to use a unique learning rate, which will be the same for all `OptiModules` and will not change during a `Round`;
- Check for inconsistent Optimizers! Using a `Regularizer` on the Researcher side may be nonsensical, even if it is doable within declearn;
- The Fed-BioMed `Scaffold` `Aggregator` must not be used together with `ScaffoldServerModule` and `ScaffoldClientModule`. This `Aggregator` is in fact an alternative to the declearn Scaffold, and you have to choose between the Fed-BioMed native version of Scaffold and declearn's one. Please note that the Fed-BioMed `Scaffold` `Aggregator` is deprecated, hence the use of `ScaffoldServerModule` and `ScaffoldClientModule` is highly encouraged.
Conclusion
We have seen how to use declearn Optimizers in Fed-BioMed. In Fed-BioMed, it is possible to set an `Optimizer` on both the Node and the Researcher side:
- On the Node side, such an `Optimizer` is defined in the Training Plan and is used to optimize the Nodes' local models;
- On the Researcher side, the `Optimizer` is defined in the `Experiment` and is meant for optimizing the global model.
When used with the declearn package, the Fed-BioMed `Aggregator` is used for aggregating weights before any potential optimization: `FedAverage` performs the weighted sum of all local models sent back by the Nodes.
declearn comes with the possibility of chaining Optimizers, by passing a list of `OptiModules` and `Regularizers`, making it possible to try out more complex optimization processes.
Check the tutorial related to the use of declearn's Optimizers.