In-Depth Experiment Configuration
Introduction
The Experiment class provides an interface for managing your experiment, with backward compatibility. Even after your Experiment has been built/defined, you can still configure its parameters, and you can run notebooks created with previous Fed-BioMed versions (<3.4). This gives you more control over your experiment, even after it has run for several rounds. In this tutorial, the experiment interface is explained using the basic MNIST example.
1. Configuring Fed-BioMed Environment
Before running this notebook, you need to configure your environment by completing the following steps:
1.1. Deploying MNIST Dataset in the Node
Please run the following command to add the MNIST dataset to your Node. This command deploys the MNIST dataset in your default node, which will be created in the folder fbm-node located in the directory where the command is executed.
After running the following command, please select data type 2) default, use the default tags, and select the folder where the MNIST dataset will be saved.
fedbiomed node dataset add
1.2. Starting the Node
After you have successfully completed the previous step, please run the following command to start your node.
fedbiomed node start
2. Creating a Training Plan
Before declaring an experiment, you should define the training plan that will be used for federated training. The training plan below is the same one created in the Basic MNIST tutorial. We recommend following the Basic MNIST tutorial on the PyTorch framework to understand the following steps.
import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the training plan to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):

    # Defines and returns the model
    def init_model(self, model_args):
        return self.Net(model_args = model_args)

    # Defines and returns the optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr = optimizer_args["lr"])

    # Declares and returns the dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps

    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Builds the MNIST training dataset and wraps it in a DataManager
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        loader_arguments = { 'shuffle': True}
        return DataManager(dataset=dataset1, **loader_arguments)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
After running the cells above, your training plan class will be ready. It will be declared in the experiment as the training plan that is sent to the nodes to perform federated training.
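The input size of fc1 (9216) follows from the shape of a 28x28 MNIST image after the two convolutions and the pooling layer. The pure-Python sketch below reproduces that arithmetic with the standard output-size formula for convolution and pooling layers without padding or dilation. It does not depend on Fed-BioMed or PyTorch, and the helper names are illustrative only.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def fc1_input_features(height: int = 28, width: int = 28) -> int:
    """Features reaching fc1 for a single-channel image of shape (height, width)."""
    h, w = height, width
    # conv1 = nn.Conv2d(1, 32, 3, 1): 3x3 kernel, stride 1, no padding
    h, w = conv_output_size(h, 3), conv_output_size(w, 3)
    # conv2 = nn.Conv2d(32, 64, 3, 1): 3x3 kernel, stride 1, no padding
    h, w = conv_output_size(h, 3), conv_output_size(w, 3)
    # F.max_pool2d(x, 2): 2x2 window, stride defaults to the window size
    h, w = conv_output_size(h, 2, stride=2), conv_output_size(w, 2, stride=2)
    # torch.flatten(x, 1) keeps the 64 channels of conv2 times the spatial size
    return 64 * h * w


print(fc1_input_features())  # 9216 = 64 * 12 * 12
```

If you change the input resolution or the layer sizes in Net, the first argument of nn.Linear for fc1 has to change accordingly.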
3. Creating an Experiment Step by Step
The Experiment class can be created without passing any argument. This just builds an empty experiment object. Afterwards, you can define your arguments using the setters provided by the Experiment class.
It is always possible to create a fully configured experiment by passing all arguments during initialization. You can also create your experiment with some arguments and set the others afterwards.
3.1. Building an Empty Experiment
After building an empty experiment, you won't be able to perform federated training, since it is not fully configured. That's why the output of the Experiment initialization always reminds you that the experiment is not fully configured.
from fedbiomed.researcher.federated_workflows import Experiment
exp = Experiment()
3.2. Displaying Current Status of Experiment
In addition to the output of the initialization, you can call the info() method of your experiment object to find out more about the current status of the experiment. This method prints information about your experiment and about what you still need to complete before you can start federated training.
exp.info()
Based on the output, some arguments are defined with default values, while others are not. Model arguments, training arguments, tags, round limit, training data, etc. have no default value, and therefore must be set in order to run an experiment. However, these arguments are related to each other. For example, to define your federated training data you need to define the tags first; then, when you set the training data argument, the experiment can send a search request to the nodes to receive information about the datasets. These relations between the arguments are explained in the following steps.
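The kind of report info() gives can be pictured as a check over the required arguments and their prerequisites. The sketch below is a hypothetical illustration: the dictionary-based configuration, REQUIRED_ARGS, PREREQUISITES and missing_arguments are invented for this example and only encode the relations described in this tutorial (for instance, tags before training data). This is not how Fed-BioMed implements Experiment.info().

```python
from typing import Any, Dict, List

# Hypothetical summary of the requirements described in this tutorial;
# this is NOT how Fed-BioMed's Experiment.info() is implemented.
REQUIRED_ARGS = ["training_plan_class", "model_args", "training_args",
                 "tags", "training_data", "round_limit"]
# Training data searched from tags needs the tags to be set first
PREREQUISITES = {"training_data": ["tags"]}


def missing_arguments(config: Dict[str, Any]) -> List[str]:
    """List the unset arguments, noting prerequisites that are also unset."""
    report = []
    for name in REQUIRED_ARGS:
        if config.get(name) is not None:
            continue  # already set (an empty dict such as model_args={} counts as set)
        blocked_by = [p for p in PREREQUISITES.get(name, []) if config.get(p) is None]
        report.append(f"{name} (set {', '.join(blocked_by)} first)" if blocked_by else name)
    return report


print(missing_arguments({}))
print(missing_arguments({"tags": ["#MNIST", "#dataset"], "model_args": {}}))
```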
3.3. Setting Training Plan for The Experiment
The training plan that is going to be used for the experiment can be set using the method set_training_plan_class.
exp.set_training_plan_class(training_plan_class=MyTrainingPlan)
If you set your training plan path first, the setter will log a debug message informing you that the training plan is not defined yet, because the training plan class has not been set.
3.4. Setting Model and Training Arguments
In the previous step, the training plan was defined for the experiment. Now, you can define the model arguments and training arguments that will be used, respectively, for building your model class and for training your model on the node side. The methods set_model_args and set_training_args of the Experiment class allow you to set these arguments.
There is no requirement on the order in which the training plan class and the model/training arguments are defined. It is also possible to define the model/training arguments first and the training plan class afterwards.
# Model arguments should be an empty Dict, since our model does not require
# any argument for initialization
model_args = {}
# Training Arguments
training_args = {
'loader_args': { 'batch_size': 48, },
'optimizer_args': {
'lr': 1e-3
},
'epochs': 1,
'test_ratio': 0.2,
'test_batch_size': 256,
'test_on_local_updates': True,
'test_on_global_updates': True,
'dry_run': False,
'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
exp.set_model_args(model_args=model_args)
exp.set_training_args(training_args=training_args)
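To get a feel for what these training arguments imply, the sketch below estimates the sample counts for one epoch on one node. It assumes a node holding the 60,000 MNIST training images, a test split rounded down, and that batch_maxnum caps the number of batches so that at most batch_maxnum * batch_size samples are used, as the comment in the dict above states. The function per_node_budget is hypothetical and is not part of Fed-BioMed.

```python
import math

# Same keys as the training_args dict above (only those used here)
training_args = {
    "loader_args": {"batch_size": 48},
    "test_ratio": 0.2,
    "test_batch_size": 256,
    "batch_maxnum": 100,
}


def per_node_budget(n_samples: int, args: dict) -> dict:
    """Back-of-the-envelope sample counts for one epoch on one node."""
    # Assumption: the test split size is rounded down
    n_test = math.floor(n_samples * args["test_ratio"])
    n_train = n_samples - n_test
    batch_size = args["loader_args"]["batch_size"]
    # batch_maxnum caps the number of batches, hence batch_maxnum * batch_size samples
    trained = min(n_train, args["batch_maxnum"] * batch_size)
    test_batches = math.ceil(n_test / args["test_batch_size"])
    return {"n_train": n_train, "n_test": n_test,
            "trained_samples": trained, "test_batches": test_batches}


print(per_node_budget(60_000, training_args))
```

With these settings, only 4,800 of the 48,000 training samples are used per epoch, which is why batch_maxnum is described as a fast pass for development.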
3.5. Setting Tags
The tags for the dataset search request can be set using the set_tags method of the experiment object.
Setting tags does not send a dataset search request. The search request is sent when the training data is set; tags is the argument required for that search request.
The tags argument of the set_tags method should be either a list of tags of type string or a single tag of type string.
tags = ['#MNIST', '#dataset']
exp.set_tags(tags = tags)
To see the tags that are set, you can call the tags() method of the experiment object.
exp.tags()
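Because set_tags accepts either a single string or a list of strings, a setter typically normalizes its input to a list. The sketch below illustrates that idea with a hypothetical helper, normalize_tags. It is an assumption for illustration, not Fed-BioMed's code.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Return tags as a list of strings; a single string becomes a one-item list."""
    if isinstance(tags, str):
        return [tags]
    if isinstance(tags, list) and all(isinstance(tag, str) for tag in tags):
        return list(tags)  # copy so later changes to the caller's list have no effect
    raise TypeError("tags must be a string or a list of strings")


print(normalize_tags("#MNIST"))                # ['#MNIST']
print(normalize_tags(["#MNIST", "#dataset"]))  # ['#MNIST', '#dataset']
```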
3.6. Setting Nodes
The nodes argument indicates the nodes that are going to be used for the experiment. By default, it is None, which means that every node that is up and running takes part in the experiment, as long as it has the dataset that is going to be used for training (registered under the tags). If the nodes argument has been set in advance when configuring the Experiment, the dataset search request is sent only to the indicated nodes. You can set nodes using the method exp.set_nodes(nodes=nodes). This method takes a nodes argument, which should be a list of node ids of type string or a single node id passed as a string.
Since each node id is generated randomly when the node is configured, we won't set nodes for this experiment, so that this notebook can be run regardless of the environment.
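The combined effect of tags and nodes on the dataset search can be modelled in a few lines. In the sketch below, the registry contents, the node ids node-A to node-C and the function search_datasets are all hypothetical. It also assumes that a dataset matches when its tags include every requested tag. It only illustrates the behaviour described above and is not Fed-BioMed's search implementation.

```python
from typing import Dict, List, Optional

# Hypothetical registry: node id -> metadata of the datasets registered on that node
REGISTRY = {
    "node-A": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
    "node-B": [{"name": "CIFAR", "tags": ["#CIFAR", "#dataset"]}],
    "node-C": [{"name": "MNIST", "tags": ["#MNIST", "#dataset", "#extra"]}],
}


def search_datasets(registry: Dict[str, List[dict]], tags: List[str],
                    nodes: Optional[List[str]] = None) -> Dict[str, List[dict]]:
    """Return, per node, the datasets whose tags include every requested tag."""
    # nodes=None: query every node; otherwise only the listed node ids
    queried = registry if nodes is None else {n: registry[n] for n in nodes if n in registry}
    found = {}
    for node_id, datasets in queried.items():
        matches = [d for d in datasets if set(tags) <= set(d["tags"])]
        if matches:  # nodes without a matching dataset are left out
            found[node_id] = matches
    return found


print(sorted(search_datasets(REGISTRY, ["#MNIST", "#dataset"])))                    # ['node-A', 'node-C']
print(sorted(search_datasets(REGISTRY, ["#MNIST", "#dataset"], nodes=["node-A"])))  # ['node-A']
```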
3.7. Setting Training Data
Training data is a FederatedDataSet instance, which comes from the module fedbiomed.researcher.datasets. There are several ways to define your training data:
- You can run set_training_data(training_data=None, from_tags=True). This sends a search request to the nodes to get dataset information, using the tags defined before.
- You can provide the training_data argument as an instance of FederatedDataSet.
- You can provide the training_data argument as a python dictionary (dict), and the setter will create a FederatedDataSet object by itself.
When using the last option, please make sure that your dict object follows the FederatedDataSet schema. Otherwise, you might get an error while running your experiment.
A FederatedDataSet object must have one unique dataset per node, to ensure that training uses only one dataset for each node. This is checked and enforced when creating a FederatedDataSet.
If you run set_training_data(training_data=None), no training data is defined for the experiment (training_data is set to None).
training_data = exp.set_training_data(training_data=None, from_tags=True)
Since the training data setter sends a search request to the nodes, its output tells you which nodes have been selected for training. Those nodes have the dataset, and they will be able to train the model defined in your training plan class.
set_training_data returns a FederatedDataSet object. You can use either the return value of the setter or the training data getter, training_data().
training_data = exp.training_data()
To inspect the result in detail, you can call the data() method of the FederatedDataSet object. This returns a python dictionary that includes information about the datasets that have been found on the nodes.
training_data.data()
As mentioned before, setting the training data once doesn't mean that you can't change it: you can create a new FederatedDataSet from a dict that includes the information about the datasets. This allows you to select the datasets that will be used for federated training.
Since the dataset information is provided directly, there is no need to send a request to the nodes.
from fedbiomed.researcher.datasets import FederatedDataSet
tr_data = training_data.data()
federated_dataset = FederatedDataSet(tr_data)
exp.set_training_data(training_data = federated_dataset)
Or, you can directly pass tr_data to set_training_data():
exp.set_training_data(training_data = tr_data)
If you change the tags for the dataset using set_tags while training data is already defined in your experiment object, you have to update your training data by running exp.set_training_data(training_data=None, from_tags=True).
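When you edit the dict returned by data() by hand, you can check the one-dataset-per-node rule and select nodes yourself before building a new FederatedDataSet. The sketch below assumes, for illustration only, that the dict maps each node id to a list of dataset metadata entries. The actual FederatedDataSet schema may differ between Fed-BioMed versions, so inspect training_data.data() on your own setup. The node ids, the metadata and both helper functions are hypothetical.

```python
from typing import Dict, List


def check_one_dataset_per_node(data: Dict[str, List[dict]]) -> None:
    """Raise ValueError unless every node maps to exactly one dataset."""
    for node_id, datasets in data.items():
        if len(datasets) != 1:
            raise ValueError(f"{node_id}: expected 1 dataset, found {len(datasets)}")


def keep_nodes(data: Dict[str, List[dict]], wanted: List[str]) -> Dict[str, List[dict]]:
    """Keep only the listed node ids, e.g. before building a new FederatedDataSet."""
    return {node_id: datasets for node_id, datasets in data.items() if node_id in wanted}


# Hypothetical search result: node-C exposes two datasets matching the tags
search_result = {
    "node-A": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
    "node-C": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]},
               {"name": "MNIST-copy", "tags": ["#MNIST", "#dataset"]}],
}

selected = keep_nodes(search_result, ["node-A"])
check_one_dataset_per_node(selected)           # passes: one dataset for node-A
try:
    check_one_dataset_per_node(search_result)  # node-C has two datasets
except ValueError as err:
    print(err)  # node-C: expected 1 dataset, found 2
```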
3.8. Setting an Aggregator
An aggregator is one of the required arguments for the experiment. It is used to aggregate the model parameters received from the nodes after every round (i.e. once training is done on each node). By default, when the experiment is initialized without passing any aggregator, it automatically uses the default FedAverage aggregator class. However, it is also possible to set a different aggregation algorithm with the method set_aggregator. Currently, Fed-BioMed provides only FedAverage, but it is possible to create custom aggregator classes.
You can get the current aggregator by running exp.aggregator(). It returns the aggregator object that will be used for aggregation.
exp.aggregator()
Let's suppose that you have created your own aggregator. You can then set it as follows (FedAverage is used here as a placeholder for your own class):
from fedbiomed.researcher.aggregators.fedavg import FedAverage
exp.set_aggregator(aggregator=FedAverage())
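To make the aggregation step concrete, the sketch below implements the standard FedAvg rule: a weighted average of the node parameters, with weights proportional to each node's number of training samples. Parameters are represented as plain dicts of float lists, and fed_average is a hypothetical helper. This is an illustration of the technique under those assumptions, not Fed-BioMed's FedAverage class.

```python
from typing import Dict, List


def fed_average(node_params: List[Dict[str, List[float]]],
                n_samples: List[int]) -> Dict[str, List[float]]:
    """Weighted average of node parameters, weights proportional to sample counts."""
    total = sum(n_samples)
    weights = [n / total for n in n_samples]
    averaged = {}
    for name, first_values in node_params[0].items():
        averaged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, node_params))
            for i in range(len(first_values))
        ]
    return averaged


# Two hypothetical nodes holding 1,000 and 3,000 samples
print(fed_average([{"fc.weight": [1.0, 2.0]}, {"fc.weight": [3.0, 4.0]}], [1000, 3000]))
# {'fc.weight': [2.5, 3.5]}
```

A custom aggregator would replace this averaging rule with its own, for example by weighting nodes differently.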
3.9. Setting an Optimizer
As on the Nodes, it is possible to set an Optimizer on the Researcher side (i.e. in the Experiment). Such an optimizer updates the global model, that is, the model resulting from the Aggregation.
The method set_agg_optimizer can be used to set such an optimizer.
Please bear in mind that only declearn-based Optimizers can be passed to the Experiment. You can load them through Fed-BioMed (from fedbiomed.common.optimizers.declearn) as shown below:
from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule
lr = .9
fed_opt = Optimizer(lr=lr, modules=[AdamModule()])
exp.set_agg_optimizer(fed_opt)
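A common way to picture a researcher-side optimizer is the FedOpt idea: the difference between the current global model and the aggregated model is treated as a pseudo-gradient, and a regular optimizer such as Adam is applied to it. The plain-Python ServerAdamSketch below is a conceptual illustration under that assumption. It is not declearn's AdamModule, and its default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8) are the usual Adam values, not values read from declearn.

```python
import math
from typing import List, Optional


class ServerAdamSketch:
    """Adam applied on the researcher side to the aggregated update, used as a pseudo-gradient.

    Conceptual illustration only; this is not declearn's AdamModule.
    """

    def __init__(self, lr: float, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: Optional[List[float]] = None  # first-moment estimates
        self.v: Optional[List[float]] = None  # second-moment estimates
        self.t = 0                            # number of steps taken

    def step(self, global_w: List[float], aggregated_w: List[float]) -> List[float]:
        # Pseudo-gradient: global minus aggregated weights (gradient-like sign convention)
        grads = [g - a for g, a in zip(global_w, aggregated_w)]
        if self.m is None:
            self.m = [0.0] * len(grads)
            self.v = [0.0] * len(grads)
        self.t += 1
        new_w = []
        for i, grad in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * grad
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * grad * grad
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_w.append(global_w[i] - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_w


opt = ServerAdamSketch(lr=0.9)  # same learning rate as in the cell above
print(opt.step(global_w=[1.0, -1.0], aggregated_w=[0.9, -0.8]))
```

Because of bias correction, the first step moves each weight by roughly lr in the direction of the aggregated update, whatever the size of that update. This is one reason the learning rate of a server-side optimizer deserves care.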
3.10. Setting Node Selection Strategy
The node selection strategy is also one of the required arguments for the experiment. It is used to select nodes before each round of training. Since the strategy selects among nodes, the training data should already be set before setting any strategy. The strategy can then select for training the nodes that are currently available, given their datasets.
By default, set_strategy(node_selection_strategy=None) uses the DefaultStrategy strategy. This is the default strategy in Fed-BioMed, and it selects all available nodes for training regardless of their datasets. However, it is also possible to set different strategies. Currently, Fed-BioMed only provides DefaultStrategy, but you can create your own custom strategy classes.
exp.set_strategy(node_selection_strategy=None)
Or, you can directly pass an instance of DefaultStrategy (or of any other Strategy class) as an argument:
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
exp.set_strategy(node_selection_strategy=DefaultStrategy())
# To make sure the strategy has been set
exp.strategy()
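The sketch below contrasts a select-all behaviour, as described for DefaultStrategy, with a simple custom strategy that samples a random fraction of nodes each round. Both classes, their sample_nodes method and the node ids are hypothetical. They do not inherit from Fed-BioMed's strategy classes, so a real custom strategy must follow the interface expected by Fed-BioMed instead.

```python
import random
from typing import List, Optional


class SelectAllStrategy:
    """Hypothetical stand-in for the behaviour described for DefaultStrategy."""

    def sample_nodes(self, nodes: List[str], round_i: int) -> List[str]:
        return list(nodes)  # every node holding training data takes part


class RandomFractionStrategy:
    """Hypothetical custom strategy: a random fraction of the nodes each round."""

    def __init__(self, fraction: float, seed: Optional[int] = None):
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        self.fraction = fraction
        self.rng = random.Random(seed)  # seeded for reproducible selections

    def sample_nodes(self, nodes: List[str], round_i: int) -> List[str]:
        # round_i is unused here, but it would let a strategy vary its choice per round
        k = max(1, round(self.fraction * len(nodes)))  # always keep at least one node
        return self.rng.sample(list(nodes), k)


nodes = ["node-A", "node-B", "node-C", "node-D"]
print(SelectAllStrategy().sample_nodes(nodes, round_i=0))
print(RandomFractionStrategy(0.5, seed=0).sample_nodes(nodes, round_i=0))
```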
3.11. Setting Round Limit
The round_limit argument is the maximum number of training rounds. By default, it is None, and it needs to be set before running your experiment. You can set the round limit with the method set_round_limit. round_limit can be changed after running one or several rounds of training. You can always execute exp.round_limit() to see the current round limit.
exp.set_round_limit(round_limit=2)
exp.round_limit()
3.12. Setting the Validation Facility
When training a federated learning model, model validation can be useful, especially to see how well your model performs on data according to one or several metrics. Fed-BioMed comes with a validation facility that tests the model against a selection of data sampled randomly from the dataset of each Node.
Validation can be done in two different ways:
- At the beginning of each Round, just before model training occurs but after model aggregation: Test on global updates.
- At the end of each Round, after training the model: Test on local updates.
To use the Fed-BioMed validation facility, you have to configure your Experiment as follows:
- activate set_test_on_local_updates and/or set_test_on_global_updates;
- specify a test_ratio, i.e. the fraction of the Node dataset that will be used for validating the model.
For more details, especially on how to use a specific validation metric, please visit the Fed-BioMed user guide.
exp.set_test_ratio(0.25)
exp.set_test_on_local_updates(True)
exp.set_test_on_global_updates(True)
Note (displaying validation on Tensorboard): it is possible to display the results of the validation metric in Tensorboard for each Round. Please visit the Fed-BioMed user guide for more details.
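The role of test_ratio can be illustrated by a random split of a node's sample indices into a training part and a validation part. The split_indices helper below is hypothetical, and rounding the test size down is an assumption. The exact way Fed-BioMed samples the validation subset may differ.

```python
import random
from typing import List, Tuple


def split_indices(n_samples: int, test_ratio: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into (train, test) according to test_ratio."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random sampling of the validation subset
    n_test = int(n_samples * test_ratio)  # assumption: rounded down
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(1000, test_ratio=0.25)
print(len(train_idx), len(test_idx))  # 750 250
```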
3.13. Controlling Experiment Status Before Starting Training Rounds
Now, let's see if our experiment is ready for the training.
exp.info()
If the experiment is ready, you will see the message Experiment can be run now (fully defined) at the bottom of the output. We can now run the experiment.
4. Running The Experiment
As long as info() says that the experiment is fully defined, you will be able to run your experiment. Experiment has two methods for running training rounds: run() and run_once().
- run() runs the experiment rounds from the current round up to the round limit. If the round limit has been reached, it reports this. The method run also takes 2 arguments, rounds and increase. rounds is an integer that indicates the number of rounds to run: if the experiment is at round 0, the round limit is 4, and you pass rounds as 3, it runs the experiment for only 3 rounds. increase is a boolean that indicates whether the round limit should be increased if the given rounds goes beyond the round limit. For example, if the current round is 3, the round limit is 4, the rounds argument is 2 and increase is True, the experiment increases the round limit to 5.
- run_once() runs the experiment for a single round of training. If the round limit has been reached, it reports this. However, if it is executed as run_once(increase=True) when the round limit has been reached, it increases the round limit by one round.
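The round-limit rules above can be modelled with a small counter that performs no training. RoundCounter is a hypothetical class: its methods mirror the names run() and run_once(), and the default increase=False in this sketch is an assumption. The usage replays the calls made in section 4.1.

```python
from typing import Optional


class RoundCounter:
    """Hypothetical model of the round-limit rules described above; no training happens."""

    def __init__(self, round_limit: Optional[int] = None):
        self.round_limit = round_limit
        self.round_current = 0

    def run_once(self, increase: bool = False) -> int:
        """Run a single round and return how many rounds were run (0 or 1)."""
        if self.round_limit is not None and self.round_current >= self.round_limit:
            if not increase:
                print("Round limit reached")
                return 0
            self.round_limit += 1  # make room for exactly one more round
        self.round_current += 1
        return 1

    def run(self, rounds: Optional[int] = None, increase: bool = False) -> int:
        """Run `rounds` rounds, or every remaining round when `rounds` is None."""
        if rounds is None:
            if self.round_limit is None:
                return 0  # nothing to run without a round limit
            rounds = self.round_limit - self.round_current
        elif self.round_limit is not None and self.round_current + rounds > self.round_limit:
            if increase:
                self.round_limit = self.round_current + rounds
            else:
                rounds = self.round_limit - self.round_current  # stop at the limit
        if rounds <= 0:
            print("Round limit reached")
            return 0
        return sum(self.run_once() for _ in range(rounds))


# Replaying the calls of section 4.1 with a round limit of 2
sim = RoundCounter(round_limit=2)
sim.run_once()                     # round 1
sim.run_once()                     # round 2
sim.run_once()                     # limit reached, nothing runs
sim.run()                          # limit reached, nothing runs
sim.round_limit = 4                # stands in for exp.set_round_limit(4)
sim.run()                          # runs rounds 3 and 4
sim.run(rounds=2, increase=True)   # limit grows to 6
sim.run(rounds=2, increase=False)  # limit reached, nothing runs
sim.run_once(increase=True)        # limit grows to 7
print(sim.round_limit, sim.round_current)  # 7 7
```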
4.1. Running the Experiment once
exp.run_once()
After running the experiment once, you can check the current round. It returns 1, which means that only one round has been run.
exp.round_current()
Now, let's run the experiment with run_once() again.
exp.run_once()
Since the round limit was set to 2, it has now been reached. If you try to call run() or run_once(), the experiment will indicate that the round limit has been reached.
exp.run_once()
exp.run()
From this point on, if you would like to run the experiment, you can increase the round limit with set_round_limit(round_limit).
exp.set_round_limit(4)
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
The round limit of the experiment has been set to 4 and the number of completed rounds is 2. This means that if you run the experiment with the method run() without passing any argument, it will run for 2 rounds.
exp.run()
Let's check the current round status of the experiment.
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
Another way to run your experiment when the round limit has been reached is to pass the rounds argument to the method run(). For example, the following cell runs the experiment for 2 more rounds.
exp.run(rounds=2, increase=True) # increase=True lets the round limit grow if needed
If the argument increase is False, the round limit is not increased automatically.
exp.run(rounds=2, increase=False)
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
It is also possible to increase the number of rounds while running the experiment with run_once() by passing the increase argument as True.
exp.run_once(increase=True)
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
4.2. Changing Training Arguments for the Next Round
The method set_training_args() allows you to change the training arguments even if you've already run your experiment several times. With set_training_args(), you can configure your training from one round to the next. For example, we can change our batch_size to 64 and batch_maxnum to 50 for the next round.
# Training Arguments
training_args = {
'loader_args': { 'batch_size': 64, },
'optimizer_args': {
'lr': 1e-3
},
'epochs': 1,
'dry_run': False,
'batch_maxnum': 50
}
exp.set_training_args(training_args=training_args)
exp.run_once(increase=True)
Conclusions
The Experiment class is the interface to, and the orchestrator of, all the processes behind federated training on the researcher side. It allows you to manage your federated training experiment easily. It has been extended with setter and getter methods to ease its declaration, which also gives you more control before, during and after the training rounds. The purpose of the Experiment class is to provide a robust interface that lets end users easily perform federated training on Fed-BioMed nodes.