In Depth Experiment Configuration

Introduction

The Experiment class provides an interface for managing your experiment step by step while remaining backward compatible. Even after your Experiment has been built, you can still configure its parameters, and you can still run notebooks created with previous Fed-BioMed versions (<3.4). This gives you more control over your experiment, even after it has run for several rounds. This tutorial explains the experiment interface using the basic MNIST example.

1. Configuring Fed-BioMed Environment

Before running this notebook, you need to configure your environment by completing the following steps:

1.1. Creating the Node component

Simply put, a Node can be created by running:

fedbiomed component create -c node

This creates a folder fbm-node (the Node's default name) in the directory where the command was executed. This folder, called a component, contains all the files required to launch a Fed-BioMed Node.

1.2. Deploying MNIST Dataset in the Node

Please run the following command to add the MNIST dataset to your Node. It deploys the MNIST dataset in your default node, in the directory where the command is executed.

After running the command, please select data type 2) default, use the default tags, and select the folder where the MNIST dataset will be saved.

fedbiomed node dataset add

1.3. Starting the Node

After you have successfully completed the previous step, please run the following command to start your node.

fedbiomed node start

2. Creating a Training Plan

Before declaring an experiment, the training plan that will be used for federated training should be defined. The training plan below is the same one created in the Basic MNIST tutorial. We recommend following the Basic MNIST tutorial on the PyTorch framework to understand the following steps.

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the training plan to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):

    # Defines and return model
    def init_model(self, model_args):
        return self.Net(model_args = model_args)

    # Defines and return optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr = optimizer_args["lr"])

    # Declares and return dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps

    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)


            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        loader_arguments = { 'shuffle': True}
        return DataManager(dataset=dataset1, **loader_arguments)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss

After running the cell above, your training plan class is ready. It will be declared in the experiment as the training plan that is going to be sent to the nodes to perform federated training.
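The input size of fc1 (9216) in the Net class above follows from the layer shapes. The sketch below reproduces that arithmetic in plain Python, assuming 28x28 single-channel MNIST images and the standard output-size formula for unpadded convolution and pooling layers. The helper name conv2d_out is introduced here for illustration only and is not part of Fed-BioMed or PyTorch.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                     # MNIST images are 28x28 with 1 channel
side = conv2d_out(side, kernel=3)             # conv1: Conv2d(1, 32, 3, 1)  -> 26
side = conv2d_out(side, kernel=3)             # conv2: Conv2d(32, 64, 3, 1) -> 24
side = conv2d_out(side, kernel=2, stride=2)   # max_pool2d(x, 2)            -> 12

# torch.flatten(x, 1) keeps the batch dimension and merges channels x height x width
flat_features = 64 * side * side
print(flat_features)  # 9216, the in_features of fc1
```

If you change the kernel sizes or add padding in your own model, the same calculation tells you which value to pass to the first linear layer.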

3. Creating an Experiment Step by Step

The Experiment class can be instantiated without passing any argument. This just builds an empty experiment object. Afterwards, you can define your arguments using the setters provided by the Experiment class.

It is always possible to create a fully configured experiment by passing all arguments during initialization. You can also create your experiment with some arguments and set the others afterwards.

3.1. Building an Empty Experiment

An empty experiment cannot perform federated training, since it is not fully configured. That is why the output of the Experiment initialization will always remind you that the experiment is not fully configured.

from fedbiomed.researcher.federated_workflows import Experiment
exp = Experiment()

3.2. Displaying Current Status of Experiment

In addition to the output of the initialization, you can call the info() method of your experiment object to find out more about its current status. This method prints information about your experiment and what you still need to complete before you can start federated training.

exp.info()

Based on the output, some arguments are defined with default values, while others are not. Model arguments, training arguments, tags, round limit, training data, etc. have no default value and must therefore be set in order to run an experiment. However, these arguments are related to each other. For example, to define your federated training data you need to define the tags first; then, when you set the training data argument, the experiment can send a search request to the nodes to receive information about their datasets. These relations between the arguments are explained in the following steps.

3.3. Setting Training Plan for The Experiment

The training plan to be used for the experiment can be set with the method set_training_plan_class.

exp.set_training_plan_class(training_plan_class=MyTrainingPlan)

If you set your training plan path first, the setter will log a debug message informing you that the training plan is not defined yet. This is because the training plan class has not been set yet.

3.4. Setting Model and Training Arguments

In the previous step, the training plan was defined for the experiment. Now you can define the model arguments and training arguments, which are used respectively for building your model class and for training your model on the node side. The methods set_model_args and set_training_args of the experiment class allow you to set these arguments.

There is no required order between defining the training plan class and the model/training arguments. It is also possible to define the model/training arguments first and the training plan class afterwards.

# Model arguments should be an empty Dict, since our model does not require 
# any argument for initialization
model_args = {}

# Training Arguments
training_args = {
    'loader_args': { 'batch_size': 48, },
    'optimizer_args': {
        'lr': 1e-3
    },
    'epochs': 1, 
    'test_ratio': 0.2,
    'test_batch_size': 256,
    'test_on_local_updates': True,
    'test_on_global_updates': True,
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

exp.set_model_args(model_args=model_args)
exp.set_training_args(training_args=training_args)
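The comment on batch_maxnum in the cell above says that only batch_maxnum * batch_size samples are used. The sketch below makes that arithmetic explicit. It is a plain Python illustration, not Fed-BioMed code: the helper name max_samples_per_epoch, the example sample count of 48000, and the convention that a missing or zero batch_maxnum means "no cap" are all assumptions made for this example.

```python
def max_samples_per_epoch(training_args: dict, n_train_samples: int) -> int:
    """Upper bound on the samples seen in one epoch (illustrative helper)."""
    batch_size = training_args["loader_args"]["batch_size"]
    batch_maxnum = training_args.get("batch_maxnum")
    if not batch_maxnum:
        # Assumption of this sketch: no batch_maxnum means the whole set is used
        return n_train_samples
    return min(n_train_samples, batch_maxnum * batch_size)


example_args = {
    "loader_args": {"batch_size": 48},
    "batch_maxnum": 100,
}

# Hypothetical node holding 48000 training samples
print(max_samples_per_epoch(example_args, n_train_samples=48_000))  # 4800
```

With the values used in this tutorial, each node therefore trains on at most 100 * 48 = 4800 samples per epoch, which keeps development runs short.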

3.5. Setting Tags

The tags for the dataset search request can be set using the set_tags method of the experiment object.

Setting tags does not send a dataset search request. The search request is sent when the training data is set; tags is the argument required for that request.

The tags argument of the set_tags method should be either a list of tags of type string or a single tag of type string.

tags = ['#MNIST', '#dataset']
exp.set_tags(tags = tags)

To see the tags that are set, you can call the tags() method of the experiment object.

exp.tags()
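As described above, tags can be given either as a single string or as a list of strings. The following sketch shows one way such an argument could be normalized and validated. The function normalize_tags is hypothetical and written for this example only; it is not the validation that set_tags performs internally.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Return tags as a list of strings, accepting a single string as shorthand."""
    if isinstance(tags, str):
        return [tags]
    if isinstance(tags, list) and all(isinstance(tag, str) for tag in tags):
        return list(tags)
    raise TypeError("tags must be a string or a list of strings")


print(normalize_tags("#MNIST"))                # ['#MNIST']
print(normalize_tags(["#MNIST", "#dataset"]))  # ['#MNIST', '#dataset']
```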

3.6. Setting Nodes

The nodes argument indicates the nodes that are going to be used for the experiment. By default, it is None, which means that every node that is up and running will be part of the experiment, as long as it has the dataset that is going to be used for training (i.e. a dataset registered under the tags). If the nodes argument has been set in advance when configuring the Experiment, the dataset search request will be sent only to the indicated nodes. You can set nodes using the method exp.set_nodes(nodes=nodes). This method takes a nodes argument, which should be either a list of node ids of type string or a single node id passed as a string.

Since each node id is generated randomly when the node is configured, we won't set nodes for this experiment, so that this notebook can be run regardless of the environment.

3.7. Setting Training Data

Training data is a FederatedDataSet instance, which comes from the module fedbiomed.researcher.datasets. There are several ways to define your training data.

- You can run set_training_data(training_data=None, from_tags=True). This sends a search request to the nodes to get dataset information, using the tags defined before.
- You can provide a training_data argument that is an instance of FederatedDataSet.
- You can provide the training_data argument as a Python dictionary (dict), and the setter will create a FederatedDataSet object by itself.

When using the last option, please make sure that your dict object is configured according to the FederatedDataSet schema. Otherwise, you might get an error while running your experiment.

A FederatedDataSet object must have one unique dataset per node, to ensure that training uses only one dataset for each node. This is checked and enforced when creating a FederatedDataSet.
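To illustrate the "one dataset per node" rule, here is a minimal sketch of such a check. It works on a hypothetical list of (node_id, dataset_name) pairs, which is a shape chosen for this example and not the actual FederatedDataSet schema; the node ids and dataset names are also made up.

```python
from collections import Counter
from typing import List, Tuple


def nodes_with_multiple_datasets(search_result: List[Tuple[str, str]]) -> List[str]:
    """Return the node ids that appear with more than one dataset."""
    counts = Counter(node_id for node_id, _ in search_result)
    return sorted(node_id for node_id, n in counts.items() if n > 1)


# Hypothetical search result: node-a matches the tags with two datasets
found = [
    ("node-a", "MNIST"),
    ("node-b", "MNIST"),
    ("node-a", "MNIST-copy"),
]

print(nodes_with_multiple_datasets(found))  # ['node-a']
```

A non-empty result means the selection is ambiguous for those nodes and has to be narrowed down, for example with more specific tags.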

If you run set_training_data(training_data=None), this means that no training data is defined yet for the experiment (training_data is set to None).

training_data = exp.set_training_data(training_data=None, from_tags=True)

Since the training data setter sends a search request to the nodes, the output will inform you about the nodes selected for training. These nodes have the dataset and will be able to train the model defined in your training plan class.

set_training_data returns a FederatedDataSet object. You can use either the return value of the setter or the getter for training data, which is training_data().

training_data = exp.training_data()

To inspect the result in detail, you can call the data() method of the FederatedDataSet object. It returns a Python dictionary that includes information about the datasets that have been found on the nodes.

training_data.data()

As mentioned before, setting the training data once does not mean that you cannot change it: you can create a new FederatedDataSet with a dict that includes the information about the datasets. This allows you to select the datasets that will be used for federated training.

Since the dataset information is provided directly, there is no need to send a request to the nodes.

from fedbiomed.researcher.datasets import FederatedDataSet

tr_data = training_data.data()
federated_dataset = FederatedDataSet(tr_data)
exp.set_training_data(training_data = federated_dataset)

Or, you can directly use tr_data in set_training_data().

exp.set_training_data(training_data = tr_data)
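Because the training data can be passed as a dictionary, you can also narrow it down before handing it to the setter. The sketch below assumes that the dictionary is keyed by node id; the helper select_nodes, the node ids, and the dataset entries are hypothetical and only illustrate the idea.

```python
def select_nodes(tr_data: dict, keep: set) -> dict:
    """Return a copy of tr_data restricted to the node ids listed in keep."""
    return {node_id: entry for node_id, entry in tr_data.items() if node_id in keep}


# Hypothetical dataset information keyed by node id (entries abbreviated)
example_tr_data = {
    "node-a": {"name": "MNIST", "tags": ["#MNIST", "#dataset"]},
    "node-b": {"name": "MNIST", "tags": ["#MNIST", "#dataset"]},
    "node-c": {"name": "MNIST", "tags": ["#MNIST", "#dataset"]},
}

subset = select_nodes(example_tr_data, keep={"node-a", "node-c"})
print(sorted(subset))  # ['node-a', 'node-c']
```

The filtered dictionary could then be passed to exp.set_training_data(training_data=subset) in the same way as tr_data above, provided it still follows the FederatedDataSet schema.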

If you change the tags with set_tags while training data is already defined in your experiment object, you have to update your training data by running exp.set_training_data(training_data=None) (add from_tags=True, as in the first option above, if you want a new search request to be sent with the updated tags).

3.8. Setting an Aggregator

An aggregator is one of the required arguments for the experiment. It is used for aggregating the model parameters received from the nodes after every round (i.e. once training is done on each node). By default, when the experiment is initialized without passing any aggregator, it automatically uses the default FedAverage aggregator class. However, it is also possible to set a different aggregation algorithm with the method set_aggregator. Currently, Fed-BioMed has only FedAverage, but it is possible to create custom aggregator classes.

You can get the current aggregator by running exp.aggregator(). It will return the aggregator object that will be used for aggregation.

exp.aggregator()

Let's suppose that you have created your own aggregator. You can then set it as follows (FedAverage is used here as a stand-in for a custom class):

from fedbiomed.researcher.aggregators.fedavg import FedAverage
exp.set_aggregator(aggregator=FedAverage())
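To make the idea of aggregation concrete, the following sketch implements a sample-weighted federated average in plain Python. It illustrates the general federated averaging technique and is not the source code of the FedAverage class; the function name, the parameter layout (a dict of lists per node), and the node ids are assumptions of this example.

```python
from typing import Dict, List


def federated_average(
    node_params: Dict[str, Dict[str, List[float]]],
    node_sample_counts: Dict[str, int],
) -> Dict[str, List[float]]:
    """Average each parameter across nodes, weighted by the number of samples."""
    total = sum(node_sample_counts[node] for node in node_params)
    first_node = next(iter(node_params))
    averaged = {}
    for name, values in node_params[first_node].items():
        averaged[name] = [
            sum(
                node_params[node][name][i] * node_sample_counts[node] / total
                for node in node_params
            )
            for i in range(len(values))
        ]
    return averaged


# Two hypothetical nodes; node-b holds three times as much data as node-a
params = {
    "node-a": {"fc.weight": [1.0, 2.0]},
    "node-b": {"fc.weight": [3.0, 6.0]},
}
counts = {"node-a": 100, "node-b": 300}

print(federated_average(params, counts))  # {'fc.weight': [2.5, 5.0]}
```

A custom aggregator would replace this weighting rule with its own, while keeping the same role in the round: turning the per-node parameters into one set of global parameters.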

3.9. Setting an Optimizer

Just as on the Nodes, it is possible to set an Optimizer on the Researcher side (i.e. in the Experiment). Such an optimizer updates the global model, that is, the model resulting from the aggregation.

The method set_agg_optimizer can be used to set such an optimizer.

Please bear in mind that only declearn-based Optimizers can be passed to the Experiment. You can load them through Fed-BioMed (from fedbiomed.common.optimizers.declearn) as shown below:

from fedbiomed.common.optimizers.optimizer import Optimizer
from fedbiomed.common.optimizers.declearn import AdamModule

lr = .9
fed_opt = Optimizer(lr=lr, modules=[AdamModule()])

exp.set_agg_optimizer(fed_opt)
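One common way to think about a researcher-side optimizer is that the difference between the current global model and the aggregated model is treated as a gradient, and the optimizer takes a step along it. The sketch below shows this with a plain gradient step (no Adam module). It is an assumption about the general mechanism, written in plain Python for illustration, and not a description of how Fed-BioMed or declearn implement it; server_sgd_step is a hypothetical name.

```python
from typing import Dict, List


def server_sgd_step(
    global_params: Dict[str, List[float]],
    aggregated_params: Dict[str, List[float]],
    lr: float,
) -> Dict[str, List[float]]:
    """Move the global parameters toward the aggregated ones by a factor lr."""
    return {
        name: [
            g - lr * (g - a)  # (g - a) plays the role of a gradient
            for g, a in zip(global_params[name], aggregated_params[name])
        ]
        for name in global_params
    }


current = {"fc.bias": [0.0, 2.0]}
aggregated = {"fc.bias": [1.0, 4.0]}

print(server_sgd_step(current, aggregated, lr=1.0))  # {'fc.bias': [1.0, 4.0]}
print(server_sgd_step(current, aggregated, lr=0.5))  # {'fc.bias': [0.5, 3.0]}
```

In this simplified view, a learning rate of 1 adopts the aggregated model as is, while smaller values damp the update from one round to the next.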

3.10. Setting Node Selection Strategy

The node selection strategy is also one of the required arguments for the experiment. It is used for selecting nodes before each round of training. Since the strategy selects nodes, the training data should already be set before setting any strategy. The strategy can then select for training the nodes that are currently available with respect to their dataset.

By default, set_strategy(node_selection_strategy=None) uses DefaultStrategy. It is the default strategy in Fed-BioMed, and it selects all available nodes for training, regardless of their datasets. However, it is also possible to set different strategies. Currently, Fed-BioMed only provides DefaultStrategy, but you can create your own custom strategy classes.

exp.set_strategy(node_selection_strategy=None)

Or, you can directly pass DefaultStrategy (or any Strategy class) as an argument.

from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
exp.set_strategy(node_selection_strategy=DefaultStrategy())

# To make sure the strategy has been set
exp.strategy()
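The difference between selecting every node and a custom selection rule can be sketched in a few lines. The two functions below are hypothetical stand-ins written in plain Python; they do not implement the Fed-BioMed Strategy interface and only show the kind of decision a strategy makes before each round.

```python
import random
from typing import Iterable, List, Optional


def select_all(node_ids: Iterable[str]) -> List[str]:
    """Default-like behaviour: every available node takes part in the round."""
    return list(node_ids)


def select_random_subset(
    node_ids: Iterable[str], k: int, seed: Optional[int] = None
) -> List[str]:
    """Custom behaviour: sample at most k nodes for the round."""
    nodes = list(node_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(nodes, k=min(k, len(nodes))))


available = ["node-a", "node-b", "node-c"]  # hypothetical node ids
print(select_all(available))
print(select_random_subset(available, k=2, seed=0))
```

Sampling a subset per round is a typical reason to write a custom strategy, for example when many nodes are connected and training on all of them each round is too costly.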

3.11. Setting Round Limit

The round_limit argument indicates the maximum number of training rounds. By default it is None, and it needs to be set before running your experiment. You can set the round limit with the method set_round_limit. round_limit can be changed after running one or several rounds of training. You can always execute exp.round_limit() to see the current round limit.

exp.set_round_limit(round_limit=2)
exp.round_limit()

3.12. Setting Validation Facility

When training a Federated Learning model, model validation can prove useful, especially if you want to get an idea of how well your model performs on data according to one or several metrics. Fed-BioMed comes with a validation facility that can test the model against a selection of data sampled randomly from the dataset of each Node.

Validation can be done in two different ways:

- At the beginning of each Round, just before model training occurs but after model aggregation: Test on global updates;
- At the end of each Round, after training the model: Test on local updates.

To use the Fed-BioMed validation facility, you have to do the following in your Experiment:

- activate validation with set_test_on_local_updates and/or set_test_on_global_updates;
- specify a test_ratio, i.e. the fraction of data from the Node dataset that will be used for validating the model.

For more details, especially on how to use a specific validation metric, please visit the Fed-BioMed user guide.

exp.set_test_ratio(0.25)
exp.set_test_on_local_updates(True)
exp.set_test_on_global_updates(True)
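To give a feeling for what a test_ratio of 0.25 means, the sketch below randomly splits a list of sample indices into a training part and a validation part. It is a plain Python illustration under the assumption that the validation size is the dataset size times the ratio, rounded down; split_indices is a hypothetical helper and the exact sampling and rounding used on the nodes may differ.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    n_samples: int, test_ratio: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Randomly split range(n_samples) into (train_indices, test_indices)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)  # rounding down is an assumption
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(n_samples=1000, test_ratio=0.25, seed=42)
print(len(train_idx), len(test_idx))  # 750 250
```

Each node performs its own split on its local data, so the validation samples never leave the node.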

Note (Displaying validation on Tensorboard): It is possible to display the results of the validation metrics in Tensorboard for each Round. Please visit the Fed-BioMed user guide for more details.

3.13. Controlling Experiment Status Before Starting Training Rounds

Now, let's see if our experiment is ready for training.

exp.info()

If the experiment is ready, you will see the message Experiment can be run now (fully defined) at the bottom of the output. We can now run the experiment.

4. Running The Experiment

As long as info() says that the experiment is fully defined, you will be able to run your experiment. Experiment has two methods for running training rounds: run() and run_once().

run() runs the experiment rounds from the current round up to the round limit. If the round limit has already been reached, it will indicate so. The method run() also takes two arguments, rounds and increase:
- rounds is an integer that indicates the number of rounds to run. If the experiment is at round 0, the round limit is 4, and you pass rounds as 3, it will run the experiment for only 3 rounds.
- increase is a boolean that indicates whether the round limit should be increased if the requested rounds would exceed it. For example, if the current round is 3, the round limit is 4, and the rounds argument is 2, the experiment will increase the round limit to 5.

run_once() runs the experiment for a single round of training. If the round limit has already been reached, it will indicate so. However, if it is executed as run_once(increase=True) when the round limit has been reached, it increases the round limit by one round.
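The interplay between rounds, increase, and the round limit described above can be modelled with a small counter. The class below is a plain Python sketch of those rules and not the Experiment implementation. RoundCounter is a hypothetical name, and one behaviour is an assumption: when rounds exceeds the limit and increase is False, the sketch runs only the rounds that remain.

```python
from typing import Optional


class RoundCounter:
    """Toy model of the round bookkeeping described in this section."""

    def __init__(self, round_limit: int) -> None:
        self.round_limit = round_limit
        self.round_current = 0

    def run(self, rounds: Optional[int] = None, increase: bool = False) -> int:
        """Advance the counter and return the number of rounds actually run."""
        remaining = self.round_limit - self.round_current
        if rounds is None:
            rounds = remaining  # run up to the round limit
        elif rounds > remaining:
            if increase:
                self.round_limit = self.round_current + rounds
            else:
                rounds = remaining  # assumption: run only what is left
        self.round_current += rounds
        return rounds

    def run_once(self, increase: bool = False) -> int:
        return self.run(rounds=1, increase=increase)


counter = RoundCounter(round_limit=4)
counter.round_current = 3
counter.run(rounds=2, increase=True)
print(counter.round_limit, counter.round_current)  # 5 5
```

In the sketch, every call passes increase explicitly where it matters, so the example does not depend on the default value used by the real methods.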

4.1. Running the Experiment once

exp.run_once()

After running the experiment once, you can check the current round. It returns 1, which means that only one round has been run.

exp.round_current()

Now, let's run the experiment with run_once() again.

exp.run_once()

Since the round limit was set to 2, it has now been reached. If you try to call run() or run_once(), the experiment will indicate that the round limit has been reached.

exp.run_once()

exp.run()

After this point, if you would like to keep running the experiment, you can increase the round limit with set_round_limit(round_limit).

exp.set_round_limit(4)
print('Round Limit    : ' , exp.round_limit())
print('Current Round  : ' , exp.round_current())

The round limit of the experiment has been set to 4 and the number of completed rounds is 2. This means that if you call run() without passing any argument, it will run the experiment for 2 more rounds.

exp.run()

Let's check the current round status of the experiment.

print('Round Limit    : ' , exp.round_limit())
print('Current Round  : ' , exp.round_current())

Another way to run your experiment once the round limit has been reached is to pass the rounds argument to the method run(). For example, the following cell will run the experiment for 2 more rounds.

exp.run(rounds=2, increase=True) # increase is True by default

If the argument increase is False, it will not increase the round limit automatically.

exp.run(rounds=2, increase=False)

print('Round Limit    : ' , exp.round_limit())
print('Current Round  : ' , exp.round_current())

It is also possible to increase the number of rounds while running the experiment with run_once(), by passing the increase argument as True.

exp.run_once(increase=True)

print('Round Limit    : ' , exp.round_limit())
print('Current Round  : ' , exp.round_current())

4.2. Changing Training Arguments for the Next Round

The method set_training_args() allows you to change the training arguments even if you have already run your experiment several times, so you can reconfigure your training from one round to the next. For example, we can change batch_size to 64 and batch_maxnum to 50 for the next round.

# Training Arguments
training_args = {
    'loader_args': { 'batch_size': 64, },
    'optimizer_args': {
        'lr': 1e-3
    },
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 50
}

exp.set_training_args(training_args=training_args)
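Instead of retyping the whole dictionary for each round, you could derive the new training arguments from the previous ones. The sketch below shows one way to do this in plain Python. The helper override_training_args is hypothetical and not part of Fed-BioMed; it copies the base dictionary and merges nested dictionaries such as loader_args one level deep.

```python
import copy


def override_training_args(base: dict, **overrides) -> dict:
    """Return a copy of base with overrides applied (nested dicts are merged)."""
    updated = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(updated.get(key), dict):
            updated[key].update(value)
        else:
            updated[key] = value
    return updated


base_args = {
    "loader_args": {"batch_size": 48},
    "optimizer_args": {"lr": 1e-3},
    "epochs": 1,
    "dry_run": False,
    "batch_maxnum": 100,
}

next_args = override_training_args(
    base_args, loader_args={"batch_size": 64}, batch_maxnum=50
)
print(next_args["loader_args"], next_args["batch_maxnum"])  # {'batch_size': 64} 50
```

The resulting dictionary can be passed to exp.set_training_args(training_args=next_args), while base_args stays unchanged for later comparison.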

exp.run_once(increase=True)

Conclusions

The Experiment class is the interface to, and the orchestrator of, the whole process behind federated training on the researcher side. It allows you to manage your federated training experiment easily. It has been extended with setter and getter methods to ease its declaration, which also gives you more control before, during, and after the training rounds. The purpose of the Experiment class is to provide a robust interface that lets end users easily perform federated training on Fed-BioMed nodes.