Training with Secure Aggregation¶
Secure aggregation is one of the security features provided by Fed-BioMed. Please refer to the secure aggregation user guide for more information about the methods and techniques used. This tutorial gives an example of secure aggregation usage in Fed-BioMed.
Setting up the nodes¶
During this tutorial, the nodes and the researcher are launched locally from a single clone of Fed-BioMed. However, the notebook cells can also be executed when the components are configured remotely, provided the following instructions are respected.
Start the network¶
Before running this notebook, start the network with ./scripts/fedbiomed_run network
Configuring/Installing Element for Secure Aggregation¶
You can follow the detailed instructions for configuring a Fed-BioMed instance for secure aggregation, or apply the following shortened instructions for a basic setup.
1. Install and configure¶
Fed-BioMed uses MP-SPDZ for multi-party computation (MPC). Therefore, please make sure that MP-SPDZ is installed and configured for Fed-BioMed by running the following command.
${FEDBIOMED_DIR}/scripts/fedbiomed_configure_secagg node
Since the node and the researcher will run on the same machine, a single MP-SPDZ configuration is enough.
2. Create node and researcher instances¶
The setup for secure aggregation requires knowledge of the participating Fed-BioMed components in advance. Therefore, each component that will participate in the training should be created before starting them. Afterwards, participating components can be registered in every other component.
2.1 Create nodes¶
An experiment that uses secure aggregation requires at least two nodes. Please execute the following commands to create two nodes.
Node 1:
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n1.ini configuration create
Node 2:
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n2.ini configuration create
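If more than two nodes are wanted, the same command can presumably be repeated with one configuration file per node. The sketch below only builds the command strings and does not run them. The helper name `node_create_commands` and the `config-n<i>.ini` naming beyond the two files shown above are assumptions made for illustration.

```python
from typing import List


def node_create_commands(n_nodes: int) -> List[str]:
    """Return one 'configuration create' command per node (hypothetical helper)."""
    if n_nodes < 2:
        # The tutorial states that secure aggregation needs at least two nodes.
        raise ValueError("secure aggregation needs at least two nodes")
    return [
        "${FEDBIOMED_DIR}/scripts/fedbiomed_run node config "
        f"config-n{i}.ini configuration create"
        for i in range(1, n_nodes + 1)
    ]


commands = node_create_commands(2)
for cmd in commands:
    print(cmd)
```

The two printed commands are the ones listed for Node 1 and Node 2 above.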
2.2 Create researcher¶
Please run the command below to create the researcher component.
${FEDBIOMED_DIR}/scripts/fedbiomed_run researcher configuration create
3. Registering participating Fed-BioMed instances¶
Normally, as mentioned in the secure aggregation configuration guide, each participating instance should register the network credentials of the others, such as IP, port and SSL certificate. However, since this example runs on a single clone of Fed-BioMed, the registration process can be done automatically by running the following command.
${FEDBIOMED_DIR}/scripts/fedbiomed_run certificate-dev-setup
4. Add dataset and start nodes¶
The next step is to add the MNIST dataset to the nodes and start them. For this step, you can follow the instructions for adding datasets to nodes. After the datasets are deployed, you can start the nodes and the researcher.
Define an experiment model and parameters¶
Declare a Torch training plan class, MyTrainingPlan, to send to the nodes for training.
import torch
import torch.nn as nn
import torch.nn.functional as F  # F is used in Net.forward below
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the model to be used.
# You can use any class name (here 'Net')
class MyTrainingPlan(TorchTrainingPlan):

    # Defines and returns the model
    def init_model(self, model_args):
        return self.Net(model_args=model_args)

    # Defines and returns the optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr=optimizer_args["lr"])

    # Declares and returns the dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps

    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
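The input size of fc1 (9216) follows from the shapes of the layers before it. As a worked check, the sketch below traces a 28 x 28 MNIST image through the two convolutions and the pooling step using the standard output-size formula. The helper `conv_out` is an illustrative name, not part of Fed-BioMed or PyTorch.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                  # MNIST images are 28 x 28, one channel
side = conv_out(side, kernel=3)            # conv1 (kernel 3, stride 1): 28 -> 26
side = conv_out(side, kernel=3)            # conv2 (kernel 3, stride 1): 26 -> 24
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12

channels = 64                              # output channels of conv2
flattened = channels * side * side         # size after torch.flatten(x, 1)
print(flattened)                           # 9216
```

If the kernel sizes or the input resolution change, the first argument of fc1 has to be recomputed in the same way.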
The following arguments correspond, respectively, to:
model_args: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
training_args: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.
NOTE: typos and/or missing positional (required) arguments will raise an error. 🤓
model_args = {}

training_args = {
    'loader_args': {'batch_size': 48},
    'optimizer_args': {
        'lr': 1e-3
    },
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
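To make the effect of batch_maxnum concrete, the sketch below estimates how many samples one epoch touches on a node. It assumes the 60000-image MNIST training split and a simple "cap the number of batches" rule. The helper `samples_per_epoch` is hypothetical, and Fed-BioMed's exact handling of edge cases (for example a final partial batch) may differ.

```python
import math


def samples_per_epoch(dataset_size, batch_size, batch_maxnum=None):
    """Estimate samples seen in one epoch when batches are capped at batch_maxnum."""
    n_batches = math.ceil(dataset_size / batch_size)
    if batch_maxnum:
        # Assumption: training stops after batch_maxnum batches.
        n_batches = min(n_batches, batch_maxnum)
    return min(n_batches * batch_size, dataset_size)


capped = samples_per_epoch(60000, batch_size=48, batch_maxnum=100)
full = samples_per_epoch(60000, batch_size=48)
print(capped)  # 4800
print(full)    # 60000
```

With the values used in this tutorial, each node trains on 4800 samples per epoch instead of the full split, which keeps development runs short.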
Declare and run the experiment¶
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
from fedbiomed.researcher.secagg import SecureAggregation

tags = ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None,
                 secagg=True,  # or custom SecureAggregation(active=<bool>, clipping_range=<int>, timeout=<int>)
                 save_breakpoints=True)
Access secure aggregation context¶
Use the secagg attribute to verify that secure aggregation is set as active.
print("Is using secagg: ", exp.secagg.active)
The secure aggregation context can also be checked through the secagg attribute. Since the context is negotiated during the experiment run, context and id should be None at this point.
print("Secagg Biprime ", exp.secagg.biprime)
print("Secagg Servkey ", exp.secagg.servkey)
Run the experiment using secure aggregation. The secure aggregation context is created before the first training round, and it is updated before each round whenever nodes are added to or removed from the experiment.
exp.run(increase=True)
Display context after running one round of training.
print("Secagg Biprime context: ", exp.secagg.biprime.context)
print("Secagg Servkey context: ", exp.secagg.servkey.context)
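For intuition about why the parties need shared key material before the first round, here is a toy additive-masking sketch. It is not Fed-BioMed's protocol, which is described in the secure aggregation user guide. The modulus, the function names, and the single trusted "dealer" that generates all keys are simplifying assumptions made for illustration only.

```python
import secrets

MODULUS = 2**61 - 1  # toy modulus (assumption); real schemes use much larger values


def make_keys(n_nodes):
    """Toy dealer: node keys plus a server key that cancels their sum."""
    node_keys = [secrets.randbelow(MODULUS) for _ in range(n_nodes)]
    server_key = (-sum(node_keys)) % MODULUS
    return node_keys, server_key


def mask(value, key):
    """A node hides its integer update by adding its key."""
    return (value + key) % MODULUS


def aggregate(masked_values, server_key):
    """The server recovers only the sum; individual updates stay masked."""
    return (sum(masked_values) + server_key) % MODULUS


updates = [12, 30]  # one (already quantized) value per node
node_keys, server_key = make_keys(len(updates))
masked = [mask(u, k) for u, k in zip(updates, node_keys)]
total = aggregate(masked, server_key)
print(total)  # 42
```

Because the keys depend on who participates, a change in the set of parties means new keys are needed, which matches the re-creation behaviour described in this tutorial.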
Changes in the experiment trigger re-creation of the secure aggregation context¶
Changes that re-create jobs, such as adding a new node to the experiment, automatically trigger a secure aggregation re-setup for the next round.
from fedbiomed.researcher.strategies import DefaultStrategy
from fedbiomed.researcher.aggregators.fedavg import FedAverage

# Send a new dataset search request
exp.set_training_data(None, True)
exp.set_strategy(DefaultStrategy)
exp.set_aggregator(FedAverage)
exp.set_job()
exp.run_once(increase=True)
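The "set up again only when something changed" behaviour can be pictured with a small stand-alone sketch. The class `ToyContextTracker` and the party names are hypothetical, and Fed-BioMed's own bookkeeping is likely more involved. The sketch only shows the idea of comparing the current set of parties with the one used for the last setup.

```python
class ToyContextTracker:
    """Re-runs a (pretend) context setup only when the set of parties changes."""

    def __init__(self):
        self._parties = None
        self.setup_count = 0

    def ensure_context(self, parties):
        """Return True if a new setup was needed, False if the context was reused."""
        parties = frozenset(parties)
        if parties == self._parties:
            return False
        self._parties = parties
        self.setup_count += 1
        return True


tracker = ToyContextTracker()
first = tracker.ensure_context(["researcher", "node-1", "node-2"])
same = tracker.ensure_context(["node-2", "node-1", "researcher"])
grown = tracker.ensure_context(["researcher", "node-1", "node-2", "node-3"])
print(first, same, grown, tracker.setup_count)  # True False True 2
```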
Changing arguments of secure aggregation¶
Setting the secagg argument to True in Experiment creates a default SecureAggregation instance. It is also possible to create a SecureAggregation instance and pass it as an argument. The following arguments can be set for SecureAggregation:
active: True if the round will use secure aggregation. Default is True.
clipping_range: Clipping range that is going to be used for quantization of model parameters. The default clipping range is 3. However, some models can have weights greater than 3. If the clipping range is exceeded during encryption on the nodes, Experiment will log a warning message. In such cases, you can provide a higher clipping range through the clipping_range argument.
timeout: The maximum amount of time, in seconds, that the experiment will wait for responses from all parties during secure aggregation setup. Since the secure aggregation context depends on network communication and multi-party computation, this argument allows setting a higher timeout for larger context setups, or vice versa.
from fedbiomed.researcher.secagg import SecureAggregation
secagg = SecureAggregation(
active=True,
clipping_range=100,
timeout=15
)
exp.set_secagg(secagg=secagg)
exp.run_once(increase=True)
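To see why a weight above the clipping range is a problem, and what a larger range costs, the sketch below clips and quantizes a few floating-point weights into integers and maps them back. The linear mapping, the `target_range` of 2**16 levels, and the function names are assumptions for illustration. Fed-BioMed's actual quantization may differ.

```python
def quantize(weights, clipping_range, target_range=2**16):
    """Clip to [-clipping_range, clipping_range], then map to ints in [0, target_range - 1]."""
    step = 2 * clipping_range / (target_range - 1)
    quantized = []
    for w in weights:
        clipped = max(-clipping_range, min(clipping_range, w))
        quantized.append(round((clipped + clipping_range) / step))
    return quantized


def dequantize(values, clipping_range, target_range=2**16):
    """Inverse of quantize, up to a rounding error of at most half a step."""
    step = 2 * clipping_range / (target_range - 1)
    return [v * step - clipping_range for v in values]


weights = [-0.5, 2.0, 5.0]
restored_default = dequantize(quantize(weights, 3), 3)    # 5.0 is clipped to about 3.0
restored_wide = dequantize(quantize(weights, 100), 100)   # 5.0 survives, coarser steps
print(restored_default)
print(restored_wide)
```

In this sketch a wider clipping range preserves large weights but spreads the same number of integer levels over a bigger interval, so every weight is stored with less precision.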
Load experiment from a breakpoint¶
Once a breakpoint is loaded, no context setup takes place if the context already exists.
loaded_exp = Experiment.load_breakpoint()
loaded_exp.info()
loaded_exp.run_once(increase=True)