Training with Secure Aggregation¶
Secure aggregation is one of the security features provided by Fed-BioMed. Please refer to the secure aggregation user guide for more information on the methods and techniques used. This tutorial gives an example of secure aggregation usage in Fed-BioMed.
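To build some intuition before the setup, the sketch below illustrates the general principle behind secure aggregation using pairwise additive masks. It is only a toy illustration, not the actual protocol used by Fed-BioMed (which relies on MP-SPDZ-based multi-party computation, see the user guide); the node names and update values are made up.

import random

# Hypothetical local updates of three nodes (toy values)
updates = {"node1": 0.8, "node2": -0.3, "node3": 0.5}

# Pairwise random masks agreed between each pair of nodes
mask_12, mask_13, mask_23 = (random.random() for _ in range(3))

# Each node sends only its masked update to the aggregator
masked = {
    "node1": updates["node1"] + mask_12 + mask_13,
    "node2": updates["node2"] - mask_12 + mask_23,
    "node3": updates["node3"] - mask_13 - mask_23,
}

# Individual updates stay hidden, yet the masks cancel out in the sum
assert abs(sum(masked.values()) - sum(updates.values())) < 1e-9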
Setting up the nodes¶
During this tutorial, the nodes and the researcher will be launched locally using a single clone of Fed-BioMed. However, it is also possible to execute the notebook cells when the components are configured remotely, by following the instructions below.
Configuring/Installing Elements for Secure Aggregation¶
You can follow the detailed instructions for configuring a Fed-BioMed instance for secure aggregation, or apply the following shortened instructions for a basic setup.
1. Install and configure¶
Fed-BioMed uses MP-SPDZ for MPC. Therefore, please make sure that MP-SPDZ is installed and configured for Fed-BioMed by running the following command.
${FEDBIOMED_DIR}/scripts/fedbiomed_configure_secagg node
Since the node and the researcher will run on the same machine, a single MP-SPDZ configuration will be enough.
2. Create node and researcher instances¶
The setup for secure aggregation requires knowing the participating Fed-BioMed components in advance. Therefore, each component that will participate in the training should be created before being started. Afterwards, the participating components can be registered in every other component.
2.1 Create nodes¶
It is mandatory to have at least two nodes for an experiment that uses secure aggregation. Please execute the following commands to create two nodes.
Node 1:
${FEDBIOMED_DIR}/scripts/fedbiomed_run configuration create --component NODE --name config-n1.ini
Node 2:
${FEDBIOMED_DIR}/scripts/fedbiomed_run configuration create --component NODE --name config-n2.ini
2.2 Create researcher¶
Please run the command below to create the researcher component.
${FEDBIOMED_DIR}/scripts/fedbiomed_run configuration create --component researcher
3. Registering participating Fed-BioMed instances¶
Normally, as mentioned in the secure aggregation configuration, each participating instance should register the network credentials of the others, such as IP, port and SSL certificate. However, since this example runs on a single clone of Fed-BioMed, the registration process can be done automatically by running the following command.
${FEDBIOMED_DIR}/scripts/fedbiomed_run certificate-dev-setup
4. Add dataset and start nodes¶
The next step is adding/deploying the MNIST dataset on the nodes and starting them. For this step, you can follow the instructions for adding a dataset to the nodes in order to add the MNIST dataset. After the datasets are deployed, you can start the nodes and the researcher, as sketched below.
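For reference, the commands would look roughly like the lines below. The exact subcommands and options depend on your Fed-BioMed version, so treat these lines as an assumption and prefer the linked instructions.

${FEDBIOMED_DIR}/scripts/fedbiomed_run node --config config-n1.ini dataset add
${FEDBIOMED_DIR}/scripts/fedbiomed_run node --config config-n2.ini dataset add
${FEDBIOMED_DIR}/scripts/fedbiomed_run node --config config-n1.ini start
${FEDBIOMED_DIR}/scripts/fedbiomed_run node --config config-n2.ini start
${FEDBIOMED_DIR}/scripts/fedbiomed_run researcher start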
Define an experiment model and parameters¶
Declare a torch training plan MyTrainingPlan class to send for training on the node
import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the model to be used.
# You can use any class name (here 'Net')
class MyTrainingPlan(TorchTrainingPlan):

    # Defines and returns the model
    def init_model(self, model_args):
        return self.Net(model_args=model_args)

    # Defines and returns the optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr=optimizer_args["lr"])

    # Declares and returns the dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps

    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
This group of arguments corresponds respectively to:

model_args
: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.

training_args
: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.
NOTE: typos and/or lack of positional (required) arguments will raise an error. 🤓
model_args = {}

training_args = {
    'loader_args': { 'batch_size': 48, },
    'optimizer_args': {
        "lr" : 1e-3
    },
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
Declare and run the experiment¶
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
from fedbiomed.researcher.secagg import SecureAggregation

tags = ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None,
                 secagg=True,  # or custom SecureAggregation(active=<bool>, clipping_range=<int>)
                 save_breakpoints=True)
Access secure aggregation context¶
Please use the secagg attribute to verify that secure aggregation is set as active.
print("Is using secagg: ", exp.secagg.active)
It is also possible to check the secure aggregation context using the secagg attribute. Since the secure aggregation context negotiation will occur during the experiment run, the context and its id should be None at this point.
print("Secagg Biprime ", exp.secagg.biprime)
print("Secagg Servkey ", exp.secagg.servkey)
Run the experiment using secure aggregation. The secure aggregation context will be created before the first training round, and it will be updated before a round whenever nodes are added to or removed from the experiment.
exp.run(increase=True)
Save the trained model to a file.
exp.training_plan().export_model('./trained_model')
Display the secure aggregation context after running one round of training.
print("Secagg Biprime context: ", exp.secagg.biprime.context)
print("Secagg Servkey context: ", exp.secagg.servkey.context)
Changes in experiment triggers re-creation of secure aggregation context¶
Changes that re-create jobs, such as adding a new node to the experiment, will trigger an automatic secure aggregation re-setup for the next round.
from fedbiomed.researcher.strategies import DefaultStrategy
from fedbiomed.researcher.aggregators.fedavg import FedAverage

exp.set_training_data(None, True)  # sends a new dataset search request
exp.set_strategy(DefaultStrategy)
exp.set_aggregator(FedAverage)
exp.set_job()  # re-creating the job triggers secure aggregation re-setup for the next round
exp.run_once(increase=True)
Changing arguments of secure aggregation¶
Setting the secagg argument to True in Experiment creates a default SecureAggregation instance. Alternatively, it is also possible to create a SecureAggregation instance yourself and pass it as an argument. Here are the arguments that can be set for SecureAggregation:

active
: True if the round will use secure aggregation. Default is True.

clipping_range
: Clipping range that is going to be used for quantization of model parameters. The default clipping range is 3. However, some models can have model weights greater than 3. If the clipping range is exceeded during encryption on the nodes, Experiment will log a warning message. In such cases, you can provide a higher clipping range through the argument clipping_range.
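If you are unsure whether the default clipping range fits your model, a quick check of the largest absolute weight can help. The helper below is not part of Fed-BioMed; it is only a sketch, and the way you obtain the torch model (for example through exp.training_plan().model()) is an assumption that may differ between versions.

import torch.nn as nn

def max_abs_weight(model: nn.Module) -> float:
    # Largest absolute parameter value; if it exceeds the clipping range,
    # encryption on the nodes will clip the weights and log a warning
    return max(p.detach().abs().max().item() for p in model.parameters())

# Example (assuming the trained model is reachable through the training plan):
# print(max_abs_weight(exp.training_plan().model()))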
from fedbiomed.researcher.secagg import SecureAggregation

secagg = SecureAggregation(
    active=True,
    clipping_range=100,
)
exp.set_secagg(secagg=secagg)
exp.run_once(increase=True)
Load experiment from a breakpoint¶
Once a breakpoint is loaded, if the secure aggregation context already exists, there won't be a new context setup.
loaded_exp = Experiment.load_breakpoint()
loaded_exp.info()
loaded_exp.run_once(increase=True)