
Using Differential Privacy with OPACUS on Fed-BioMed

In this notebook we show how Opacus (https://opacus.ai/) can be used in Fed-BioMed. Opacus is a library for training PyTorch models with differential privacy. We will train the basic MNIST example using two nodes.

Setting the node up

A node must be configured before running this notebook:

  1. Create a node by adding a dataset with fedbiomed node dataset add:
  • Select option 2 (default).
  • Confirm the default tags by typing "y" and pressing ENTER.
  • Pick the folder where MNIST is downloaded (this is due to torch issue https://github.com/pytorch/vision/issues/3549).
  • The data must be added successfully (a warning saying that data must be unique means it has already been added).

  This process creates a default node component in the directory where the command is executed, or uses the existing one.

  2. Check that your data has been added by executing fedbiomed node dataset list.
  3. Run the node using fedbiomed node start. Wait until you see Starting task manager, which means the node is online.

Defining a Training Plan and Parameters

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the training plan to be used in the experiment. 
class MyTrainingPlan(TorchTrainingPlan):
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "import torch.nn.functional as F"]
        
        return deps
    
    def init_model(self):
        model = nn.Sequential(nn.Conv2d(1, 32, 3, 1),
                                  nn.ReLU(),
                                  nn.Conv2d(32, 64, 3, 1),
                                  nn.ReLU(),
                                  nn.MaxPool2d(2),
                                  nn.Dropout(0.25),
                                  nn.Flatten(),
                                  nn.Linear(9216, 128),
                                  nn.ReLU(),
                                  nn.Dropout(0.5),
                                  nn.Linear(128, 10),
                                  nn.LogSoftmax(dim=1))
        return model
    

    
    def training_data(self):
        # Build the MNIST training set from the node's local copy and wrap it in a DataManager
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        loader_arguments = {'shuffle': True}
        return DataManager(dataset1, **loader_arguments)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss

The experiment takes two groups of arguments:

  • model_args: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
  • training_args: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side. The differential privacy parameters (dp_args) are passed here.

NOTE: typos in argument names or missing required arguments will raise an error. 🤓

In the cell below, we define dp_args inside the training_args dictionary. Based on the given parameters, the node applies differential privacy through Opacus.

  • sigma (Opacus noise_multiplier): the ratio of the standard deviation of the Gaussian noise to the L2-sensitivity of the function to which the noise is added (how much noise to add).

  • clip (Opacus max_grad_norm): the maximum norm of the per-sample gradients. Any gradient with a norm higher than this is clipped to this value.

  • type: the differential privacy type, either local or central.

model_args = {}

training_args = {
    'loader_args': { 'batch_size': 48, },
    'optimizer_args': {
        'lr': 1e-3
    },
    'epochs': 1, 
    'dry_run': False, 
    'dp_args': # DP Arguments for differential privacy
        {
            "type": "local", 
            "sigma": 0.4, 
            "clip": 0.005
        },
    'batch_maxnum': 50 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
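
To make the roles of sigma and clip more concrete, the following is a minimal, dependency-free sketch of one differentially private gradient step: each per-sample gradient is clipped to an L2 norm of at most clip, the clipped gradients are summed, Gaussian noise with standard deviation sigma * clip is added, and the result is averaged over the batch. This is only an illustration of the general DP-SGD idea under simplifying assumptions (gradients as plain Python lists, hypothetical helper names l2_norm, clip_gradient and private_batch_gradient); it is not the Opacus or Fed-BioMed implementation, whose details may differ.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_gradient(grad, clip):
    """Scale grad down so that its L2 norm is at most clip."""
    norm = l2_norm(grad)
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_batch_gradient(per_sample_grads, sigma, clip, rng):
    """Clip each per-sample gradient, sum them, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, clip) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noise_std = sigma * clip  # noise scale is expressed relative to the clipping bound
    noisy = [s + rng.gauss(0.0, noise_std) for s in summed]
    return [n / len(per_sample_grads) for n in noisy]


# Toy per-sample gradients (illustrative values only)
grads = [[3.0, 4.0], [0.001, 0.002], [-0.3, 0.4]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_mean = private_batch_gradient(grads, sigma=0.4, clip=0.005, rng=rng)
print(noisy_mean)
```

With the values used in this notebook (sigma of 0.4 and clip of 0.005), large gradients such as [3.0, 4.0] are scaled down to norm 0.005, while gradients already below the bound are left unchanged.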

Declare and run the experiment

from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 3

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

Let's start the experiment.

By default, exp.run() does not return until all the rounds have completed on all the nodes.

exp.run()

Save the trained model to a file:

exp.training_plan().export_model('./trained_model')
Federated parameters for each round are available in exp.aggregated_params() (indexed from 0 to rounds - 1).

For example, you can view the federated parameters for the last round of the experiment:

print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())
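
As a rough illustration of where such aggregated parameters come from, here is a minimal sketch of federated averaging: the parameters returned by each node are combined into one set, weighted by the number of samples each node trained on. The function name federated_average, the plain-list parameter format and the weighting by sample count are assumptions made for this example; the FedAverage aggregator used above operates on real model parameters and its exact weighting may differ.

```python
def federated_average(node_params, node_sample_counts):
    """Weighted average of per-node parameter dicts (name -> list of floats)."""
    total = sum(node_sample_counts)
    weights = [n / total for n in node_sample_counts]
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        # Average each coordinate of this parameter across nodes
        averaged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, node_params))
            for i in range(length)
        ]
    return averaged


# Two hypothetical nodes with toy parameters; node_b holds three times more samples
node_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
node_b = {"fc.weight": [3.0, 6.0], "fc.bias": [4.0]}
aggregated = federated_average([node_a, node_b], node_sample_counts=[1, 3])
print(aggregated)
```

The result has the same keys as each node's parameters, in the same way that the 'params' entry printed above exposes one value per model parameter name.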

Testing

We define a small testing routine to compute the loss and accuracy on the test dataset.

import torch
import torch.nn.functional as F


def testing_Accuracy(model, data_loader):
    """Return (average NLL loss, accuracy in percent) of model on data_loader."""
    device = 'cpu'
    model.to(device)
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in data_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(data_loader.dataset)
    accuracy = 100 * correct / len(data_loader.dataset)

    return test_loss, accuracy
from torchvision import datasets, transforms
from fedbiomed.researcher.config import config
import os

local_mnist = os.path.join(config.vars['TMP_DIR'], 'local_mnist')

transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])

test_set = datasets.MNIST(root = local_mnist, download = True, train = False, transform = transform)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=True)
fed_model = exp.training_plan().model()
fed_model.load_state_dict(exp.aggregated_params()[rounds - 1]['params'])

acc_federated = testing_Accuracy(fed_model, test_loader)

print('\nAccuracy federated training:  {:.4f}'.format(acc_federated[1]))

print('\nError federated training:  {:.4f}'.format(acc_federated[0]))
Address:

2004 Rte des Lucioles, 06902 Sophia Antipolis

E-mail:

fedbiomed _at_ inria _dot_ fr

Fed-BioMed © 2022