Training Process with Training Plan Management

Introduction

Fed-BioMed offers a feature that restricts nodes to running only pre-approved training plans. The nodes that receive your training plan might require it to be approved. If a node accepts only approved training plans, the training plan files that a researcher sends with a training request must be approved on the node side in advance. In this workflow, approval is carried out by a real person who reviews the code contained in the training plan file/class. The reviewer makes sure the training plan doesn't contain any code that might cause privacy issues or harm the node.

In this tutorial, we will create a node with the training plan control option enabled.

Setting Up a Node

Training plan control can be enabled either in the config file or through the Fed-BioMed CLI when starting the node. Creating and starting a node with the training plan control option is not very different from setting up a normal node. By default, if no option is specified in the CLI when the node is launched for the first time, the node disables training plan control in the security section of the config file. The section then looks like the snippet below:

[security]
hashing_algorithm = SHA256
allow_default_training_plans = True
training_plan_approval = False

It is also possible to manage training plan approval mode using environment variables. Set FBM_SECURITY_TRAINING_PLAN_APPROVAL=True and FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True to activate training plan approval mode. These variables override the config file options for the launch in which they are set.

FBM_SECURITY_TRAINING_PLAN_APPROVAL=True : This variable enables training plan control for the node. If the node has no config file yet when the CLI is run, a new config file is created with training plan approval mode enabled (training_plan_approval = True).
FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True : This variable allows default training plans for train requests. These are the training plans that come with the Fed-BioMed tutorials, for example the training plan for the MNIST dataset that we will use in this tutorial. If default training plans are enabled, the node updates/registers the training plan files located in the envs/common/default_training_plans directory while it starts. This option has no effect if training plan control is not enabled.
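The override behaviour described above can be pictured with a small sketch. It is only an illustration built on Python's standard configparser module, not Fed-BioMed's own code: the helper name effective_option and the way truthy strings are parsed are assumptions, while the section, option and variable names are the ones shown above.

```python
import configparser

CONFIG_TEXT = """
[security]
hashing_algorithm = SHA256
allow_default_training_plans = True
training_plan_approval = False
"""


def effective_option(config, option, environ):
    """Return a [security] option, letting an FBM_SECURITY_* variable override it.

    Hypothetical helper for illustration only; Fed-BioMed's real logic may differ.
    """
    env_name = "FBM_SECURITY_" + option.upper()
    if env_name in environ:
        # The environment variable wins for this launch; the file is left untouched.
        return environ[env_name].strip().lower() in ("true", "1", "yes")
    return config.getboolean("security", option)


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Without an override, the value comes from the config file.
print(effective_option(config, "training_plan_approval", {}))

# With the variable set, the config file value is overridden.
print(effective_option(config, "training_plan_approval",
                       {"FBM_SECURITY_TRAINING_PLAN_APPROVAL": "True"}))
```

Passing the environment as a plain dictionary keeps the sketch easy to test; in a real launch the values would come from os.environ.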

Adding MNIST Dataset to The Node

In this section we will add the MNIST dataset to the node. While adding the dataset through the CLI, we will also set FBM_SECURITY_TRAINING_PLAN_APPROVAL=True and FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True. This creates a new component in the directory ./my-node with the following configuration, located in my-node/etc/config.ini.

[security]
hashing_algorithm = SHA256
allow_default_training_plans = True
training_plan_approval = True

Now, let's run the following command.

$ FBM_SECURITY_TRAINING_PLAN_APPROVAL=True FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True fedbiomed node --path ./my-node dataset add

The CLI will ask you to select the dataset type. Since we will be working on the MNIST dataset, please select 2 (default), continue by typing y at the next prompt, and select the folder in which you want to store the MNIST dataset. Afterward, if you go to the my-node/etc directory, you can see the config.ini file.

Starting the Node

Now you can start your node by running the following command:

$ fedbiomed node --path ./my-node start

Since the config file has been configured to enable training plan control mode, you do not need to specify any extra parameter while starting the node. It is also possible to start the node with FBM_SECURITY_TRAINING_PLAN_APPROVAL=True and FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True, or with FBM_SECURITY_TRAINING_PLAN_APPROVAL=False and FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=False. If you start your node with FBM_SECURITY_TRAINING_PLAN_APPROVAL=False, training plan control is disabled even if it is enabled in the config file.
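If you prefer to launch the node from a script, the same per-launch override can be sketched in Python. The helper node_start_command below is hypothetical and only assembles the command and environment shown in this tutorial; the actual launch is left commented out because it requires a Fed-BioMed installation.

```python
import os
import subprocess


def node_start_command(node_path, training_plan_approval=None, allow_default=None):
    """Build the command and environment for starting a node (hypothetical helper)."""
    env = dict(os.environ)
    if training_plan_approval is not None:
        env["FBM_SECURITY_TRAINING_PLAN_APPROVAL"] = str(training_plan_approval)
    if allow_default is not None:
        env["FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS"] = str(allow_default)
    cmd = ["fedbiomed", "node", "--path", node_path, "start"]
    return cmd, env


# Disable training plan control for this launch only.
cmd, env = node_start_command("./my-node", training_plan_approval=False)
print(" ".join(cmd))

# Uncomment to actually start the node (requires Fed-BioMed to be installed):
# subprocess.run(cmd, env=env, check=True)
```

Copying os.environ into a new dictionary means the override applies only to the launched process and leaves the current shell environment unchanged.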

Creating An Experiment

In this section we will use the default MNIST training plan, which has already been registered by the node.

The following training plan will be sent to the node for training. Since training plan files are processed by the Experiment to configure dependencies, the import part of the final file might differ from this one.

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the training plan to be used. 
class MyTrainingPlan(TorchTrainingPlan):
    
    # Defines and return model 
    def init_model(self, model_args):
        return self.Net(model_args = model_args)
    
    # Defines and return optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr = optimizer_args["lr"])
    
    # Declares and return dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps
    
    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)


            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        loader_arguments = { 'shuffle': True}
        return DataManager(dataset=dataset1, **loader_arguments)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss

To get and inspect the final training plan file, we first need to initialize the experiment.

from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

model_args = {}

training_args = {
    'loader_args': { 'batch_size': 48, }, 
    'optimizer_args': {
        "lr" : 1e-3
    },
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)
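The comment on batch_maxnum in the cell above can be turned into a quick calculation. The sketch below only reproduces that arithmetic with the values from training_args; the helper name max_samples_per_epoch is an assumption made for illustration.

```python
training_args = {
    "loader_args": {"batch_size": 48},
    "batch_maxnum": 100,
}


def max_samples_per_epoch(args):
    """Upper bound on the samples used in one epoch when batch_maxnum is set."""
    return args["batch_maxnum"] * args["loader_args"]["batch_size"]


print(max_samples_per_epoch(training_args))  # 4800
```

With these settings each epoch uses at most 4800 samples, which keeps the tutorial rounds short.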

Getting Final Training Plan File From Experiment

training_plan_file() displays the training plan file that will be sent to the nodes.

exp.training_plan_file(display = True)

exp.check_training_plan_status() sends a request to the experiment's nodes to check whether the training plan is approved. The request goes to the nodes that were found when searching for datasets.

status = exp.check_training_plan_status()

status

exp.run_once()

The logs should indicate that the training plan is approved. You can also get the status object from the return value of check_training_plan_status(). It returns a list of status objects, one per node. Since we have launched only a single node, it returns only one status object.

approval_obligation : Indicates whether training plan control is enabled on the node.
status : Indicates the training plan approval status.
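As a rough illustration of how these two fields can be read together, here is a minimal sketch. The reply data is hypothetical: the list-of-dicts layout, the node_id key and the helper can_train_on are assumptions made for this example, and the real status objects may be structured differently.

```python
# Hypothetical reply data, for illustration only: one entry per node,
# carrying the two fields described above.
example_replies = [
    {"node_id": "node-1", "approval_obligation": True, "status": "Approved"},
]


def can_train_on(reply):
    """A plan can run if the node does not enforce approval or the plan is approved."""
    return (not reply["approval_obligation"]) or reply["status"] == "Approved"


for reply in example_replies:
    print(reply["node_id"], can_train_on(reply))
```

The check treats approval_obligation as the switch and status as the outcome, matching the rule that the approval status only matters when training plan control is enabled.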

Changing Training Plan And Testing Training Plan Approval Status

Let's change the network code in the training plan and test whether the plan is still approved. We will change the network structure.

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the training plan to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    
    # Defines and return model 
    def init_model(self, model_args):
        return self.Net(model_args = model_args)
    
    # Defines and return optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr = optimizer_args["lr"])
    
    # Declares and return dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps
    
    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 16, 5, 1, 2)
            self.conv2 = nn.Conv2d(16, 32, 5, 1, 2)
            self.fc1 = nn.Linear(32 * 7 * 7, 10)
        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = torch.flatten(x, 1)
            x = self.fc1(x)

            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = { 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
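The input sizes of fc1 in the two networks (9216 in the first, 32 * 7 * 7 in the second) follow from the standard output-size formula for convolution and pooling layers applied to 28x28 MNIST images. The short check below is a sketch of that arithmetic; the helper name conv2d_out is introduced here for illustration.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# First network: two 3x3 convolutions without padding, then one 2x2 max pooling.
s = conv2d_out(28, kernel=3)             # conv1: 28 -> 26
s = conv2d_out(s, kernel=3)              # conv2: 26 -> 24
s = conv2d_out(s, kernel=2, stride=2)    # max_pool2d(x, 2): 24 -> 12
first_fc1_inputs = 64 * s * s            # 64 output channels from conv2

# Second network: 5x5 convolutions with padding 2, each followed by 2x2 max pooling.
t = conv2d_out(28, kernel=5, padding=2)  # conv1: 28 -> 28
t = conv2d_out(t, kernel=2, stride=2)    # max_pool2d: 28 -> 14
t = conv2d_out(t, kernel=5, padding=2)   # conv2: 14 -> 14
t = conv2d_out(t, kernel=2, stride=2)    # max_pool2d: 14 -> 7
second_fc1_inputs = 32 * t * t           # 32 output channels from conv2

print(first_fc1_inputs, second_fc1_inputs)  # 9216 1568
```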

In the following cell, we update the training plan class using the setter set_training_plan_class.

exp.set_training_plan_class(MyTrainingPlan, keep_weights=False)

Since we changed the model/network structure in the experiment (we removed the dropouts and one dense layer, fc2), the output of the following method should say that the training plan is not approved by the node, and the is_approved key of the result object should be equal to False.
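One way to picture why a changed training plan loses its approval: the node configuration shown earlier sets hashing_algorithm = SHA256, which suggests that approved training plans are identified by a hash of their content. The sketch below illustrates that idea with Python's hashlib. It is an assumption-laden illustration, not Fed-BioMed's implementation: the registry approved_hashes and the helper functions are hypothetical, and the real node may normalize the code before hashing.

```python
import hashlib


def plan_hash(source_code):
    """SHA256 digest of a training plan's source text."""
    return hashlib.sha256(source_code.encode("utf-8")).hexdigest()


# Shortened stand-ins for the two network definitions used in this tutorial.
approved_source = "self.fc1 = nn.Linear(9216, 128)\nself.fc2 = nn.Linear(128, 10)\n"
modified_source = "self.fc1 = nn.Linear(32 * 7 * 7, 10)\n"

# Hypothetical registry of hashes that the node has approved.
approved_hashes = {plan_hash(approved_source)}


def is_approved(source_code):
    """True only if the exact source text matches an approved hash."""
    return plan_hash(source_code) in approved_hashes


print(is_approved(approved_source))
print(is_approved(modified_source))
```

Under this model any edit to the code, however small, produces a different digest, so the modified plan has to go through approval again.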

status = exp.check_training_plan_status()

exp.training_plan_file()

status

Since the training plan is not approved, you won't be able to train your model in the node. The following cell will return an error.

exp.run_once(increase=True)

Registering and Approving the Training Plan

To register/approve the training plan created in the previous section, we can use the Fed-BioMed CLI. In Fed-BioMed, there are two ways of approving a training plan:

By sending an ApprovalRequest from the researcher to the Node
By adding it directly to the Node through the training plan registration facility

1. Approving a Training Plan through an `ApprovalRequest`

Fed-BioMed's Experiment interface provides a method to submit a training plan to the Node for approval. The Node can then review the code and approve the training plan using the CLI or the GUI.

The Experiment method that sends such a request is training_plan_approve.

exp.training_plan_approve(description="my new training plans")

Once the training plan has been sent, we need to approve it (or reject it) on the Node side.

Before approving, you can optionally list the training plans known to the node with their status (Approved, Pending, Rejected). Your new training plan should appear with Pending status and the name my new training plans.

$ fedbiomed node --path ./my-node training-plan list

Then approve the training plan, using the following command in a new terminal:

$ fedbiomed node --path ./my-node training-plan approve

Training plans with either Pending or Rejected status will be displayed. Select the training plan you have sent to approve it. You should see a message confirming that the training plan has been approved successfully.

Optionally, list the training plans known to the node again with their status. Your training plan should now appear with Approved status.

$ fedbiomed node --path ./my-node training-plan list

Back on the Researcher side, let's check its status by running check_training_plan_status():

exp.check_training_plan_status()

The training plan's status should have changed from Pending to Approved, which means it can now be trained on the Node. The Researcher can now run an Experiment on the Node!

exp.run_once(increase=True)

2. Registering a Model through Node interface

The second option is to register the training plan directly on the Node. First, retrieve the final training plan file from the experiment.

exp.training_plan_file()

The output of exp.training_plan_file() is the path where the final training plan file is saved. It also prints the content of the training plan file. You can get the content of the training plan either from the output cell or from the file at that path. Either way, you need to create a new txt file and copy the training plan content into it. For example, create a new directory in Fed-BioMed called my_approved_training_plan and, inside it, a new my-training-plan.txt file containing the training plan class content.

$ mkdir ${FEDBIOMED_DIR}/my_approved_training_plan
$ cp <training_plan_path_file> ${FEDBIOMED_DIR}/my_approved_training_plan/my-training-plan.txt

Where <training_plan_path_file> is the path of the training plan file returned by exp.training_plan_file(display=False).
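The same copy step can also be done from Python, for instance in a notebook cell. The sketch below uses only the standard library; the helper stage_training_plan is hypothetical, and the demonstration works on a temporary directory with a placeholder file so that it runs without Fed-BioMed. In practice you would pass the path returned by exp.training_plan_file(display=False) and a directory of your choice.

```python
import shutil
import tempfile
from pathlib import Path


def stage_training_plan(training_plan_path, target_dir, file_name="my-training-plan.txt"):
    """Copy a training plan file into target_dir and return the new path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(training_plan_path, target_dir / file_name))


# Self-contained demonstration with a temporary directory and a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "training_plan.py"
    source.write_text("# placeholder training plan content\n")
    staged = stage_training_plan(source, Path(tmp) / "my_approved_training_plan")
    staged_name = staged.name
    staged_content = staged.read_text()
    print(staged_name)
```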

Afterward, please run the following command in another terminal to register the training plan file.

$ fedbiomed node --path ./my-node training-plan register

You should type a unique name for your training plan, e.g. 'MyTestTP-1', and a description. The CLI will then ask you to select the training plan file you want to register. Select the file that you saved and continue.

Back on the Researcher side, you should now be able to train the model defined in the training plan.

exp.check_training_plan_status()

exp.run_once(increase=True)

Rejecting training plans

On the Node side, it is possible to reject a training plan using the CLI or the GUI. Every type of training plan can be Rejected, even Default ones. In Fed-BioMed, Rejected means that the training plan cannot be trained/executed on the Node (but the training plan is still Registered in the database).

Using the CLI, the Node can run:

$ fedbiomed node --path my-node training-plan reject

and select the training plan to be Rejected.
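The three statuses used in this tutorial (Pending, Approved, Rejected) can be summarized in a small sketch. The enum and helper below are hypothetical names introduced for illustration and are not Fed-BioMed's own classes; they only encode the rule stated above that a plan runs only when approved, while a rejected plan remains registered.

```python
from enum import Enum


class TrainingPlanStatus(Enum):
    """Statuses a registered training plan can have on a node (illustrative)."""
    PENDING = "Pending"
    APPROVED = "Approved"
    REJECTED = "Rejected"


def can_be_trained(status):
    """Only an Approved training plan may run on a node that enforces approval."""
    return status is TrainingPlanStatus.APPROVED


for status in TrainingPlanStatus:
    print(status.value, can_be_trained(status))
```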

exp.check_training_plan_status()

exp.run_once(increase=True)
