Transfer-learning in Fed-BioMed tutorial¶
Goal of this tutorial¶
This tutorial shows how to classify 2D images from the MedNIST dataset using a pretrained PyTorch model.
Its goal is to provide an example of transfer learning with Fed-BioMed for medical image classification.
About the model¶
The model used is DenseNet-121 (“Densely Connected Convolutional Networks”), pretrained on the ImageNet dataset. We use the pretrained PyTorch DenseNet-121 model to perform image classification on the MedNIST dataset: the goal of the model is to predict the class of MedNIST medical images.
About MedNIST¶
MedNIST provides an artificial 2d classification dataset created by gathering different medical imaging datasets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license.
The MedNIST dataset is downloaded from the resources provided by the MONAI project.
The MedNIST dataset has 58954 images of size (3, 64, 64) distributed into 6 classes (10000 images per class, except for the BreastMRI class, which has 8954 images). The classes are AbdomenCT, BreastMRI, CXR, ChestCT, Hand and HeadCT. The dataset has the following structure:
└── MedNIST/
    ├── AbdomenCT/
    ├── BreastMRI/
    ├── CXR/
    ├── ChestCT/
    ├── Hand/
    └── HeadCT/
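To check that a local copy follows this layout, you could count the files in each class folder. The snippet below is only a sketch: `count_images_per_class` is a hypothetical helper (it is not part of Fed-BioMed or MONAI), it assumes that every file directly inside a class folder is one image, and it runs on a tiny fake directory so that it works without downloading MedNIST.

```python
import tempfile
from pathlib import Path


def count_images_per_class(root):
    """Return {class_name: number_of_files} for a one-folder-per-class layout."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


# Demo on a tiny fake dataset (two classes, a few empty files).
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_files in {"Hand": 3, "HeadCT": 2}.items():
        class_dir = Path(tmp) / "MedNIST" / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_files):
            (class_dir / f"{i:06d}.jpeg").touch()
    demo_counts = count_images_per_class(Path(tmp) / "MedNIST")

print(demo_counts)

# Per-class totals quoted in the text above; they add up to the dataset size.
expected = {"AbdomenCT": 10000, "BreastMRI": 8954, "CXR": 10000,
            "ChestCT": 10000, "Hand": 10000, "HeadCT": 10000}
print(sum(expected.values()))
```

On the real dataset, the result of `count_images_per_class` on your MedNIST folder could be compared with `expected`.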
Transfer-learning¶
Transfer learning is a machine learning technique where a model trained on one task is repurposed or adapted for a second, related task. It relies on a neural network pretrained on a large dataset: here, the DenseNet model was trained on ImageNet to classify a wide diversity of images.
The objective is that the knowledge gained from learning one task can be useful for learning another task (as we do here: the knowledge of the DenseNet model trained on ImageNet is used to classify medical images into 6 categories). This is particularly beneficial when the amount of labeled data for the target task is limited, as the pretrained model has already learned useful features and representations from a large dataset.
Transfer learning is typically applied in one of two ways:
(I) Feature Extraction: In this approach, the pre-trained model is used as a fixed feature extractor. The earlier layers of the neural network, which capture general features and patterns, are frozen, and only the later layers are replaced or retrained for the new task.
(II) Fine-tuning: In this approach, the pre-trained model is further trained or partially trained on the new task. This allows the model to adapt its learned representations to the specifics of the new task while retaining some of the knowledge gained from the original task.
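In PyTorch, the difference between these two approaches comes down to which parameters keep `requires_grad=True`. The sketch below uses a toy network as a stand-in for a pretrained model (it is not DenseNet, and the layer sizes are arbitrary); `make_toy_model` and `trainable_params` are hypothetical helpers written for this illustration.

```python
import torch.nn as nn


def make_toy_model(num_classes=6):
    # Toy stand-in for a pretrained network: two "feature" layers + a classifier.
    features = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())
    classifier = nn.Linear(16, num_classes)
    return nn.Sequential(features, classifier)


def trainable_params(model):
    # Number of scalar parameters that the optimizer would update.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# (I) Feature extraction: freeze every feature layer, train only the classifier.
extractor = make_toy_model()
for p in extractor[0].parameters():
    p.requires_grad = False

# (II) Fine-tuning: freeze only the first feature layer, train everything after it.
finetuned = make_toy_model()
for p in finetuned[0][0].parameters():
    p.requires_grad = False

print("feature extraction:", trainable_params(extractor))
print("fine-tuning:", trainable_params(finetuned))
```

Fine-tuning leaves more parameters trainable, which gives the model more freedom to adapt but also requires more data and computation.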
In this example, we load a sampled MedNIST dataset on each of two nodes (500 images and 1000 images) to illustrate the effectiveness of transfer learning. Each sampled dataset is built from a random selection of images and has balanced classes, to avoid classification bias. We will run two independent TrainingPlan experiments, one without transfer learning and one with transfer learning. We will compare these two experiments, both running on the DenseNet model, using the loss value and the accuracy as metrics to evaluate the effectiveness of transfer learning.
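The balanced random sampling mentioned above can be sketched as follows. This is not the code of the sampling script used in this tutorial: `balanced_sample` is a hypothetical function, the file names are made up, and giving one extra image to the first classes when the requested size is not a multiple of the number of classes is an assumption.

```python
import random


def balanced_sample(files_by_class, n_samples, seed=0):
    """Draw n_samples file names with (almost) equal counts per class."""
    rng = random.Random(seed)
    classes = sorted(files_by_class)
    base, extra = divmod(n_samples, len(classes))
    sample = {}
    for i, name in enumerate(classes):
        k = base + (1 if i < extra else 0)  # the first `extra` classes get one more
        sample[name] = rng.sample(files_by_class[name], k)  # without replacement
    return sample


classes = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]
# Toy file lists standing in for the real image folders.
files = {c: [f"{c}_{i}.jpeg" for i in range(200)] for c in classes}

sample = balanced_sample(files, 500)
sizes = {c: len(v) for c, v in sample.items()}
print(sizes, sum(sizes.values()))
```

With 500 samples and 6 classes, every class ends up with 83 or 84 images, so no class dominates the training set.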
Note: This transfer learning example is not to be confused with Federated Transfer Learning (FTL) (see for example this paper). The example only showcases transfer learning in a Federated Learning use case.
1. Load dataset or sampled dataset¶
- From the root directory of Fed-BioMed, run:
source ./scripts/fedbiomed_environment node
in order to load the Node environment.
- In this new environment, run the Python script:
python ./notebooks/transfer-learning/download_sample_of_mednist.py -n 2
where -n 2 is the number of Nodes you want to create (for more details about this script, please run notebooks/transfer-learning/download_sample_of_mednist.py --help).
- The script will ask, for each Node created, the number of samples you want in its dataset. For example, you could enter 500 the first time the script asks for the number of samples, and 1000 the second time. The script will output a configuration file for each Node, with a configured database, using the following naming convention:
config_mednist_<i>_sampled.ini
where <i> corresponds to the index of the Node created.
- Finally, launch your Nodes by running:
./scripts/fedbiomed_run node --config config_mednist_1_sampled.ini start
In another terminal, run:
./scripts/fedbiomed_run node --config config_mednist_2_sampled.ini start
Wait until you get Starting task manager.
2. Launch the researcher¶
- From the root directory of Fed-BioMed, run :
./scripts/fedbiomed_run researcher start
- It opens the Jupyter notebook.
To make sure that the MedNIST dataset is loaded on the nodes, we can send a request to the network to list the datasets available on each node. The list command should output an entry for the mednist data.
from fedbiomed.researcher.requests import Requests
req = Requests()
req.list()
Import of libraries¶
import torch
import torch.nn as nn
from torchvision.models.densenet import DenseNet121_Weights
import pandas as pd
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
I- Adapt the last layer to your classification's goal¶
Here we use the DenseNet model that allows classification through 10000 samples. We can adapt this classification task to the MedNIST dataset by replacing the last layer with our own classifier: in the Training Plan, the model.classifier layer of the DenseNet-121 model is replaced so that it classifies images into 6 classes, by adapting the num_classes value (this can be done through the model_args argument).
Data augmentation¶
You can perform data augmentation in the preprocessing step if you need to, for example with random flips, rotations and crops. You can also restrict the preprocessing of images to transforms.Resize, transforms.ToTensor and transforms.Normalize, as done in the code below.
I. Run an experiment for image classification without transfer learning¶
Here we propose to run, as a first experiment, a Training Plan (MyTrainingPlan1) with the untrained DenseNet model. We will then compare its loss value with the one from the second experiment, which uses transfer learning.
We don't use the pretrained weights. It is important to adapt the learning rate: I suggest starting with lr=1e-4 and then adjusting it according to the evaluation metrics.
I -1. Define Training plan experiment¶
class MyTrainingPlan1(TorchTrainingPlan):

    def init_model(self, model_args):
        model = models.densenet121(weights=None)  # here model coefficients are set to random weights

        # add the classifier
        num_classes = model_args['num_classes']
        num_ftrs = model.classifier.in_features
        model.classifier = nn.Sequential(
            nn.Linear(num_ftrs, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )
        return model

    def init_dependencies(self):
        return [
            "from torchvision import datasets, transforms, models",
            "import torch.optim as optim",
            "from torchvision.models import densenet121"
        ]

    def init_optimizer(self, optimizer_args):
        return optim.Adam(self.model().parameters(), lr=optimizer_args["lr"])

    # training data
    def training_data(self):
        # Transform images: resize, convert to tensor, normalize
        preprocess = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
        train_data = datasets.ImageFolder(self.dataset_path, transform=preprocess)
        train_kwargs = {'shuffle': True}
        return DataManager(dataset=train_data, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss_func = nn.CrossEntropyLoss()
        loss = loss_func(output, target)
        return loss
training_args = {
    'loader_args': {'batch_size': 32, },
    'optimizer_args': {'lr': 1e-3},
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100,  # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
    'random_seed': 1234
}

model_args = {
    'num_classes': 6,  # adapt this number to the number of classes in your dataset
}

tags = ['#MEDNIST', '#dataset']
rounds = 1  # adjust the number of rounds

exp = Experiment(tags=tags,
                 training_plan_class=MyTrainingPlan1,
                 model_args=model_args,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage())

# testing section
from fedbiomed.common.metrics import MetricTypes
exp.set_test_ratio(.1)
exp.set_test_on_global_updates(True)
exp.set_test_metric(MetricTypes.ACCURACY)
exp.set_tensorboard(True)
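The experiment above uses `FedAverage` as its aggregator. As a rough illustration of the federated averaging idea (this is not Fed-BioMed's implementation), the sketch below averages the parameters returned by the nodes, weighted by the number of samples on each node. The function name, the flat list-of-floats representation of a model and the weighting by sample count are assumptions made for this example.

```python
def federated_average(node_params, node_sizes):
    """Average parameter vectors, weighting each node by its number of samples."""
    total = sum(node_sizes)
    n_values = len(node_params[0])
    return [
        sum(params[i] * size for params, size in zip(node_params, node_sizes)) / total
        for i in range(n_values)
    ]


# Two nodes holding 500 and 1000 samples, as in this tutorial.
node_params = [[0.0, 3.0], [3.0, 0.0]]  # toy "models" with two parameters each
global_params = federated_average(node_params, [500, 1000])
print(global_params)  # [2.0, 1.0]
```

The node with 1000 samples pulls the averaged parameters twice as strongly as the node with 500 samples.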
I - 3. Run your experiment¶
exp.run()
For example, at the end of the training experiment, I obtained:¶
fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: config_mednist_1_sampled
Round 2 | Iteration: 1/1 (100%) | Samples: 50/50
ACCURACY: 0.740000
---------
fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: config_mednist_2_sampled
Round 2 | Iteration: 1/1 (100%) | Samples: 100/100
ACCURACY: 0.780000
---------
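The `Samples` counts in this log can be related to the settings above: `set_test_ratio(.1)` holds out 10% of each node's data for validation, and `batch_maxnum * batch_size` caps the number of training samples used per epoch. The arithmetic below is only a sketch; the exact rounding that Fed-BioMed applies when splitting the data is an assumption, and the mapping of 500 and 1000 samples to the two node names follows the order suggested earlier in this tutorial.

```python
node_sizes = {"config_mednist_1_sampled": 500, "config_mednist_2_sampled": 1000}
test_ratio = 0.1
batch_size, batch_maxnum = 32, 100

cap = batch_maxnum * batch_size  # upper bound on training samples per epoch
summary = {}
for node, n in node_sizes.items():
    n_val = round(n * test_ratio)     # samples held out for validation
    n_train = n - n_val               # samples left for training
    summary[node] = {"validation": n_val, "training_used": min(n_train, cap)}

print("cap:", cap)
print(summary)
```

With these sampled datasets, the cap of 3200 samples is larger than what each node holds, so `batch_maxnum` does not actually truncate the training here.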
I - 4. Save your model¶
You can save your model to use it later in a new TrainingPlan. The saved file lets you import the model again, including your layer modifications and the weight values.
#save model
exp.training_plan().export_model('./training_plan1_densenet_MedNIST')
I - 5. Results in tensorboard¶
from fedbiomed.researcher.environ import environ
tensorboard_dir = environ['TENSORBOARD_RESULTS_DIR']
%load_ext tensorboard
%tensorboard --logdir "$tensorboard_dir"
I - 6. Training timing¶
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1]
for r in round_data.values():
    print("\t- {id} :"
          "\n\t\trtime_training={rtraining:.2f} seconds"
          "\n\t\tptime_training={ptraining:.2f} seconds"
          "\n\t\trtime_total={rtotal:.2f} seconds".format(
              id=r['node_id'],
              rtraining=r['timing']['rtime_training'],
              ptraining=r['timing']['ptime_training'],
              rtotal=r['timing']['rtime_total']))
print('\n')
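# Download DenseNet-121 with its pretrained ImageNet weights from PyTorch Hub
# and save its state dict, to be imported later in the transfer learning experiment
model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet121', weights=DenseNet121_Weights.DEFAULT)
torch.save(model.state_dict(), 'pretrained_model.pt')
torch.save(model.state_dict(), 'pretrained_model2.pt')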
II-2. Adapt the last layer to your classification's goal¶
Here we train the DenseNet model for classification on 1500 samples (spread over 2 nodes). We can adapt this classification task to the MedNIST dataset by replacing the last layer with our own classifier: in the Training Plan, the model.classifier layer of the DenseNet-121 model is replaced so that it classifies images into 6 classes, by adapting the num_classes value (this can be done through the model_args argument).
The dataset is defined below, inside the Training Plan, as shown previously.
You could also import the model you saved to perform your second TrainingPlan experiment (see below).
In this experiment, I will unfreeze the last two blocks and the classifier layers. The other layers will stay frozen (i.e. they will not change during the experiment).
I introduce a new argument in model_args called num_unfrozen_blocks. This argument specifies the number of blocks left unfrozen. In the DenseNet model, layers are grouped within blocks. There is a total of 12 blocks, each containing several layers. In our experiment, we freeze blocks of layers rather than individual layers.
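The slicing expression used for this, `features[:-num_unfrozen_blocks]`, can be tried on a toy `nn.Sequential`. In the sketch below, 12 small linear layers stand in for the 12 blocks of DenseNet (this is an assumption made to keep the example light, the real blocks are much larger); slicing a `nn.Sequential` returns a view on the same modules, so freezing the slice freezes the original blocks.

```python
import torch.nn as nn

# Toy stand-in for `model.features`: 12 small blocks instead of DenseNet's real ones.
features = nn.Sequential(*[nn.Linear(4, 4) for _ in range(12)])
num_unfrozen_blocks = 2

# Freeze everything except the last `num_unfrozen_blocks` blocks.
for param in features[:-num_unfrozen_blocks].parameters():
    param.requires_grad = False

frozen = [name for name, block in features.named_children()
          if not any(p.requires_grad for p in block.parameters())]
unfrozen = [name for name, block in features.named_children()
            if all(p.requires_grad for p in block.parameters())]
print("frozen blocks:", frozen)
print("unfrozen blocks:", unfrozen)
```

Note that this slicing needs `num_unfrozen_blocks` to be at least 1: with 0, the slice `[:-0]` is empty in Python, so no block would be frozen at all.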
from fedbiomed.common.training_plans import TorchTrainingPlan


class MyTrainingPlan2(TorchTrainingPlan):

    def init_model(self, model_args):
        model = models.densenet121(weights=None)

        # freeze all blocks except the last `num_unfrozen_blocks` ones
        num_unfrozen_layer = model_args['num_unfrozen_blocks']
        for param in model.features[:-num_unfrozen_layer].parameters():
            param.requires_grad = False

        # add the classifier
        num_ftrs = model.classifier.in_features
        num_classes = model_args['num_classes']
        model.classifier = nn.Sequential(
            nn.Linear(num_ftrs, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes)
        )
        return model

    def init_dependencies(self):
        return [
            "from torchvision import datasets, transforms, models",
            "import torch.optim as optim"
        ]

    def init_optimizer(self, optimizer_args):
        return optim.Adam(self.model().parameters(), lr=optimizer_args["lr"])

    def training_data(self):
        # Load MedNIST data from the image folder and transform images
        preprocess = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
        train_data = datasets.ImageFolder(self.dataset_path, transform=preprocess)
        train_kwargs = {'shuffle': True}
        return DataManager(dataset=train_data, **train_kwargs)

    def training_step(self, data, target):
        output = self.model().forward(data)
        loss_func = nn.CrossEntropyLoss()
        loss = loss_func(output, target)
        return loss
# same import path as in the import cell at the beginning of this tutorial
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

training_args = {
    'loader_args': {'batch_size': 32, },
    'optimizer_args': {'lr': 1e-4},  # you could decrease the learning rate
    'epochs': 1,  # you can increase the number of epochs (e.g. 10)
    'dry_run': False,
    'random_seed': 1234,
    'batch_maxnum': 100  # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

model_args = {
    'num_classes': 6,
    'num_unfrozen_blocks': 2
}

tags = ['#MEDNIST', '#dataset']
rounds = 1  # you can increase the number of rounds

exp = Experiment(tags=tags,
                 training_plan_class=MyTrainingPlan2,
                 model_args=model_args,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage())

from fedbiomed.common.metrics import MetricTypes
exp.set_test_ratio(.1)
exp.set_test_on_global_updates(True)
exp.set_test_metric(MetricTypes.ACCURACY)
exp.set_tensorboard(True)
# here we load the model we have saved with torch-hub weights
exp.training_plan().import_model('pretrained_model.pt')
II - 3. Run your experiment¶
exp.run()
For example, at the end of the training experiment, I obtained:¶
fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: config_mednist_1_sampled
Round 2 | Iteration: 1/1 (100%) | Samples: 50/50
ACCURACY: 1.0000
---------
fedbiomed INFO - VALIDATION ON GLOBAL UPDATES
NODE_ID: config_mednist_2_sampled
Round 2 | Iteration: 1/1 (100%) | Samples: 100/100
ACCURACY: 1.0000
---------
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1]
for r in round_data.values():
    print("\t- {id} :"
          "\n\t\trtime_training={rtraining:.2f} seconds"
          "\n\t\tptime_training={ptraining:.2f} seconds"
          "\n\t\trtime_total={rtotal:.2f} seconds".format(
              id=r['node_id'],
              rtraining=r['timing']['rtime_training'],
              ptraining=r['timing']['ptime_training'],
              rtotal=r['timing']['rtime_total']))
print('\n')
II - 4. Export your model¶
#save model
exp.training_plan().export_model('./training_plan2_densenet_MedNIST')
II - 5. Display losses on Tensorboard¶
%reload_ext tensorboard
%tensorboard --logdir "$tensorboard_dir"
II - 6. Save and Import your model and parameters¶
You could import your first model, saved from MyTrainingPlan1, instead of loading the original DenseNet. You can also retrieve the features of a saved model, as shown here.
# import your model from a file
model_features_ = torch.load('./training_plan2_densenet_MedNIST')
model_features_
II - 7. Check which model parameters changed/unchanged¶
Here we are just making sure that the layers that were supposed to be modified have indeed been modified between the original model downloaded from PyTorch Hub and the trained model.
We will discard the batch normalization layers, since those may have changed during the transfer learning operation.
Let's first have a look at the layers of the model that we kept frozen.
# layers kept frozen during transfer learning (MyTrainingPlan2)
model_features = exp.training_plan().model()
model_features.features[:-model_args['num_unfrozen_blocks']]
# Here we check if layers of the DenseNet model have changed between the initial model
# and the model extracted from the training plan (after transfer learning)
model_features = exp.training_plan().model()
table = pd.DataFrame(columns=["Layer name", "Layer set to frozen", "Is Layer changed?"])
ref_model = torch.load('pretrained_model.pt')  # reloading the state dict downloaded from PyTorch Hub

# returns True for layers that are NOT batch normalization layers
remove_norm_layers = lambda name: not any([x in name for x in ('norm', 'batch')])

layers = list(ref_model.keys())
# names of the parameters that were frozen in MyTrainingPlan2
ours_layers = model_features.features[:-model_args['num_unfrozen_blocks']]
ours_layers = ['features.' + x for x in ours_layers.state_dict().keys()]

_counter = 0
for i, (layer_name, param) in enumerate(model_features.state_dict().items()):
    if i >= len(layers):
        continue
    l = layers[i]
    if remove_norm_layers(l):
        r_tensor = ref_model[l]
        if 'classifier' in layer_name:
            # the classifier was replaced, so it cannot be compared with the original one
            table.loc[_counter] = [l, l in ours_layers, "non comparable"]
        else:
            t = model_features.get_parameter(l)
            _is_close = bool(torch.isclose(r_tensor, t).all())
            table.loc[_counter] = [l, l in ours_layers, not _is_close]
        _counter += 1

# display comparison table content
table
The table lists all the layers, showing which ones were modified and which were left untouched during training. "non comparable" marks layers that were replaced when adapting the original model to our use case: these are the classifier layers.
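The same kind of check can be reproduced on a toy model in a few lines. The sketch below is not the code of this tutorial: it uses a small two-layer network instead of DenseNet, random data instead of MedNIST, and a single SGD step instead of a federated training round, all of which are assumptions made to keep the example self-contained.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
for p in model[0].parameters():  # freeze the first layer
    p.requires_grad = False

reference = copy.deepcopy(model.state_dict())  # snapshot of the weights before training

# One training step on random data, updating only the unfrozen parameters.
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
data, target = torch.randn(16, 4), torch.randint(0, 3, (16,))
loss = nn.CrossEntropyLoss()(model(data), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# A parameter "changed" if it is no longer close to its value in the snapshot.
changed = {name: not torch.allclose(reference[name], tensor)
           for name, tensor in model.state_dict().items()}
print(changed)
```

As in the table above, the frozen parameters are expected to be reported as unchanged and the unfrozen ones as changed.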
Conclusions¶
Through these experiments, we observed a better accuracy and a faster decrease of the loss value with transfer learning than with the untrained model.
To conclude, the right transfer learning method depends on how much data you have. You could choose to train more layers and compare the metrics with those of partial fine-tuning, then keep the method that gives the best metrics for your experiment.