Managing Secure Aggregation on Researcher Side
The researcher component manages the secure aggregation context setup, which prepares the elements needed to apply secure aggregation over encrypted model parameters. Some nodes may require secure aggregation, some may not require it, and others may not support it at all. The end-user (researcher) should therefore activate secure aggregation according to the configuration of all participating nodes.
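The decision described above can be sketched as a small helper. This is only an illustration of the reasoning, not part of the Fed-BioMed API: the NodeSecagg labels and the decide_secagg function are hypothetical names, and the sketch assumes the researcher already knows each node's policy.

```python
from enum import Enum


class NodeSecagg(Enum):
    """Hypothetical per-node policy labels, used only in this sketch."""
    REQUIRED = "required"
    OPTIONAL = "optional"
    UNSUPPORTED = "unsupported"


def decide_secagg(policies):
    """Return the value to pass as the secagg flag, or raise if nodes conflict."""
    policies = set(policies)
    if NodeSecagg.REQUIRED in policies and NodeSecagg.UNSUPPORTED in policies:
        # No single setting satisfies both kinds of node.
        raise ValueError(
            "some nodes require secure aggregation while others do not support it"
        )
    # Activate only when at least one node requires it; otherwise keep the
    # default (deactivated).
    return NodeSecagg.REQUIRED in policies


print(decide_secagg([NodeSecagg.REQUIRED, NodeSecagg.OPTIONAL]))
```

When every node is optional, the sketch keeps secure aggregation off to match the default; a researcher may of course still choose to turn it on.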
Managing secure aggregation through Experiment
Activation
By default, secure aggregation is deactivated in the Experiment class. It can be activated by setting the secagg argument to True; the default secure aggregation scheme is LOM.
from fedbiomed.researcher.federated_workflows import Experiment
Experiment(
secagg=True
)
Setting secagg to True instantiates a SecureAggregation object with default values for arguments such as timeout and clipping_range. However, it is also possible to create a secure aggregation instance with the desired argument values.
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.secagg import SecureAggregation
Experiment(
#...
secagg=SecureAggregation(clipping_range=30),
#....
)
Federated averaging
Once secure aggregation is activated, the experiment no longer uses the aggregator parameter of the Experiment (e.g. FedAverage) for aggregation. Secure aggregation aggregates model parameters with its own federated average, but without weighting them. Therefore, using num_updates instead of epochs in training_args is strongly recommended for secure aggregation.
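The difference between a weighted and an unweighted average can be seen in a toy example. The sketch below is plain Python, not Fed-BioMed code; the function names, parameter values and sample counts are made up for illustration. It shows that the unweighted mean treats every node equally, whatever its dataset size. That is a plausible reason to prefer num_updates, which makes every node perform the same number of training steps.

```python
def weighted_average(params, n_samples):
    """Average parameter vectors, weighting each node by its sample count."""
    total = sum(n_samples)
    size = len(params[0])
    return [
        sum(p[i] * n / total for p, n in zip(params, n_samples))
        for i in range(size)
    ]


def unweighted_average(params):
    """Average parameter vectors, giving every node the same weight."""
    size = len(params[0])
    return [sum(p[i] for p in params) / len(params) for i in range(size)]


# Two hypothetical nodes with different dataset sizes.
node_params = [[1.0, 2.0], [3.0, 6.0]]
node_samples = [100, 300]

print(weighted_average(node_params, node_samples))  # [2.5, 5.0]
print(unweighted_average(node_params))              # [2.0, 4.0]
```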
The scheme argument of SecureAggregation selects the secure aggregation scheme to use. However, schemes may require different pre- or post-configuration on the node side and the researcher side. Please read the configuration guide carefully before changing the secure aggregation scheme.
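from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
from fedbiomed.researcher.secagg import SecureAggregation, SecureAggregationSchemes

# tags, model_args, MyTrainingPlan, training_args and rounds are assumed to be defined earlier
exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None,
                 secagg=SecureAggregation(scheme=SecureAggregationSchemes.JOYE_LIBERT),
                 # or custom SecureAggregation(active=<bool>, clipping_range=<int>)
                 save_breakpoints=True)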
Timeout
Secure aggregation setup starts specific processing in each Fed-BioMed component that participates in the federated training. These processes and the associated communication may take more or less time than expected, depending on the number of nodes and the communication bandwidth. Default timeouts cannot currently be configured through the user API; to change them, edit researcher.secagg.SecaggContext in the library for each component accordingly.
Clipping Range
Encryption on the node-side is performed after the quantization of model weights/parameters. However, the maximum and minimum values of model parameters may vary depending on the technique used. Therefore, the clipping range of quantization depends on the model, data, or technique. The clipping range should always be greater than or equal to the maximum model weight value, but kept reasonably low.
By default, the clipping range is set to 3. If the clipping range is exceeded while encrypting model parameters, a warning is raised instead of an error. This lets the end-user know that the clipping range should be increased for the next rounds.
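The clip-then-quantize step and the warning behaviour can be sketched as follows. This is a minimal illustration, not the Fed-BioMed implementation: the quantize function, its target_range parameter and the warning text are all assumptions made for the example. Only the default clipping range of 3 comes from the text above.

```python
import warnings


def quantize(weights, clipping_range=3, target_range=2**16):
    """Clip weights to [-clipping_range, clipping_range], then map to integers.

    The integers lie in [0, target_range - 1]. target_range is a hypothetical
    parameter chosen for this sketch.
    """
    if any(abs(w) > clipping_range for w in weights):
        # Warn rather than fail, so that training can continue.
        warnings.warn(
            "clipping range exceeded; consider increasing it for the next rounds"
        )
    quantized = []
    for w in weights:
        clipped = max(-clipping_range, min(clipping_range, w))
        scaled = (clipped + clipping_range) / (2 * clipping_range)  # in [0, 1]
        quantized.append(round(scaled * (target_range - 1)))
    return quantized


print(quantize([-3.0, 3.0]))  # [0, 65535]
```

A weight outside the range, such as 5.0 with the default range of 3, is clipped to the boundary value and triggers the warning.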
Setting clipping range
The optimal clipping range depends on the specific scenario and the models being used. In some cases, too high a clipping range can result in a loss of information and lead to decreased performance. It is therefore important to choose the clipping range carefully, based on the specific situation and the characteristics of the models being used.
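The loss of information from an oversized clipping range can be illustrated with a quantization round trip. The sketch below is not Fed-BioMed code. The roundtrip_error function and its levels parameter are hypothetical, and the number of levels is kept deliberately small so the effect is easy to see. With a fixed number of integer levels, a wider range means a coarser step between representable values, so the reconstruction error of a given weight tends to grow.

```python
def roundtrip_error(weight, clipping_range, levels=2**8):
    """Quantize one weight to integer steps and return the absolute error."""
    # Distance between two neighbouring representable values.
    step = 2 * clipping_range / (levels - 1)
    quantized = round((weight + clipping_range) / step)
    restored = quantized * step - clipping_range
    return abs(restored - weight)


w = 0.1234
for clipping_range in (1, 10, 100):
    # The printed error increases with the clipping range for this weight.
    print(clipping_range, roundtrip_error(w, clipping_range))
```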
Troubleshooting
Cannot set secure aggregation context on the researcher side
This may be caused by the timeout on the researcher side. If you have low bandwidth, high connection latency or many nodes, try increasing the timeout.
Model encryption takes too much time
Encryption time depends on the model size. For larger models, it is normal for encryption to take longer.
I want to set secure aggregation context without re-running a round.
It is possible to access the secagg instance through the experiment object in order to reset the secure aggregation context by providing a list of parties and the experiment's experiment_id. This step works for all secure aggregation schemes.
from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.environ import environ
exp = Experiment(secagg=True,
                 #....
                 )
# Set up the secure aggregation context for the researcher and the selected nodes
exp.secagg.setup(
    parties=[environ["ID"]] + exp.filtered_federation_nodes(),
    experiment_id=exp.id
)
# Same call with force=True, to force the setup even if a context already exists
exp.secagg.setup(
    parties=[environ["ID"]] + exp.filtered_federation_nodes(),
    experiment_id=exp.id,
    force=True
)
The outcome of the setup action can vary depending on the secure aggregation scheme used. For example, in the Joye-Libert scheme, the setup action generates the servkey and attaches a default biprime number to its context. In contrast, the LOM scheme only tracks the secure aggregation setup status of the participating nodes. This ensures that all participating nodes have created their own context/elements for training before the system sends the train request.