Experiment

Classes

Experiment

Experiment(*args, aggregator=None, agg_optimizer=None, node_selection_strategy=None, round_limit=None, tensorboard=False, retain_full_history=True, **kwargs)

Bases: TrainingPlanWorkflow

A Federated Learning Experiment based on a Training Plan.

This class provides a comprehensive entry point for the management and orchestration of a FL experiment, including definition, execution, and interpretation of results.

Managing model parameters

The model parameters should be managed through the corresponding methods of the training plan, accessed via the experiment's training_plan() method, using its set_model_params and get_model_params functions, e.g.

exp.training_plan().set_model_params(params_dict)

Do not set the training plan attribute directly

Setting the training_plan attribute directly is not allowed. Instead, use the set_training_plan_class method to set the training plan type, and the underlying model will be correctly constructed and initialized.
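
For illustration, a minimal sketch of reading and modifying the model parameters through the training plan; params_dict below is a hypothetical dictionary of model weights in the format expected by the underlying model.

# assuming `exp` is an initialized Experiment
params_dict = exp.training_plan().get_model_params()  # read the current parameters
# ... modify params_dict as needed ...
exp.training_plan().set_model_params(params_dict)     # write the parameters back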

Parameters:

Name Type Description Default
aggregator Optional[Aggregator]

object defining the method for aggregating local updates. Defaults to None (use FedAverage for aggregation)

None
agg_optimizer Optional[Optimizer]

Optimizer instance, to refine aggregated model updates prior to their application. If None, merely apply the aggregated updates.

None
node_selection_strategy Optional[Strategy]

object defining how nodes are sampled at each round for training, and how non-responding nodes are managed. Defaults to None:

  • use DefaultStrategy if training_data is initialized
  • else the strategy is None (cannot be initialized) and the experiment cannot be launched yet

None
round_limit Union[int, None]

the maximum number of training rounds (nodes <-> central server) that should be executed for the experiment. None means that no limit is defined. Defaults to None.

None
tensorboard bool

whether to save scalar values for displaying in Tensorboard during training for each node. Currently, it is only used for loss values.

  • If True, the monitor instantiates a Monitor object that writes scalar logs into the ./runs directory.
  • If False, it stops monitoring if it was active.

False
retain_full_history bool

whether to retain in memory the full history of node replies and aggregated params for the experiment. If False, only the last round's replies and aggregated params will be available. Defaults to True.

True
*args

Extra positional arguments from parent class TrainingPlanWorkflow

()
**kwargs

Arguments of parent class TrainingPlanWorkflow

{}
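
A minimal construction sketch. Only aggregator, round_limit and tensorboard are documented in this section; tags, training_plan_class and training_args belong to the parent TrainingPlanWorkflow and are shown here as hypothetical placeholders.

from fedbiomed.researcher.federated_workflows import Experiment
from fedbiomed.researcher.aggregators import FedAverage

exp = Experiment(
    tags=['#my-dataset'],                # hypothetical dataset tags (parent-class argument)
    training_plan_class=MyTrainingPlan,  # hypothetical training plan class (parent-class argument)
    training_args={'epochs': 1},         # hypothetical training arguments (parent-class argument)
    aggregator=FedAverage(),             # default aggregation when None
    round_limit=3,                       # run at most 3 rounds
    tensorboard=True,                    # write scalar logs into ./runs
)
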
Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def __init__(
    self,
    *args,
    aggregator: Optional[Aggregator] = None,
    agg_optimizer: Optional[Optimizer] = None,
    node_selection_strategy: Optional[Strategy] = None,
    round_limit: Union[int, None] = None,
    tensorboard: bool = False,
    retain_full_history: bool = True,
    **kwargs
) -> None:
    """Constructor of the class.

    Args:
        aggregator: object defining the method for aggregating
            local updates. Default to None (use
            [`FedAverage`][fedbiomed.researcher.aggregators.FedAverage] for aggregation)

        agg_optimizer: [`Optimizer`][fedbiomed.common.optimizers.Optimizer] instance,
            to refine aggregated model updates prior to their application. If None,
            merely apply the aggregated updates.

        node_selection_strategy: object defining how nodes are sampled at
            each round for training, and how non-responding nodes are managed.
            Defaults to None:
            - use [`DefaultStrategy`][fedbiomed.researcher.strategies.DefaultStrategy]
                if training_data is initialized
            - else strategy is None (cannot be initialized), experiment cannot be launched yet

        round_limit: the maximum number of training rounds (nodes <-> central server)
            that should be executed for the experiment. `None` means that no limit is
            defined. Defaults to None.

        tensorboard: whether to save scalar values  for displaying in Tensorboard
            during training for each node. Currently, it is only used for loss values.
            - If it is true, monitor instantiates a `Monitor` object
                that write scalar logs into `./runs` directory.
            - If it is False, it stops monitoring if it was active.

        retain_full_history: whether to retain in memory the full history
            of node replies and aggregated params for the experiment. If False, only the
            last round's replies and aggregated params will be available. Defaults to True.
        *args: Extra positional arguments from parent class
            [`TrainingPlanWorkflow`][fedbiomed.researcher.federated_workflows.TrainingPlanWorkflow]
        **kwargs: Arguments of parent class
            [`TrainingPlanWorkflow`][fedbiomed.researcher.federated_workflows.TrainingPlanWorkflow]
    """
    # define new members
    self._node_selection_strategy: Strategy = None
    self._round_limit = None
    self._monitor = None
    self._aggregator = None
    self._agg_optimizer = None
    self.aggregator_args = {}
    self._aggregated_params = {}
    self._training_replies: Dict = {}
    self._retain_full_history = None

    # initialize object
    super().__init__(*args, **kwargs)

    # set self._aggregator : type Aggregator
    self.set_aggregator(aggregator)

    # set self._agg_optimizer: type Optional[Optimizer]
    self.set_agg_optimizer(agg_optimizer)

    # set self._node_selection_strategy: type Union[Strategy, None]
    self.set_strategy(node_selection_strategy)

    # "current" means number of rounds already trained
    self._set_round_current(0)
    self.set_round_limit(round_limit)

    # always create a monitoring process
    self._monitor = Monitor()
    self._reqs.add_monitor_callback(self._monitor.on_message_handler)
    self.set_tensorboard(tensorboard)

    # whether to retain the full experiment history or not
    self.set_retain_full_history(retain_full_history)

Attributes

aggregator_args instance-attribute
aggregator_args = {}

Functions

agg_optimizer
agg_optimizer()

Retrieves the optional Optimizer used to refine aggregated model updates.

To set or update that optimizer: set_agg_optimizer.

Returns:

Type Description
Optional[Optimizer]

An Optimizer instance, or None.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def agg_optimizer(self) -> Optional[Optimizer]:
    """Retrieves the optional Optimizer used to refine aggregated model updates.

    To set or update that optimizer:
    [`set_agg_optimizer`][fedbiomed.researcher.federated_workflows.Experiment.set_agg_optimizer].

    Returns:
        An [Optimizer][fedbiomed.common.optimizers.Optimizer] instance,
        or None.
    """
    return self._agg_optimizer
aggregated_params
aggregated_params()

Retrieves all aggregated parameters of each round of training

Returns:

Type Description
dict

Dictionary of aggregated parameters, where keys correspond to each round of training
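
For example, once at least one round has run, the aggregated parameters of each round can be inspected as follows (a sketch; each entry stores the parameters under the 'params' key):

for round_num, entry in exp.aggregated_params().items():
    print(round_num, list(entry['params'].keys()))  # parameter names aggregated at this round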

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def aggregated_params(self) -> dict:
    """Retrieves all aggregated parameters of each round of training

    Returns:
        Dictionary of aggregated parameters keys stand for each round of training
    """

    return self._aggregated_params
aggregator
aggregator()

Retrieves the aggregator that will be used for aggregating model parameters.

To set or update aggregator: set_aggregator.

Returns:

Type Description
Aggregator

A class or an object that is an instance of Aggregator

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def aggregator(self) -> Aggregator:
    """Retrieves aggregator class that will be used for aggregating model parameters.

    To set or update aggregator:
    [`set_aggregator`][fedbiomed.researcher.federated_workflows.Experiment.set_aggregator].

    Returns:
        A class or an object that is an instance of [Aggregator][fedbiomed.researcher.aggregators.Aggregator]

    """
    return self._aggregator
breakpoint
breakpoint()

Saves a breakpoint with the state of the training at the current round.

The following Experiment attributes will be saved:

  • round_current
  • round_limit
  • aggregator
  • agg_optimizer
  • node_selection_strategy
  • aggregated_params
Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def breakpoint(self) -> None:
    """
    Saves breakpoint with the state of the training at a current round.

    The following Experiment attributes will be saved:

      - round_current
      - round_limit
      - aggregator
      - agg_optimizer
      - node_selection_strategy
      - aggregated_params
    """
    # need to have run at least 1 round to save a breakpoint
    if self._round_current < 1:
        msg = ErrorNumbers.FB413.value + \
            ' - need to run at least 1 round before saving a breakpoint'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    # conditions are met, save breakpoint
    breakpoint_path, breakpoint_file_name = \
        choose_bkpt_file(self._experimentation_folder, self._round_current - 1)

    # predefine several breakpoint states
    agg_bkpt = None
    agg_optim_bkpt = None
    strategy_bkpt = None
    training_replies_bkpt  = None
    if self._aggregator is not None:
        agg_bkpt = self._aggregator.save_state_breakpoint(breakpoint_path,
                                                          global_model=self.training_plan().after_training_params())
    if self._agg_optimizer is not None:
        # FIXME: harmonize naming of save_object
        agg_optim_bkpt = self.save_optimizer(breakpoint_path)
    if self._node_selection_strategy is not None:
        strategy_bkpt = self._node_selection_strategy.save_state_breakpoint()
    if self._training_replies is not None:
        training_replies_bkpt = self.save_training_replies()

    state = {
        'round_current': self._round_current,
        'round_limit': self._round_limit,
        'aggregator': agg_bkpt,
        'agg_optimizer': agg_optim_bkpt,
        'node_selection_strategy': strategy_bkpt,
        'aggregated_params': self.save_aggregated_params(
            self._aggregated_params, breakpoint_path),
        'training_replies': training_replies_bkpt,
    }

    super().breakpoint(state, self._round_current)
commit_experiment_history
commit_experiment_history(training_replies, aggregated_params)

Commits the experiment history to memory.

The experiment history is defined as
  • training replies
  • aggregated parameters

This function checks the retain_full_history flag: if it is True, it simply adds (or overwrites) the current round's entry in the training_replies and aggregated_params dictionaries. If the flag is set to False, only the last round's values are stored, in the same dictionary format.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
def commit_experiment_history(self,
                              training_replies: Dict[str, Dict[str, Any]],
                              aggregated_params: Dict[str, Any]) -> None:
    """Commits the experiment history to memory.

    The experiment history is defined as:
        - training replies
        - aggregated parameters

    This function checks the retain_full_history flag: if it is True, it simply adds
    (or overwrites) the current round's entry for the training_replies and aggregated_params
    dictionary. If the flag is set to False, we simply store the last round's values in the
    same dictionary format.
    """
    if self._retain_full_history:
        # append to history
        self._training_replies[self._round_current] = training_replies
        self._aggregated_params[self._round_current] = {'params': aggregated_params}
    else:
        # only store the last round's values
        self._training_replies = {self._round_current: training_replies}
        self._aggregated_params = {self._round_current: {'params': aggregated_params}}
info
info()

Prints out the information about the current status of the experiment.

Lists all the parameters/arguments of the experiment and informs whether the experiment can be run.

Raises:

Type Description
FedbiomedExperimentError

Inconsistent experiment due to missing variables

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def info(self) -> Tuple[Dict[str, List[str]], str]:
    """Prints out the information about the current status of the experiment.

    Lists  all the parameters/arguments of the experiment and informs whether the experiment can be run.

    Raises:
        FedbiomedExperimentError: Inconsistent experiment due to missing variables
    """
    # at this point all attributes are initialized (in constructor)

    info = self._create_default_info_structure()

    info['Arguments'].extend([
        'Aggregator',
        'Strategy',
        'Aggregator Optimizer',
        'Rounds already run',
        'Rounds total',
        'Breakpoint State',
    ])
    info['Values'].extend(['\n'.join(findall('.{1,60}',
                                     str(e))) for e in [
        self._aggregator.aggregator_name if self._aggregator is not None else None,
        self._node_selection_strategy,
        self._agg_optimizer,
        self._round_current,
        self._round_limit,
        self._save_breakpoints,
    ]])

    missing = self._check_missing_objects()
    return super().info(info, missing)
load_breakpoint classmethod
load_breakpoint(breakpoint_folder_path=None)

Loads a breakpoint (provided a breakpoint has been saved) so the experiment can be resumed. Useful if training has crashed on the researcher side or if the user wants to resume a given experiment.

Parameters:

Name Type Description Default
cls Type[TExperiment]

Experiment class

required
breakpoint_folder_path Union[str, None]

path of the breakpoint folder. Path can be absolute or relative eg: "var/experiments/Experiment_xxxx/breakpoints_xxxx". If None, loads latest breakpoint of the latest experiment. Defaults to None.

None

Returns:

Type Description
TExperiment

Reinitialized experiment object. With the returned object, the user can then call .run() to resume model training.

Raises:

Type Description
FedbiomedExperimentError

bad argument type, error when reading breakpoint or bad loaded breakpoint content (corrupted)
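
A typical resumption sketch (the explicit folder path below is the example path given above):

from fedbiomed.researcher.federated_workflows import Experiment

loaded_exp = Experiment.load_breakpoint()  # latest breakpoint of the latest experiment
# or: Experiment.load_breakpoint("var/experiments/Experiment_xxxx/breakpoints_xxxx")
loaded_exp.run(rounds=2)                   # continue training from the restored round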

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@classmethod
@exp_exceptions
def load_breakpoint(cls: Type[TExperiment],
                    breakpoint_folder_path: Union[str, None] = None) -> TExperiment:
    """
    Loads breakpoint (provided a breakpoint has been saved)
    so experience can be resumed. Useful if training has crashed
    researcher side or if user wants to resume a given experiment.

    Args:
      cls: Experiment class
      breakpoint_folder_path: path of the breakpoint folder. Path can be absolute or relative eg:
        "var/experiments/Experiment_xxxx/breakpoints_xxxx". If None, loads latest breakpoint of the latest
        experiment. Defaults to None.

    Returns:
        Reinitialized experiment object. With given object, user can then use `.run()`
            method to pursue model training.

    Raises:
        FedbiomedExperimentError: bad argument type, error when reading breakpoint or bad loaded breakpoint
            content (corrupted)
    """
    loaded_exp, saved_state = super().load_breakpoint(breakpoint_folder_path)
    # retrieve breakpoint sampling strategy
    bkpt_sampling_strategy_args = saved_state.get("node_selection_strategy")
    bkpt_sampling_strategy = cls._create_object(bkpt_sampling_strategy_args)
    loaded_exp.set_strategy(bkpt_sampling_strategy)
    # retrieve breakpoint researcher optimizer
    bkpt_optim = Experiment._load_optimizer(saved_state.get("agg_optimizer"))
    loaded_exp.set_agg_optimizer(bkpt_optim)
    # changing `Experiment` attributes
    loaded_exp._set_round_current(saved_state.get('round_current'))
    loaded_exp._aggregated_params = loaded_exp._load_aggregated_params(
        saved_state.get('aggregated_params')
    )
    # retrieve and change aggregator
    bkpt_aggregator_args = saved_state.get("aggregator")
    bkpt_aggregator = cls._create_object(bkpt_aggregator_args, training_plan=loaded_exp.training_plan())
    loaded_exp.set_aggregator(bkpt_aggregator)
    # load training replies
    loaded_exp.load_training_replies(saved_state.get("training_replies"))
    logger.info(f"Experimentation reload from {breakpoint_folder_path} successful!")

    return loaded_exp
load_training_replies
load_training_replies(bkpt_training_replies)

Reads training replies from a formatted breakpoint file.

Builds the experiment's training replies data structure.

Parameters:

Name Type Description Default
bkpt_training_replies Dict[int, Dict[str, Dict[str, Any]]]

Extract from training replies saved in breakpoint

required

Returns:

Type Description
None

None (the loaded training replies of already executed rounds are stored in the experiment)

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
def load_training_replies(
    self,
    bkpt_training_replies: Dict[int, Dict[str, Dict[str, Any]]]
) -> None:
    """Reads training replies from a formatted breakpoint file.

    Builds a job training replies data structure .

    Args:
        bkpt_training_replies: Extract from training replies saved in breakpoint

    Returns:
        Training replies of already executed rounds of the experiment
    """
    if not bkpt_training_replies:
        logger.warning("No replies have been found in this breakpoint")

    rounds = set(bkpt_training_replies.keys())
    for round_ in rounds:
        # reload parameters from file params_path
        for node in bkpt_training_replies[round_].values():
            node["params"] = Serializer.load(node["params_path"])
        bkpt_training_replies[int(round_)] = bkpt_training_replies.pop(round_)

    self._training_replies = bkpt_training_replies
monitor
monitor()

Retrieves the monitor object

The Monitor is responsible for receiving and parsing real-time training and validation feedback from each node participating in federated training. See Monitor

Returns:

Type Description
Monitor

Monitor object, always attached to the experiment, used to retrieve feedback from the nodes.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def monitor(self) -> Monitor:
    """Retrieves the monitor object

    Monitor is responsible for receiving and parsing real-time training and validation feed-back from each node
    participate to federated training. See [`Monitor`][fedbiomed.researcher.monitor.Monitor]

    Returns:
        Monitor object that will always exist with experiment to retrieve feed-back from the nodes.
    """
    return self._monitor
retain_full_history
retain_full_history()

Retrieves the status of whether the full experiment history should be kept in memory.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def retain_full_history(self):
    """Retrieves the status of whether the full experiment history should be kept in memory."""
    return self._retain_full_history
round_current
round_current()

Retrieves the current round of the experiment.

Returns:

Type Description
int

Indicates the round number that the experiment will perform next.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def round_current(self) -> int:
    """Retrieves the round where the experiment is at.

    Returns:
        Indicates the round number that the experiment will perform next.
    """
    return self._round_current
round_limit
round_limit()

Retrieves the round limit from the experiment object.

Please see also set_round_limit to change or set the round limit.

Returns:

Type Description
Union[int, None]

The maximum number of rounds that can be performed, or None if no limit is declared yet.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def round_limit(self) -> Union[int, None]:
    """Retrieves the round limit from the experiment object.

    Please see  also [`set_round_limit`][fedbiomed.researcher.federated_workflows.Experiment.set_round_limit]
    to change or set round limit.

    Returns:
        Round limit that shows maximum number of rounds that can be performed. `None` if it isn't declared yet.
    """
    return self._round_limit
run
run(rounds=None, increase=False)

Run one or more rounds of an experiment, continuing from the point the experiment had reached.

Parameters:

Name Type Description Default
rounds Optional[int]

Number of experiment rounds to run in this call.

  • None means "run all the rounds remaining in the experiment", computed as the maximum number of rounds (round_limit for this experiment) minus the number of rounds already run (round_current for this experiment). It does nothing and issues a warning if round_limit is None (no round limit defined for the experiment).
  • An int >= 1 means "run at most rounds rounds". If round_limit is None for the experiment, run exactly rounds rounds. If a round_limit is set for the experiment and the number of rounds would increase beyond the round_limit of the experiment:
    - if increase is True, increase the round_limit to (round_current + rounds) and run rounds rounds
    - if increase is False, run (round_limit - round_current) rounds, don't modify the maximum round_limit of the experiment, and issue a warning.

None
increase bool

automatically increase the round_limit of the experiment for executing the specified number of rounds. Does nothing if round_limit is None or rounds is None. Defaults to False

False

Returns:

Type Description
int

Number of rounds that have been run

Raises:

Type Description
FedbiomedExperimentError

bad argument type or value
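
A short usage sketch of the calling patterns described above, assuming exp is an initialized Experiment:

exp.run()                         # run all rounds remaining up to round_limit
exp.run(rounds=2)                 # run at most 2 more rounds, capped by round_limit
exp.run(rounds=2, increase=True)  # run 2 more rounds, raising round_limit if needed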

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def run(self, rounds: Optional[int] = None, increase: bool = False) -> int:
    """Run one or more rounds of an experiment, continuing from the point the
    experiment had reached.

    Args:
        rounds: Number of experiment rounds to run in this call.
            * `None` means "run all the rounds remaining in the experiment" computed as
                maximum rounds (`round_limit` for this experiment) minus the number of
                rounds already run rounds (`round_current` for this experiment).
                It does nothing and issues a warning if `round_limit` is `None` (no
                round limit defined for the experiment)
            * `int` >= 1 means "run at most `rounds` rounds".
                If `round_limit` is `None` for the experiment, run exactly `rounds` rounds.
                If a `round_limit` is set for the experiment and the number or rounds would
            increase beyond the `round_limit` of the experiment:
            - if `increase` is True, increase the `round_limit` to
              (`round_current` + `rounds`) and run `rounds` rounds
            - if `increase` is False, run (`round_limit` - `round_current`)
              rounds, don't modify the maximum `round_limit` of the experiment
              and issue a warning.
        increase: automatically increase the `round_limit`
            of the experiment for executing the specified number of `rounds`.
            Does nothing if `round_limit` is `None` or `rounds` is None.
            Defaults to False

    Returns:
        Number of rounds have been run

    Raises:
        FedbiomedExperimentError: bad argument type or value
    """
    # check rounds is a >=1 integer or None
    if rounds is None:
        pass
    else:
        msg = ErrorNumbers.FB410.value + \
            f', in method `run` param `rounds` : value {rounds}'
        self._check_round_value_consistency(rounds, msg)

    # check increase is a boolean
    if not isinstance(increase, bool):
        msg = ErrorNumbers.FB410.value + \
            f', in method `run` param `increase` : type {type(increase)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    # compute number of rounds to run + updated rounds limit
    if rounds is None:
        if isinstance(self._round_limit, int):
            # run all remaining rounds in the experiment
            rounds = self._round_limit - self._round_current
            if rounds == 0:
                # limit already reached
                logger.warning(f'Round limit of {self._round_limit} already reached '
                               'for this experiment, do nothing.')
                return 0
        else:
            # cannot run if no number of rounds given and no round limit exists
            logger.warning('Cannot run, please specify a number of `rounds` to run or '
                           'set a `round_limit` to the experiment')
            return 0

    else:
        # at this point, rounds is an int >= 1
        if isinstance(self._round_limit, int):
            if (self._round_current + rounds) > self._round_limit:
                if increase:
                    # dont change rounds, but extend self._round_limit as necessary
                    logger.debug(f'Auto increasing total rounds for experiment from {self._round_limit} '
                                 f'to {self._round_current + rounds}')
                    self._round_limit = self._round_current + rounds
                else:
                    new_rounds = self._round_limit - self._round_current
                    if new_rounds == 0:
                        # limit already reached
                        logger.warning(f'Round limit of {self._round_limit} already reached '
                                       'for this experiment, do nothing.')
                        return 0
                    else:
                        # reduce the number of rounds to run in the experiment
                        logger.warning(f'Limit of {self._round_limit} rounds for the experiment '
                                       f'will be reached, reducing the number of rounds for this '
                                       f'run from {rounds} to {new_rounds}')
                        rounds = new_rounds

    # FIXME: should we print warning if both rounds and _round_limit are None?
    # At this point `rounds` is an int > 0 (not None)

    # run the rounds
    for _ in range(rounds):
        if isinstance(self._round_limit, int) and self._round_current == (self._round_limit - 1) \
                and self._training_args['test_on_global_updates'] is True:
            # Do "validation after a round" only if this a round limit is defined and we reached it
            # and validation is active on global params
            # When this condition is met, it also means we are running the last of
            # the `rounds` rounds in this function
            test_after = True
        else:
            test_after = False

        increment = self.run_once(increase=False, test_after=test_after)

        if increment == 0:
            # should not happen
            msg = ErrorNumbers.FB400.value + \
                f', in method `run` method `run_once` returns {increment}'
            logger.critical(msg)
            raise FedbiomedExperimentError(msg)

    return rounds
run_once
run_once(increase=False, test_after=False)

Run at most one round of an experiment, continuing from the point the experiment had reached.

If round_limit is None for the experiment (no round limit defined), run one round. If round_limit is not None and the round_limit of the experiment is already reached:

  • if increase is False, do nothing and issue a warning
  • if increase is True, increment the total number of rounds round_limit and run one round

Parameters:

Name Type Description Default
increase bool

automatically increase the round_limit of the experiment if needed. Does nothing if round_limit is None. Defaults to False

False
test_after bool

if True, do a second request to the nodes after the round, only for validation on aggregated params. Intended to be used after the last training round of an experiment. Defaults to False.

False

Returns:

Type Description
int

Number of rounds actually run

Raises:

Type Description
FedbiomedExperimentError

bad argument type or value
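
For example, to run a single extra round even if the round limit was reached, requesting validation on the aggregated parameters afterwards:

n_run = exp.run_once(increase=True, test_after=True)
# n_run is 1 if a round was run, 0 otherwise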

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def run_once(self, increase: bool = False, test_after: bool = False) -> int:
    """Run at most one round of an experiment, continuing from the point the
    experiment had reached.

    If `round_limit` is `None` for the experiment (no round limit defined), run one round.
    If `round_limit` is not `None` and the `round_limit` of the experiment is already reached:
    * if `increase` is False, do nothing and issue a warning
    * if `increase` is True, increment total number of round `round_limit` and run one round

    Args:
        increase: automatically increase the `round_limit` of the experiment if needed. Does nothing if
            `round_limit` is `None`. Defaults to False
        test_after: if True, do a second request to the nodes after the round, only for validation on aggregated
            params. Intended to be used after the last training round of an experiment. Defaults to False.

    Returns:
        Number of rounds really run

    Raises:
        FedbiomedExperimentError: bad argument type or value
    """
    # check increase is a boolean
    if not isinstance(increase, bool):
        msg = ErrorNumbers.FB410.value + \
            f', in method `run_once` param `increase` : type {type(increase)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    # nota:  we should never have self._round_current > self._round_limit, only ==
    if self._round_limit is not None and self._round_current >= self._round_limit:
        if increase is True:
            logger.debug(f'Auto increasing total rounds for experiment from {self._round_limit} '
                         f'to {self._round_current + 1}')
            self._round_limit = self._round_current + 1
        else:
            logger.warning(f'Round limit of {self._round_limit} was reached, do nothing')
            return 0

    # check pre-requisites are met for running a round
    # From here, node_selection_strategy is never None
    # if self._node_selection_strategy is None:
    #     msg = ErrorNumbers.FB411.value + ', missing `node_selection_strategy`'
    #     logger.critical(msg)
    #     raise FedbiomedExperimentError(msg)

    missing = self._check_missing_objects()
    if missing:
        raise FedbiomedExperimentError(ErrorNumbers.FB411.value + ': missing one or several object needed for'
                                       ' starting the `Experiment`. Details:\n' + missing)
    # Sample nodes for training

    training_nodes = self._node_selection_strategy.sample_nodes(
        from_nodes=self.filtered_federation_nodes(),
        round_i=self._round_current
    )
    # Setup Secure Aggregation (it's a noop if not active)
    secagg_arguments = self.secagg_setup(training_nodes)

    # Setup aggregator
    self._aggregator.set_training_plan_type(self.training_plan().type())
    self._aggregator.check_values(n_updates=self._training_args.get('num_updates'),
                                  training_plan=self.training_plan())
    model_params_before_round = self.training_plan().after_training_params()
    aggregator_args = self._aggregator.create_aggregator_args(model_params_before_round,
                                                              training_nodes)

    # Collect auxiliary variables from the aggregator optimizer, if any.
    optim_aux_var = self._collect_optim_aux_var()

    # update node states when list of nodes has changed from one round to another
    self._update_nodes_states_agent(before_training=True)
    # TODO check node state agent
    nodes_state_ids = self._node_state_agent.get_last_node_states()

    # if fds is updated, aggregator should be updated too
    job = TrainingJob(nodes=training_nodes,
                      keep_files_dir=self.experimentation_path(),
                      experiment_id=self._experiment_id,
                      round_=self._round_current,
                      training_plan=self.training_plan(),
                      training_args=self._training_args,
                      model_args=self.model_args(),
                      data=self._fds,
                      nodes_state_ids=nodes_state_ids,
                      aggregator_args=aggregator_args,
                      do_training=True,
                      secagg_arguments=secagg_arguments,
                      optim_aux_var=optim_aux_var
                      )

    logger.info('Sampled nodes in round ' + str(self._round_current) + ' ' + str(job.nodes))

    training_replies, aux_vars = job.execute()

    # update node states with node answers + when used node list has changed during the round
    self._update_nodes_states_agent(before_training=False, training_replies=training_replies)

    # refining/normalizing model weights received from nodes
    model_params, weights, total_sample_size, encryption_factors = self._node_selection_strategy.refine(
        training_replies, self._round_current)

    if self._secagg.active:
        flatten_params = self._secagg.aggregate(
            round_=self._round_current,
            encryption_factors=encryption_factors,
            total_sample_size=total_sample_size,
            model_params=model_params,
            num_expected_params=len(self.training_plan().get_model_wrapper_class().flatten(
                exclude_buffers = not self.training_args()['share_persistent_buffers']))
        )
        # FIXME: Access TorchModel through non-private getter once it is implemented
        aggregated_params: Dict[str, Union[torch.tensor, np.ndarray]] = (
            self.training_plan().get_model_wrapper_class().unflatten(
                flatten_params, exclude_buffers = not self.training_args()['share_persistent_buffers'])
        )

    else:
        # aggregate models from nodes to a global model
        aggregated_params = self._aggregator.aggregate(model_params,
                                                       weights,
                                                       global_model=model_params_before_round,
                                                       training_plan=self.training_plan(),
                                                       training_replies=training_replies,
                                                       node_ids=job.nodes,
                                                       n_updates=self._training_args.get('num_updates'),
                                                       n_round=self._round_current)

    # Optionally refine the aggregated updates using an Optimizer.
    self._process_optim_aux_var(aux_vars)
    aggregated_params = self._run_agg_optimizer(self.training_plan(),
                                                aggregated_params)

    # Update the training plan with the aggregated parameters
    self.training_plan().set_model_params(aggregated_params)

    # Update experiment's in-memory history
    self.commit_experiment_history(training_replies, aggregated_params)

    # Increase round number (should be incremented before call to `breakpoint`)
    self._set_round_current(self._round_current + 1)
    if self._save_breakpoints:
        self.breakpoint()

    # do final validation after saving breakpoint :
    # not saved in breakpoint for current round, but more simple
    if test_after:
        # FIXME: should we sample nodes here too?
        aggr_args = self._aggregator.create_aggregator_args(self.training_plan().after_training_params(),
                                                            training_nodes)

        job = TrainingJob(nodes=training_nodes,
                          keep_files_dir=self.experimentation_path(),
                          experiment_id=self._experiment_id,
                          round_=self._round_current,
                          training_plan=self.training_plan(),
                          training_args=self._training_args,
                          model_args=self.model_args(),
                          data=self._fds,
                          nodes_state_ids=nodes_state_ids,
                          aggregator_args=aggr_args,
                          do_training=False
                          )
        job.execute()


    return 1
save_aggregated_params staticmethod
save_aggregated_params(aggregated_params_init, breakpoint_path)

Extract and format fields from aggregated_params that need to be saved in breakpoint.

Creates links to the params files from the breakpoint_path and uses them to reference the params files.

Parameters:

Name Type Description Default
aggregated_params_init dict

aggregated parameters

required
breakpoint_path str

path to the directory where breakpoints files and links will be saved

required

Returns:

Type Description
Dict[int, dict]

Extract from aggregated_params

Raises:

Type Description
FedbiomedExperimentError

bad arguments type

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@staticmethod
@exp_exceptions
def save_aggregated_params(aggregated_params_init: dict, breakpoint_path: str) -> Dict[int, dict]:
    """Extract and format fields from aggregated_params that need to be saved in breakpoint.

    Creates link to the params file from the `breakpoint_path` and use them to reference the params files.

    Args:
        aggregated_params_init (dict): aggregated parameters
        breakpoint_path: path to the directory where breakpoints files and links will be saved

    Returns:
        Extract from `aggregated_params`

    Raises:
        FedbiomedExperimentError: bad arguments type
    """
    # check arguments type, though is should have been done before
    if not isinstance(aggregated_params_init, dict):
        msg = ErrorNumbers.FB413.value + ' - save failed. ' + \
            f'Bad type for aggregated params, should be `dict` not {type(aggregated_params_init)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)
    if not isinstance(breakpoint_path, str):
        msg = ErrorNumbers.FB413.value + ' - save failed. ' + \
            f'Bad type for breakpoint path, should be `str` not {type(breakpoint_path)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    aggregated_params = {}
    for round_, params_dict in aggregated_params_init.items():
        if not isinstance(params_dict, dict):
            msg = ErrorNumbers.FB413.value + ' - save failed. ' + \
                f'Bad type for aggregated params item {str(round_)}, ' + \
                f'should be `dict` not {type(params_dict)}'
            logger.critical(msg)
            raise FedbiomedExperimentError(msg)

        params_path = os.path.join(breakpoint_path, f"aggregated_params_{uuid.uuid4()}.mpk")
        Serializer.dump(params_dict['params'], params_path)
        aggregated_params[round_] = {'params_path': params_path}

    return aggregated_params
save_optimizer
save_optimizer(breakpoint_path)

Save the researcher-side Optimizer attached to this Experiment.

Parameters:

Name Type Description Default
breakpoint_path str

Path to the breakpoint folder.

required

Returns:

Type Description
Optional[str]

Path to the optimizer's save file, or None if no Optimizer is used.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def save_optimizer(self, breakpoint_path: str) -> Optional[str]:
    """Save the researcher-side Optimizer attached to this Experiment.

    Args:
        breakpoint_path: Path to the breakpoint folder.

    Returns:
        Path to the optimizer's save file, or None if no Optimizer is used.
    """
    # Case when no researcher optimizer is used.
    if self._agg_optimizer is None:
        return None
    # Case when an Optimizer is used: save its state and return the path.
    state = self._agg_optimizer.get_state()
    path = os.path.join(breakpoint_path, f"optimizer_{uuid.uuid4()}.mpk")
    Serializer.dump(state, path)
    return path
save_training_replies
save_training_replies()

Extracts a copy of training_replies and prepares it for saving in a breakpoint

  • strip unwanted fields
  • structure as list/dict, so it can be saved with JSON

Returns:

Type Description
Dict[int, Dict[str, Dict[str, Any]]]

Extract from training_replies formatted for breakpoint

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
def save_training_replies(self) -> Dict[int, Dict[str, Dict[str, Any]]]:
    """Extracts a copy of `training_replies` and prepares it for saving in breakpoint

    - strip unwanted fields
    - structure as list/dict, so it can be saved with JSON

    Returns:
        Extract from `training_replies` formatted for breakpoint
    """
    converted_training_replies = copy.deepcopy(self.training_replies())
    for training_reply in converted_training_replies.values():
        # we want to strip some fields for the breakpoint
        for reply in training_reply.values():
            reply.pop('params', None)
    return converted_training_replies
set_agg_optimizer
set_agg_optimizer(agg_optimizer)

Sets the optional researcher optimizer.

Parameters:

Name Type Description Default
agg_optimizer Optional[Optimizer]

Optional Fed-BioMed Optimizer instance used to refine aggregated updates before applying them. If None, this is equivalent to using vanilla SGD with a learning rate of 1.0.

required

Returns:

Type Description
Optional[Optimizer]

The optional researcher optimizer attached to this Experiment.

Raises:

Type Description
FedbiomedExperimentError

if agg_optimizer is of improper type.
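
A sketch of attaching a researcher-side optimizer. The lr argument is an assumption about the Optimizer constructor; check the Optimizer documentation for the exact signature.

from fedbiomed.common.optimizers import Optimizer

exp.set_agg_optimizer(Optimizer(lr=0.8))  # refine aggregated updates (lr value is illustrative)
exp.set_agg_optimizer(None)               # or disable: aggregated updates are applied directly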

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_agg_optimizer(
    self,
    agg_optimizer: Optional[Optimizer],
) -> Optional[Optimizer]:
    """Sets the optional researcher optimizer.

    Args:
        agg_optimizer: Optional fedbiomed Optimizer instance to be
            used so as to refine aggregated updates prior to applying them.
            If None, equivalent to using vanilla SGD with 1.0 learning rate.

    Returns:
        The optional researcher optimizer attached to this Experiment.

    Raises:
        FedbiomedExperimentError: if `optimizer` is of unproper type.
    """
    if not (
        agg_optimizer is None or
        isinstance(agg_optimizer, Optimizer)
    ):
        raise FedbiomedExperimentError(
            f"{ErrorNumbers.FB410.value}: 'agg_optimizer' must be an "
            f"Optimizer instance or None, not {type(agg_optimizer)}."
        )
    self._agg_optimizer = agg_optimizer
    return self._agg_optimizer
set_aggregator
set_aggregator(aggregator=None)

Sets the aggregator, with verification of the argument type.

Ensures consistency with the training data.

Parameters:

Name Type Description Default
aggregator Optional[Aggregator]

Object defining the method for aggregating local updates. Defaults to None (use FedAverage for aggregation)

None

Returns:

Type Description
Aggregator

aggregator (Aggregator)

Raises:

Type Description
FedbiomedExperimentError

bad aggregator type
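
For example, setting the default federated averaging explicitly (equivalent to passing None):

from fedbiomed.researcher.aggregators import FedAverage

exp.set_aggregator(FedAverage())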

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_aggregator(self, aggregator: Optional[Aggregator] = None) -> Aggregator:
    """Sets aggregator + verification on arguments type

    Ensures consistency with the training data.

    Args:
        aggregator: Object defining the method for aggregating local updates. Default to None
            (use `FedAverage` for aggregation)

    Returns:
        aggregator (Aggregator)

    Raises:
        FedbiomedExperimentError : bad aggregator type
    """

    if aggregator is None:
        # default aggregator
        self._aggregator = FedAverage()

    elif not isinstance(aggregator, Aggregator):

        msg = f"{ErrorNumbers.FB410.value}: aggregator is not an instance of Aggregator."
        logger.critical(msg)
        raise FedbiomedTypeError(msg)
    else:
        # at this point, `agregator` is an instance / inheriting of `Aggregator`
        self._aggregator = aggregator
    self.aggregator_args["aggregator_name"] = self._aggregator.aggregator_name
    # ensure consistency with federated dataset
    self._aggregator.set_fds(self._fds)

    return self._aggregator
set_retain_full_history
set_retain_full_history(retain_full_history_=True)

Sets the status of whether the full experiment history should be kept in memory.

Parameters:

Name Type Description Default
retain_full_history_ bool

whether to retain in memory the full history of node replies and aggregated params for the experiment. If False, only the last round's replies and aggregated params will be available. Defaults to True.

True

Returns:

Type Description

The status of whether the full experiment history should be kept in memory.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_retain_full_history(self, retain_full_history_: bool = True):
    """Sets the status of whether the full experiment history should be kept in memory.

    Args:
        retain_full_history_: whether to retain in memory the full history of node replies and aggregated params
            for the experiment. If False, only the last round's replies and aggregated params will be available.
            Defaults to True.

    Returns:
        The status of whether the full experiment history should be kept in memory.
    """
    if not isinstance(retain_full_history_, bool):
        msg = ErrorNumbers.FB410.value + f': retain_full_history should be a bool, instead got ' \
                                         f'{type(retain_full_history_)} '
        logger.critical(msg)
        raise FedbiomedTypeError(msg)
    self._retain_full_history = retain_full_history_
    return self._retain_full_history
set_round_limit
set_round_limit(round_limit)

Sets round_limit, with verification of the argument type

Parameters:

Name Type Description Default
round_limit Union[int, None]

the maximum number of training rounds (nodes <-> central server) that should be executed for the experiment. None means that no limit is defined.

required

Returns:

Type Description
Union[int, None]

Round limit for the federated learning experiment

Raises:

Type Description
FedbiomedValueError

bad rounds type or value
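
For example (see round_current above for the number of rounds already run):

exp.set_round_limit(10)    # allow up to 10 rounds in total
exp.set_round_limit(None)  # remove the limit entirely
# setting a limit below exp.round_current() raises FedbiomedValueError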

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_round_limit(self, round_limit: Union[int, None]) -> Union[int, None]:
    """Sets `round_limit` + verification on arguments type

    Args:
        round_limit: the maximum number of training rounds (nodes <-> central server) that should be executed
            for the experiment. `None` means that no limit is defined.

    Returns:
        Round limit for experiment of federated learning

    Raises:
        FedbiomedValueError : bad rounds type or value
    """
    # at this point round_current exists and is an int >= 0

    if round_limit is None:
        # no limit for training rounds
        self._round_limit = None
    else:
        self._check_round_value_consistency(round_limit, "round_limit")
        if round_limit < self._round_current:
            # self._round_limit can't be less than current round
            msg = f'cannot set `round_limit` to less than the number of already run rounds ' \
                f'({self._round_current})'
            logger.critical(msg)
            raise FedbiomedValueError(msg)

        else:
            self._round_limit = round_limit

    # at this point self._round_limit is a Union[int, None]
    return self._round_limit
set_strategy
set_strategy(node_selection_strategy=None)

Sets the node_selection_strategy, with verification of the argument type

Parameters:

Name Type Description Default
node_selection_strategy Optional[Strategy]

object defining how nodes are sampled at each round for training, and how non-responding nodes are managed. Defaults to None:

  • use DefaultStrategy if training_data is initialized
  • else the strategy is None (cannot be initialized) and the experiment cannot be launched yet

None

Returns:

Type Description
Union[Strategy, None]

node selection strategy class

Raises:

Type Description
FedbiomedExperimentError

bad strategy type
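
For example, setting the default strategy explicitly (equivalent to passing None); the import path follows the cross-reference used in this documentation:

from fedbiomed.researcher.strategies import DefaultStrategy

exp.set_strategy(DefaultStrategy())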

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_strategy(
    self,
    node_selection_strategy: Optional[Strategy] = None
) -> Union[Strategy, None]:
    """Sets for `node_selection_strategy` + verification on arguments type

    Args:
        node_selection_strategy: object defining how nodes are sampled at each round for training, and
            how non-responding nodes are managed. Defaults to None:
            - use `DefaultStrategy` if training_data is initialized
            - else strategy is None (cannot be initialized), experiment cannot
              be launched yet

    Returns:
        node selection strategy class

    Raises:
        FedbiomedExperimentError : bad strategy type
    """
    if node_selection_strategy is None:
        # default node_selection_strategy
        self._node_selection_strategy = DefaultStrategy()
    elif not isinstance(node_selection_strategy, Strategy):

        msg = f"{ErrorNumbers.FB410.value}: wrong type for " \
              f"node_selection_strategy {type(node_selection_strategy)} " \
              "it should be an instance of Strategy"
        logger.critical(msg)
        raise FedbiomedTypeError(msg)
    else:
        self._node_selection_strategy = node_selection_strategy
    # at this point self._node_selection_strategy is a Union[Strategy, None]
    return self._node_selection_strategy
set_tensorboard
set_tensorboard(tensorboard)

Sets the tensorboard flag

Parameters:

Name Type Description Default
tensorboard bool

If True, tensorboard log files will be written after receiving training feedback

required

Returns:

Type Description
bool

Status of tensorboard
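
For example, enabling scalar logging and later visualizing it with the standard Tensorboard CLI (shown as a comment):

exp.set_tensorboard(True)   # Monitor writes scalar logs (currently loss values) into ./runs
# then, from a shell: tensorboard --logdir ./runs
exp.set_tensorboard(False)  # stops monitoring if it was active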

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_tensorboard(self, tensorboard: bool) -> bool:
    """
    Sets the tensorboard flag

    Args:
        tensorboard: If `True` tensorboard log files will be writen after receiving training feedbacks

    Returns:
        Status of tensorboard
    """

    if isinstance(tensorboard, bool):
        self._tensorboard = tensorboard
        self._monitor.set_tensorboard(tensorboard)
    else:
        msg = ErrorNumbers.FB410.value + f' `tensorboard` : {type(tensorboard)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    return self._tensorboard
set_test_metric
set_test_metric(metric, **metric_args)

Sets a metric for federated model validation

Parameters:

Name Type Description Default
metric Union[MetricTypes, str, None]

An instance of MetricTypes, or a str referring to one of the metrics provided as attributes of MetricTypes. None if it isn't declared yet.

required
**metric_args dict

A dictionary that contains arguments for the metric function. Arguments should be compatible with the corresponding metrics in sklearn.metrics.

{}

Returns:

Type Description
Tuple[Union[str, None], Dict[str, Any]]

Metric and metric args as tuple

Raises:

Type Description
FedbiomedExperimentError

Invalid type for metric argument
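
For example, using a MetricTypes attribute together with extra keyword arguments forwarded to the corresponding sklearn.metrics function; the F1_SCORE attribute and the average keyword are assumptions based on the usual MetricTypes and sklearn naming.

from fedbiomed.common.metrics import MetricTypes

exp.set_test_metric(MetricTypes.F1_SCORE, average='macro')
exp.set_test_metric('ACCURACY')  # a string naming a MetricTypes attribute is also accepted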

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_test_metric(self, metric: Union[MetricTypes, str, None], **metric_args: dict) -> \
        Tuple[Union[str, None], Dict[str, Any]]:
    """ Sets a metric for federated model validation

    Args:
        metric: A class as an instance of [`MetricTypes`][fedbiomed.common.metrics.MetricTypes]. [`str`][str] for
            referring one of  metric which provided as attributes in [`MetricTypes`]
            [fedbiomed.common.metrics.MetricTypes]. None, if it isn't declared yet.
        **metric_args: A dictionary that contains arguments for metric function. Arguments
            should be compatible with corresponding metrics in [`sklearn.metrics`][sklearn.metrics].

    Returns:
        Metric and  metric args as tuple

    Raises:
        FedbiomedExperimentError: Invalid type for `metric` argument
    """
    self._training_args['test_metric'] = metric

    # using **metric_args, we know `test_metric_args` is a Dict[str, Any]
    self._training_args['test_metric_args'] = metric_args
    return metric, metric_args
set_test_on_global_updates
set_test_on_global_updates(flag=True)

Setter for test_on_global_updates, which indicates whether to perform validation on the aggregated model updates on the node side, before training the model locally, once the aggregated model parameters are received.

Parameters:

Name Type Description Default
flag bool

whether to perform model validation on global updates. Defaults to True.

True

Returns:

Type Description
bool

Value of the flag test_on_global_updates.

Raises:

Type Description
FedbiomedExperimentError

bad flag type

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_test_on_global_updates(self, flag: bool = True) -> bool:
    """
    Setter for test_on_global_updates, that indicates whether to  perform a validation on the federated model
    updates on the node side before training model locally where aggregated model parameters are received.

    Args:
        flag (bool, optional): whether to perform model validation on global updates. Defaults to True.

    Returns:
        Value of the flag `test_on_global_updates`.

    Raises:
        FedbiomedExperimentError : bad flag type
    """
    self._training_args['test_on_global_updates'] = flag
    return self._training_args['test_on_global_updates']
set_test_on_local_updates
set_test_on_local_updates(flag=True)

Setter for test_on_local_updates, which indicates whether to perform validation on the federated model on the node side, on the model parameters that are updated locally after training in each node.

Parameters:

Name Type Description Default
flag bool

whether to perform model validation on local updates. Defaults to True.

True

Returns:

Type Description
bool

value of the flag test_on_local_updates

Raises:

Type Description
FedbiomedExperimentError

bad flag type

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_test_on_local_updates(self, flag: bool = True) -> bool:
    """
    Setter for `test_on_local_updates`, that indicates whether to perform a validation on the federated model on the
    node side where model parameters are updated locally after training in each node.

    Args:
        flag (bool, optional): whether to perform model validation on local updates. Defaults to True.

    Returns:
        value of the flag `test_on_local_updates`

    Raises:
        FedbiomedExperimentError: bad flag type
    """
    self._training_args['test_on_local_updates'] = flag
    return self._training_args['test_on_local_updates']
set_test_ratio
set_test_ratio(ratio)

Sets validation ratio for model validation.

When setting test_ratio, nodes will allocate a (1 - test_ratio) fraction of their data for training and the remainder for validating the model. This is useful for validating the model once every round, as well as for controlling overfitting or performing early stopping.

Parameters:

Name Type Description Default
ratio float

validation ratio. Must be within interval [0,1].

required

Returns:

Type Description
float

Validation ratio that is set

Raises:

Type Description
FedbiomedExperimentError

bad data type

FedbiomedExperimentError

ratio is not within interval [0, 1]
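
For example, holding out 10% of each node's data for validation, combined with the validation flags documented above:

exp.set_test_ratio(0.1)               # 10% of node data used for validation
exp.set_test_on_global_updates(True)  # validate aggregated params before local training
exp.set_test_on_local_updates(True)   # validate locally updated params after training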

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_test_ratio(self, ratio: float) -> float:
    """ Sets validation ratio for model validation.

    When setting test_ratio, nodes will allocate (1 - `test_ratio`) fraction of data for training and the
    remaining for validating model. This could be useful for validating the model, once every round, as well as
    controlling overfitting, doing early stopping, ....

    Args:
        ratio: validation ratio. Must be within interval [0,1].

    Returns:
        Validation ratio that is set

    Raises:
        FedbiomedExperimentError: bad data type
        FedbiomedExperimentError: ratio is not within interval [0, 1]
    """
    self._training_args['test_ratio'] = ratio
    return ratio
set_training_data
set_training_data(training_data, from_tags=False)

Sets the training data for federated training, with verification of the argument type

See FederatedWorkflow.set_training_data for more information.

Ensures consistency also with the Experiment's aggregator and node state agent

Setting to None forfeits consistency checks

Setting training_data to None does not trigger consistency checks, and may therefore leave the class in an inconsistent state.

Returns:

Type Description
Union[FederatedDataSet, None]

Dataset metadata

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def set_training_data(
        self,
        training_data: Union[FederatedDataSet, dict, None],
        from_tags: bool = False) -> \
        Union[FederatedDataSet, None]:
    """Sets training data for federated training + verification on arguments type

    See
    [`FederatedWorkflow.set_training_data`][fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_training_data]
    for more information.

    Ensures consistency also with the Experiment's aggregator and node state agent

    !!! warning "Setting to None forfeits consistency checks"
        Setting training_data to None does not trigger consistency checks, and may therefore leave the class in an
        inconsistent state.

    Returns:
        Dataset metadata
    """
    super().set_training_data(training_data, from_tags)
    # Below: Experiment-specific operations for consistency
    if self._aggregator is not None and self._fds is not None:
        # update the aggregator's training data
        self._aggregator.set_fds(self._fds)
    if self._node_state_agent is not None and self._fds is not None:
        # update the node state agent (member of FederatedWorkflow)
        self._node_state_agent.update_node_states(self.all_federation_nodes())
    return self._fds
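For illustration, a sketch of replacing the training data on an existing Experiment instance exp; the node id and dataset fields below are hypothetical, and in practice such metadata comes from a dataset search on the nodes.

    from fedbiomed.researcher.datasets import FederatedDataSet

    # hypothetical metadata: one node exposing one dataset
    metadata = {'node_1234': [{'dataset_id': 'dataset_abcd', 'tags': ['#my-data']}]}
    exp.set_training_data(FederatedDataSet(metadata))
    # the call above also refreshes the aggregator's FederatedDataSet and the node state agent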
strategy
strategy()

Retrieves the class that represents the node selection strategy.

Please see also set_strategy to set or update node selection strategy.

Returns:

Type Description
Union[Strategy, None]

An instance of Strategy. None if it is not declared yet, in which case the node selection strategy will be DefaultStrategy.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def strategy(self) -> Union[Strategy, None]:
    """Retrieves the class that represents the node selection strategy.

    Please see also [`set_strategy`][fedbiomed.researcher.federated_workflows.Experiment.set_strategy]
    to set or update node selection strategy.

    Returns:
        A class or object as an instance of [`Strategy`][fedbiomed.researcher.strategies.Strategy]. `None` if
            it is not declared yet. It means that node selection strategy will be
            [`DefaultStrategy`][fedbiomed.researcher.strategies.DefaultStrategy].
    """
    return self._node_selection_strategy
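A small sketch using only this getter, assuming an existing Experiment instance exp:

    strategy = exp.strategy()
    if strategy is None:
        print("No strategy declared yet: node selection will use DefaultStrategy")
    else:
        print(f"Node selection strategy: {type(strategy).__name__}")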
test_metric
test_metric()

Retrieves the metric for validation routine.

Please see also set_test_metric to change/set test_metric

Returns:

Type Description
Union[MetricTypes, str, None]

An instance of MetricTypes, or a str referring to one of the metrics provided as attributes of MetricTypes. None if it isn't declared yet.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def test_metric(self) -> Union[MetricTypes, str, None]:
    """Retrieves the metric for validation routine.

    Please see also [`set_test_metric`][fedbiomed.researcher.federated_workflows.Experiment.set_test_metric]
        to change/set `test_metric`

    Returns:
        A class as an instance of [`MetricTypes`][fedbiomed.common.metrics.MetricTypes]. [`str`][str] for referring
            one of  metric which provided as attributes in [`MetricTypes`][fedbiomed.common.metrics.MetricTypes].
            None, if it isn't declared yet.
    """

    return self._training_args['test_metric']
test_metric_args
test_metric_args()

Retrieves the metric argument for the metric function that is going to be used.

Please see also set_test_metric to change/set test_metric and get more information on the arguments that can be used.

Returns:

Type Description
Dict[str, Any]

A dictionary that contains the arguments for the metric function. See set_test_metric

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def test_metric_args(self) -> Dict[str, Any]:
    """Retrieves the metric argument for the metric function that is going to be used.

    Please see also [`set_test_metric`][fedbiomed.researcher.federated_workflows.Experiment.set_test_metric]
    to change/set `test_metric` and get more information on the arguments can be used.

    Returns:
        A dictionary that contains arguments for metric function. See [`set_test_metric`]
            [fedbiomed.researcher.federated_workflows.Experiment.set_test_metric]
    """
    return self._training_args['test_metric_args']
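A quick way to inspect the current validation metric configuration, assuming an existing Experiment instance exp:

    print(exp.test_metric())        # a MetricTypes member, a str, or None if not declared yet
    print(exp.test_metric_args())   # dict of extra keyword arguments passed to the metric function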
test_on_global_updates
test_on_global_updates()

Retrieves the status of whether validation will be performed on globally updated (aggregated) parameters by the nodes at the beginning of each round.

Please see also set_test_on_global_updates.

Returns:

Type Description
bool

True if validation is active on globally updated (aggregated) parameters, False otherwise.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def test_on_global_updates(self) -> bool:
    """ Retrieves the status of whether validation will be performed on globally updated (aggregated)
    parameters by the nodes at the beginning of each round.

    Please see also [`set_test_on_global_updates`]
    [fedbiomed.researcher.federated_workflows.Experiment.set_test_on_global_updates].

    Returns:
        True, if validation is active on globally updated (aggregated) parameters. False for vice versa.
    """
    return self._training_args['test_on_global_updates']
test_on_local_updates
test_on_local_updates()

Retrieves the status of whether validation will be performed on locally updated parameters by the nodes at the end of each round.

Please see also set_test_on_local_updates.

Returns:

Type Description
bool

True if validation is active on locally updated parameters, False otherwise.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def test_on_local_updates(self) -> bool:
    """Retrieves the status of whether validation will be performed on locally updated parameters by
    the nodes at the end of each round.

    Please see also
        [`set_test_on_local_updates`][fedbiomed.researcher.federated_workflows.Experiment.set_test_on_local_updates].

    Returns:
        True, if validation is active on locally updated parameters. False for vice versa.
    """

    return self._training_args['test_on_local_updates']
test_ratio
test_ratio()

Retrieves the ratio of the validation partition of the entire dataset.

Please see also set_test_ratio to change/set test_ratio

Returns:

Type Description
float

The ratio for the validation part; 1 - test_ratio is the ratio for the training set.

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def test_ratio(self) -> float:
    """Retrieves the ratio for validation partition of entire dataset.

    Please see also [`set_test_ratio`][fedbiomed.researcher.federated_workflows.Experiment.set_test_ratio] to
        change/set `test_ratio`

    Returns:
        The ratio for validation part, `1 - test_ratio` is ratio for training set.
    """

    return self._training_args['test_ratio']
training_replies
training_replies()

Retrieves training replies of each round of training.

Training replies contain timing statistics and the file paths/URLs that have been received after each round.

Returns:

Type Description
Union[dict, None]

Dictionary of training replies with format {round (int) : replies (dict)}

Source code in fedbiomed/researcher/federated_workflows/_experiment.py
@exp_exceptions
def training_replies(self) -> Union[dict, None]:
    """Retrieves training replies of each round of training.

    Training replies contain timing statistics and the file paths/URLs that have been received after each round.

    Returns:
        Dictionary of training replies with format {round (int) : replies (dict)}
    """

    return self._training_replies
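A hedged sketch of inspecting the replies after exp.run() has completed at least one round (assuming full history retention is enabled; the inner structure of each reply depends on the node responses and is not detailed here):

    replies = exp.training_replies()
    if replies:
        for round_index, round_replies in replies.items():
            print(f"round {round_index}: replies from {len(round_replies)} node(s)")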

FederatedWorkflow

FederatedWorkflow(tags=None, nodes=None, training_data=None, experimentation_folder=None, secagg=False, save_breakpoints=False)

Bases: ABC

A FederatedWorkflow is the abstract entry point for the researcher to orchestrate both local and remote operations.

The FederatedWorkflow is an abstract base class from which the actual classes used by the researcher must inherit. It manages the life-cycle of:

  • the training arguments
  • secure aggregation
  • the node state agent

Additionally, it provides the basis for the breakpoint functionality, and manages some backend functionalities such as the temporary directory, the experiment ID, etc...

The attributes training_data and tags are co-dependent. Attempting to modify one of those may result in side effects modifying the other, according to the following rules:

  • modifying tags if training data is not None will reset the training data based on the new tags
  • modifying the training data using a FederatedDataset object or a dict will set tags to None

Parameters:

Name Type Description Default
tags Optional[List[str] | str]

list of string with data tags or string with one data tag. Empty list of tags ([]) means any dataset is accepted, it is different from None (tags not set, cannot search for training_data yet).

None
nodes Optional[List[str]]

list of node_ids to filter the nodes to be involved in the experiment. Defaults to None (no filtering).

None
training_data Union[FederatedDataSet, dict, None]
  • If it is a FederatedDataSet object, use this value as training_data.
  • else if it is a dict, create and use a FederatedDataSet object from the dict and use this value as training_data. The dict should use node ids as keys, values being list of dicts (each dict representing a dataset on a node).
  • else if it is None (no training data provided)
      • if tags is not None, set training_data by searching for datasets with a query to the nodes using tags and nodes
      • if tags is None, set training_data to None (no training_data set yet, experiment is not fully initialized and cannot be launched)

Defaults to None (query nodes for dataset if tags is not None, set training_data to None else)
None
save_breakpoints bool

whether to save breakpoints or not after each training round. Breakpoints can be used for resuming a crashed experiment.

False
experimentation_folder Union[str, None]

choose a specific name for the folder where experimentation result files and breakpoints are stored. This should just contain the name of the folder, not a path. The name is used as a subdirectory of environ[EXPERIMENTS_DIR]. Defaults to None (auto-choose a folder name).

  • Caveat: if using a specific name, this experimentation will not be automatically detected as the last experimentation by load_breakpoint.
  • Caveat: do not use an experimentation_folder name ending with numbers ([0-9]+), as this would confuse the last-experimentation detection heuristic of load_breakpoint.

None
secagg Union[bool, SecureAggregation]

whether to setup a secure aggregation context for this experiment, and use it to send encrypted updates from nodes to researcher. Defaults to False

False
Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def __init__(
    self,
    tags: Optional[List[str] | str] = None,
    nodes: Optional[List[str]] = None,
    training_data: Union[FederatedDataSet, dict, None] = None,
    experimentation_folder: Union[str, None] = None,
    secagg: Union[bool, SecureAggregation] = False,
    save_breakpoints: bool = False,
) -> None:
    """Constructor of the class.

    Args:
        tags: list of string with data tags or string with one data tag. Empty list of
            tags ([]) means any dataset is accepted, it is different from None
            (tags not set, cannot search for training_data yet).

        nodes: list of node_ids to filter the nodes to be involved in the experiment.
            Defaults to None (no filtering).

        training_data:
            * If it is a FederatedDataSet object, use this value as training_data.
            * else if it is a dict, create and use a FederatedDataSet object
                from the dict and use this value as training_data. The dict should use
                node ids as keys, values being list of dicts (each dict representing a
                dataset on a node).
            * else if it is None (no training data provided)
              - if `tags` is not None, set training_data by
                searching for datasets with a query to the nodes using `tags` and `nodes`
              - if `tags` is None, set training_data to None (no training_data set yet,
                experiment is not fully initialized and cannot be launched)
            Defaults to None (query nodes for dataset if `tags` is not None, set training_data
            to None else)
        save_breakpoints: whether to save breakpoints or not after each training
            round. Breakpoints can be used for resuming a crashed experiment.

        experimentation_folder: choose a specific name for the folder
            where experimentation result files and breakpoints are stored. This
            should just contain the name for the folder not a path. The name is used
            as a subdirectory of `environ[EXPERIMENTS_DIR])`. Defaults to None
            (auto-choose a folder name)
            - Caveat : if using a specific name this experimentation will not be
                automatically detected as the last experimentation by `load_breakpoint`
            - Caveat : do not use a `experimentation_folder` name finishing
                with numbers ([0-9]+) as this would confuse the last experimentation
                detection heuristic by `load_breakpoint`.
        secagg: whether to setup a secure aggregation context for this experiment, and
            use it to send encrypted updates from nodes to researcher.
            Defaults to `False`
    """
    # predefine all class variables, so no need to write try/except
    # block each time we use it
    self._fds: Optional[FederatedDataSet] = None  # dataset metadata from the full federation
    self._reqs: Requests = Requests()
    self._nodes_filter: Optional[List[str]] = None  # researcher-defined nodes filter
    self._tags: Optional[List[str]] = None
    self._experimentation_folder: Optional[str] = None
    self._secagg: Optional[SecureAggregation] = None
    self._save_breakpoints: Optional[bool] = None
    self._node_state_agent: Optional[NodeStateAgent] = None
    self._researcher_id: str = environ['RESEARCHER_ID']
    self._experiment_id: str = EXPERIMENT_PREFIX + str(uuid.uuid4())  # creating a unique experiment id

    # set internal members from constructor arguments
    self.set_secagg(secagg)

    # TODO: Manage tags within the FederatedDataset to avoid conflicts
    if training_data is not None and tags is not None:
        msg = f"{ErrorNumbers.FB410.value}: Can not set `training_data` and `tags` at the " \
            "same time. Please provide only `training_data`, or tags to search for " \
            "training data."
        logger.critical(msg)
        raise FedbiomedValueError(msg)

    # Set tags if it tags is not None
    if tags:
        self.set_tags(tags)

    if training_data:
        self.set_training_data(training_data)

    self.set_nodes(nodes)
    self.set_save_breakpoints(save_breakpoints)

    self.set_experimentation_folder(experimentation_folder)
    self._node_state_agent = NodeStateAgent(list(self._fds.data().keys())
                                            if self._fds and self._fds.data() else [])

Attributes

id property
id

Retrieves the unique experiment identifier.

secagg property
secagg

Gets secagg object SecureAggregation

Returns:

Type Description
SecureAggregation

Secure aggregation object.

Functions

all_federation_nodes
all_federation_nodes()

Returns all the node ids in the federation

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def all_federation_nodes(self) -> List[str]:
    """Returns all the node ids in the federation"""
    return list(self._fds.data().keys()) if self._fds is not None else []
breakpoint
breakpoint(state, bkpt_number)

Saves breakpoint with the state of the workflow.

The following attributes will be saved:

  • tags
  • experimentation_folder
  • training_data
  • training_args
  • secagg
  • node_state

Raises:

Type Description
FedbiomedExperimentError

experiment not fully defined, experiment did not run any round yet, or error when saving breakpoint

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def breakpoint(self,
               state: Dict,
               bkpt_number: int) -> None:
    """
    Saves breakpoint with the state of the workflow.

    The following attributes will be saved:

      - tags
      - experimentation_folder
      - training_data
      - training_args
      - secagg
      - node_state

    Raises:
        FedbiomedExperimentError: experiment not fully defined, experiment did not run any round
            yet, or error when saving breakpoint
    """
    state.update({
        'id': self._experiment_id,
        'breakpoint_version': str(__breakpoints_version__),
        'training_data': self._fds.data(),
        'experimentation_folder': self._experimentation_folder,
        'tags': self._tags,
        'nodes': self._nodes_filter,
        'secagg': self._secagg.save_state_breakpoint(),
        'node_state': self._node_state_agent.save_state_breakpoint()
    })

    # save state into a json file
    breakpoint_path, breakpoint_file_name = \
        choose_bkpt_file(self._experimentation_folder, bkpt_number - 1)
    breakpoint_file_path = os.path.join(breakpoint_path, breakpoint_file_name)
    try:
        with open(breakpoint_file_path, 'w', encoding="UTF-8") as bkpt:
            json.dump(state, bkpt)
        logger.info(f"breakpoint number {bkpt_number - 1} saved at " +
                    os.path.dirname(breakpoint_file_path))
    except (OSError, PermissionError, ValueError, TypeError, RecursionError) as e:
        # - OSError: heuristic for catching open() and write() errors
        # - see json.dump() documentation for documented errors for this call
        msg = ErrorNumbers.FB413.value + f' - save failed with message {str(e)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg) from e
experimentation_folder
experimentation_folder()

Retrieves the folder name where experiment data/result are saved.

Please see also set_experimentation_folder

Returns:

Type Description
str

File name where experiment related files are saved

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def experimentation_folder(self) -> str:
    """Retrieves the folder name where experiment data/result are saved.

    Please see also [`set_experimentation_folder`]
    [fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_experimentation_folder]

    Returns:
        File name where experiment related files are saved
    """

    return self._experimentation_folder
experimentation_path
experimentation_path()

Retrieves the file path where experimentation folder is located and experiment related files are saved.

Returns:

Type Description
str

Experiment directory where all experiment related files are saved

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def experimentation_path(self) -> str:
    """Retrieves the file path where experimentation folder is located and experiment related files are saved.

    Returns:
        Experiment directory where all experiment related files are saved
    """

    return os.path.join(environ['EXPERIMENTS_DIR'], self._experimentation_folder)
filtered_federation_nodes
filtered_federation_nodes()

Returns the node ids in the federation after filtering with the nodes filter

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def filtered_federation_nodes(self) -> List[str]:
    """Returns the node ids in the federation after filtering with the nodes filter"""
    if self._nodes_filter is not None:
        return [node for node in self.all_federation_nodes() if node in self._nodes_filter]
    else:
        return self.all_federation_nodes()
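The difference between the two node lists can be checked directly; a minimal sketch assuming a workflow instance exp whose training data is already set:

    print(exp.all_federation_nodes())       # every node id present in the federation's training data
    print(exp.filtered_federation_nodes())  # same list, restricted by the nodes filter if one is set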
info
info(info=None, missing='')

Prints out the information about the current status of the experiment.

Lists all the parameters/arguments of the experiment and informs whether the experiment can be run.

Parameters:

Name Type Description Default
info Dict[str, List[str]]

Dictionary of relevant attribute statuses from sub-classes, which will be completed with additional attribute statuses defined in this class. Defaults to None (no entries from sub-classes available or of importance).

None
missing str

Message listing the attributes that are not yet defined, printed as a warning if non-empty. Defaults to '' (object considered fully defined).

''

Returns:

Type Description
Tuple[Dict[str, List[str]], str]

A tuple containing (1) a dictionary with all pieces of information, with 2 entries: Arguments mapping a list of all arguments and Values mapping a list containing all the values, and (2) the missing message string.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def info(self,
         info: Dict[str, List[str]] = None,
         missing: str = '') -> Tuple[Dict[str, List[str]], str]:
    """Prints out the information about the current status of the experiment.

    Lists  all the parameters/arguments of the experiment and informs whether the experiment can be run.

    Args:
        info: Dictionary of sub-classes relevant attributes status that will be completed with some additional
            attributes status defined in this class. Defaults to None (no entries of sub-classes available or
            of importance).
        missing: message listing the attributes that are not yet defined; if non-empty,
            it is printed as a warning. Defaults to '' (object considered fully defined).

    Returns:
        dictionary containing all pieces of information, with 2 entries: `Arguments` mapping a list
        of all arguments, and `Values` mapping a list containing all the values.
    """
    if info is None:
        info = self._create_default_info_structure()
    info['Arguments'].extend([
        'Tags',
        'Nodes filter',
        'Training Data',
        'Experiment folder',
        'Experiment Path',
        'Secure Aggregation'
    ])

    info['Values'].extend(['\n'.join(findall('.{1,60}', str(e))) for e in [
        self._tags,
        self._nodes_filter,
        self._fds,
        self._experimentation_folder,
        self.experimentation_path(),
        f'- Using: {self._secagg}\n- Active: {self._secagg.active}'
    ]])

    # printing list of items set / not set yet
    print(tabulate.tabulate(info, headers='keys'))

    if missing:
        print("\nWarning: Object not fully defined, missing"
              f": \n{missing}")
    else:
        print(f"{self.__class__.__name__} can be run now (fully defined)")
    return info, missing
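Calling info() prints a summary table and also returns it; a minimal sketch assuming an existing workflow instance exp:

    summary, missing = exp.info()
    if missing:
        print("Still missing before the experiment can run:", missing)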
load_breakpoint classmethod
load_breakpoint(breakpoint_folder_path=None)

Loads breakpoint (provided a breakpoint has been saved) so the workflow can be resumed.

Parameters:

Name Type Description Default
breakpoint_folder_path Optional[str]

path of the breakpoint folder. Path can be absolute or relative eg: "var/experiments/Experiment_xxxx/breakpoints_xxxx". If None, loads the latest breakpoint of the latest workflow. Defaults to None.

None

Returns:

Type Description
Tuple[TFederatedWorkflow, dict]

Tuple containing the reinitialized workflow object and the saved state as a dictionary

Raises:

Type Description
FedbiomedExperimentError

bad argument type, error when reading breakpoint or bad loaded breakpoint content (corrupted)

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@classmethod
@exp_exceptions
def load_breakpoint(
    cls,
    breakpoint_folder_path: Optional[str] = None
) -> Tuple[TFederatedWorkflow, dict]:
    """
    Loads breakpoint (provided a breakpoint has been saved)
    so the workflow can be resumed.

    Args:
      breakpoint_folder_path: path of the breakpoint folder. Path can be absolute
        or relative eg: "var/experiments/Experiment_xxxx/breakpoints_xxxx".
        If None, loads the latest breakpoint of the latest workflow. Defaults to None.

    Returns:
        Tuple containing the reinitialized workflow object and the saved state as a dictionary

    Raises:
        FedbiomedExperimentError: bad argument type, error when reading breakpoint or
            bad loaded breakpoint content (corrupted)
    """
    # check parameters type
    if not isinstance(breakpoint_folder_path, str) and breakpoint_folder_path is not None:
        msg = (
            f"{ErrorNumbers.FB413.value}: load failed, `breakpoint_folder_path`"
            f" has bad type {type(breakpoint_folder_path)}"
        )
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    # get breakpoint folder path (if it is None) and state file
    breakpoint_folder_path, state_file = find_breakpoint_path(breakpoint_folder_path)
    breakpoint_folder_path = os.path.abspath(breakpoint_folder_path)

    try:
        path = os.path.join(breakpoint_folder_path, state_file)
        with open(path, "r", encoding="utf-8") as file:
            saved_state = json.load(file)
    except (json.JSONDecodeError, OSError) as exc:
        # OSError: heuristic for catching file access issues
        msg = (
            f"{ErrorNumbers.FB413.value}: load failed,"
            f" reading breakpoint file failed with message {exc}"
        )
        logger.critical(msg)
        raise FedbiomedExperimentError(msg) from exc
    if not isinstance(saved_state, dict):
        msg = (
            f"{ErrorNumbers.FB413.value}: load failed, breakpoint file seems"
            f" corrupted. Type should be `dict` not {type(saved_state)}"
        )
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    # First, check version of breakpoints
    bkpt_version = saved_state.get('breakpoint_version', __default_version__)
    raise_for_version_compatibility(bkpt_version, __breakpoints_version__,
                                    f"{ErrorNumbers.FB413.value}: Breakpoint "
                                    "file was generated with version %s "
                                    f"which is incompatible with the current version %s.")

    # retrieve breakpoint training data
    bkpt_fds = saved_state.get('training_data')
    bkpt_fds = FederatedDataSet(bkpt_fds)

    # initializing experiment
    loaded_exp = cls()
    loaded_exp._experiment_id = saved_state.get('id')
    loaded_exp.set_training_data(bkpt_fds)
    loaded_exp._tags = saved_state.get('tags')
    loaded_exp.set_nodes(saved_state.get('nodes'))
    loaded_exp.set_experimentation_folder(saved_state.get('experimentation_folder'))
    loaded_exp.set_secagg(SecureAggregation.load_state_breakpoint(saved_state.get('secagg')))
    loaded_exp._node_state_agent.load_state_breakpoint(saved_state.get('node_state'))
    loaded_exp.set_save_breakpoints(True)

    return loaded_exp, saved_state
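A hedged sketch of the breakpoint round-trip, assuming the Experiment import path matches the module layout shown on this page, that breakpoints were enabled on a previous run, and that the concrete subclass returns the same (workflow, state) tuple documented here:

    from fedbiomed.researcher.federated_workflows import Experiment

    exp.set_save_breakpoints(True)   # write a breakpoint after each round of exp.run()
    # ... later, possibly in a new session, resume the latest breakpoint of the latest experiment:
    loaded_exp, saved_state = Experiment.load_breakpoint()
    print(saved_state.get('breakpoint_version'))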
nodes
nodes()

Retrieves the nodes filter for the execution of the workflow.

If nodes is None, then no filtering is applied, and all the nodes in the federation participate in the execution of the workflow. If nodes is not None, then the semantics of the nodes filter are as follows:

| node_id in nodes filter | node_id in training data | outcome |
| --- | --- | --- |
| yes | yes | this node is part of the federation, and will take part in the execution of the workflow |
| yes | no | ignored |
| no | yes | this node is part of the federation but will not be considered for executing the workflow |
| no | no | ignored |

Please see set_nodes to set nodes.

Returns:

Type Description
Union[List[str], None]

The list of nodes to keep for workflow execution, or None if no filtering is applied

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def nodes(self) -> Union[List[str], None]:
    """Retrieves the nodes filter for the execution of the workflow.

    If nodes is None, then no filtering is applied, and all the nodes in the federation participate in the
    execution of the workflow.
    If nodes is not None, then the semantics of the nodes filter are as follows:

    | node_id in nodes filter | node_id in training data | outcome |
    | --- | --- | --- |
    | yes | yes | this node is part of the federation, and will take part in the execution the workflow |
    | yes | no | ignored |
    | no | yes | this node is part of the federation but will not be considered for executing the workflow |
    | no | no | ignored |

    Please see [`set_nodes`][fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_nodes] to set `nodes`.

    Returns:
        The list of nodes to keep for workflow execution, or None if no filtering is applied
    """
    return self._nodes_filter
run abstractmethod
run()

Run the experiment

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@abstractmethod
def run(self) -> int:
    """Run the experiment"""
save_breakpoints
save_breakpoints()

Retrieves the status of saving breakpoint after each round of training.

Returns:

Type Description
bool

True if saving breakpoints is active, False otherwise.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def save_breakpoints(self) -> bool:
    """Retrieves the status of saving breakpoint after each round of training.

    Returns:
        `True`, If saving breakpoint is active. `False`, vice versa.
    """

    return self._save_breakpoints
secagg_setup
secagg_setup(sampled_nodes)

Retrieves the secagg arguments for setup.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
def secagg_setup(self, sampled_nodes: List[str]) -> Dict:
    """Retrieves the secagg arguments for setup."""
    secagg_arguments = {}
    if self._secagg.active:
        self._secagg.setup(parties=[environ["ID"]] + sampled_nodes,
                           experiment_id=self._experiment_id)
        secagg_arguments = self._secagg.train_arguments()
    return secagg_arguments
set_experimentation_folder
set_experimentation_folder(experimentation_folder=None)

Sets experimentation_folder, the folder name where experiment data/result are saved.

Parameters:

Name Type Description Default
experimentation_folder Optional[str]

File name where experiment related files are saved

None

Returns:

Type Description
str

The path to experimentation folder.

Raises:

Type Description
FedbiomedExperimentError

bad experimentation_folder type

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def set_experimentation_folder(self, experimentation_folder: Optional[str] = None) -> str:
    """Sets `experimentation_folder`, the folder name where experiment data/result are saved.

    Args:
        experimentation_folder: File name where experiment related files are saved

    Returns:
        The path to experimentation folder.

    Raises:
        FedbiomedExperimentError : bad `experimentation_folder` type
    """
    if experimentation_folder is None:
        self._experimentation_folder = create_exp_folder()
    elif isinstance(experimentation_folder, str):
        sanitized_folder = sanitize_filename(experimentation_folder, platform='auto')
        self._experimentation_folder = create_exp_folder(sanitized_folder)
        if sanitized_folder != experimentation_folder:
            logger.warning(f'`experimentation_folder` was sanitized from '
                           f'{experimentation_folder} to {sanitized_folder}')
    else:
        msg = ErrorNumbers.FB410.value + \
            f' `experimentation_folder` : {type(experimentation_folder)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

        # at this point self._experimentation_folder is a str valid for a foldername

    return self._experimentation_folder
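A small sketch, assuming an existing workflow instance exp; the folder name below is purely illustrative and, per the caveats above, should not end with digits:

    exp.set_experimentation_folder('my_experiment_folder')
    print(exp.experimentation_path())   # absolute path under environ['EXPERIMENTS_DIR']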
set_nodes
set_nodes(nodes)

Sets the nodes filter + verifications on argument type

Parameters:

Name Type Description Default
nodes Union[List[str], None]

List of node_ids to filter the nodes to be involved in the experiment.

required

Returns:

Type Description
Union[List[str], None]

List of nodes that are set. None, if the argument nodes is None.

Raises:

Type Description
FedbiomedTypeError

Bad nodes type

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def set_nodes(self, nodes: Union[List[str], None]) -> Union[List[str], None]:
    """Sets the nodes filter + verifications on argument type

    Args:
        nodes: List of node_ids to filter the nodes to be involved in the experiment.

    Returns:
        List of nodes that are set. None, if the argument `nodes` is None.

    Raises:
        FedbiomedTypeError : Bad nodes type
    """
    # immediately exit if setting nodes to None
    if nodes is None:
        self._nodes_filter = None
    # set nodes
    elif isinstance(nodes, list):
        if not all(map(lambda node: isinstance(node, str), nodes)):
            msg = ErrorNumbers.FB410.value + ' `nodes` argument must be a list of strings or None.'
            logger.critical(msg)
            raise FedbiomedTypeError(msg)
        self._nodes_filter = nodes
    else:
        msg = ErrorNumbers.FB410.value + ' `nodes` argument must be a list of strings or None.'
        logger.critical(msg)
        raise FedbiomedTypeError(msg)
    return self._nodes_filter
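A sketch of restricting the workflow to a subset of nodes (the node ids are hypothetical; only ids that also appear in the training data will actually take part):

    exp.set_nodes(['node_1234', 'node_5678'])   # keep only these nodes, if present in the federation
    print(exp.filtered_federation_nodes())
    exp.set_nodes(None)                         # remove the filter: all federation nodes participate again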
set_save_breakpoints
set_save_breakpoints(save_breakpoints)

Setter for save_breakpoints + verification on arguments type

Parameters:

Name Type Description Default
save_breakpoints bool

whether to save breakpoints or not after each training round. Breakpoints can be used for resuming a crashed experiment.

required

Returns:

Type Description
bool

Status of saving breakpoints

Raises:

Type Description
FedbiomedExperimentError

bad save_breakpoints type

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def set_save_breakpoints(self, save_breakpoints: bool) -> bool:
    """ Setter for save_breakpoints + verification on arguments type

    Args:
        save_breakpoints (bool): whether to save breakpoints or
            not after each training round. Breakpoints can be used for resuming
            a crashed experiment.

    Returns:
        Status of saving breakpoints

    Raises:
        FedbiomedExperimentError: bad save_breakpoints type
    """
    if isinstance(save_breakpoints, bool):
        self._save_breakpoints = save_breakpoints
        # no warning if done during experiment, we may change breakpoint policy at any time
    else:
        msg = ErrorNumbers.FB410.value + f' `save_breakpoints` : {type(save_breakpoints)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    return self._save_breakpoints
set_secagg
set_secagg(secagg)

Sets secure aggregation

Builds a secure aggregation controller/instance, or sets the given secure aggregation instance.

Parameters:

Name Type Description Default
secagg Union[bool, SecureAggregation]

If True, activates secure aggregation for training requests by building a SecureAggregation instance with default arguments. If the argument is already a SecureAggregation instance, it is simply assigned; secure aggregation activation and configuration then depend on the instance provided.

required

Returns:

Type Description
SecureAggregation

Secure aggregation controller instance.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def set_secagg(self, secagg: Union[bool, SecureAggregation]):
    """Sets secure aggregation

    Build secure aggregation controller/instance or sets given
    secure aggregation class

    Args:
        secagg: If True activates training request with secure aggregation by building
            [`SecureAggregation`][fedbiomed.researcher.secagg.SecureAggregation] class
            with default arguments. Or if argument is an instance of `SecureAggregation`
            it does only assignment. Secure aggregation activation and configuration
            depends on the instance provided.

    Returns:
        Secure aggregation controller instance.
    """
    if isinstance(secagg, bool):
        self._secagg = SecureAggregation(active=secagg)
    elif isinstance(secagg, SecureAggregation):
        self._secagg = secagg
    else:
        msg = f"{ErrorNumbers.FB410.value}: Expected `secagg` argument bool or `SecureAggregation`, " \
              f"but got {type(secagg)}"
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    return self._secagg
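A hedged sketch of the two accepted forms, assuming an existing workflow instance exp and the SecureAggregation import path shown in the cross-references above:

    from fedbiomed.researcher.secagg import SecureAggregation

    exp.set_secagg(True)                             # build a SecureAggregation with default arguments
    # or hand over a pre-configured instance:
    exp.set_secagg(SecureAggregation(active=True))
    print(exp.secagg.active)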
set_tags
set_tags(tags)

Sets tags and verification on argument type

Setting tags also updates the training data by executing the FederatedWorkflow.set_training_data method.

Parameters:

Name Type Description Default
tags Union[List[str], str]

List of string with data tags or string with one data tag. Empty list of tags ([]) means any dataset is accepted, it is different from None (tags not set, cannot search for training_data yet).

required

Returns: List of tags that are set.

Raises:

Type Description
FedbiomedTypeError

Bad tags type

FedbiomedValueError

Some issue prevented resetting the training data after an inconsistency was detected

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def set_tags(
    self,
    tags: Union[List[str], str],
) -> List[str]:
    """Sets tags and verification on argument type

    Setting tags also updates the training data by executing the
    [`set_training_data`][fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_training_data]
    method.

    Args:
        tags: List of string with data tags or string with one data tag. Empty list
            of tags ([]) means any dataset is accepted, it is different from None
            (tags not set, cannot search for training_data yet).
    Returns:
        List of tags that are set.

    Raises:
        FedbiomedTypeError: Bad tags type
        FedbiomedValueError: Some issue prevented resetting the training
            data after an inconsistency was detected
    """
    # preprocess the tags argument to correct typing
    if not tags:
        msg = f"{ErrorNumbers.FB410.value}: Invalid value for tags argument {tags}, tags " \
            "should be non-empty list of str or non-empty str."
        logger.critical(msg)
        raise FedbiomedValueError(msg)

    if isinstance(tags, list):
        if not all(map(lambda tag: isinstance(tag, str), tags)):
            msg = f"{ErrorNumbers.FB410.value}: `tags` must be a non-empty str or " \
                "a non-empty list of str."
            logger.critical(msg)
            raise FedbiomedTypeError(msg)

        # If it is empty list
        tags_to_set = tags

    elif isinstance(tags, str):
        tags_to_set = [tags]
    else:
        msg = f"{ErrorNumbers.FB410.value} `tags` must be a non-empty str, " \
            "a non-empty list of str"
        logger.critical(msg)
        raise FedbiomedTypeError(msg)

    self._tags = tags_to_set

    # Set training data
    logger.info(
        "Updating training data. This action will update FederatedDataset, "
        "and the nodes that will participate to the experiment.")

    self.set_training_data(None, from_tags=True)

    return self._tags
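A minimal sketch (the tag names are hypothetical); note that, as described above, setting tags immediately triggers a dataset search on the nodes to refresh the training data:

    exp.set_tags('#MNIST')                # a single tag as a string
    exp.set_tags(['#MNIST', '#dataset'])  # or a list of tags; the nodes are queried again
    print(exp.tags())                     # ['#MNIST', '#dataset']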
set_training_data
set_training_data(training_data, from_tags=False)

Sets training data for federated training + verification on arguments type

The full expected behaviour when changing training data is given in the table below:

| New value of training_data | from_tags | Outcome |
| --- | --- | --- |
| dict or FederatedDataset | True | fail, because the user is attempting to set training data from tags while also providing a training_data argument |
| dict or FederatedDataset | False | set the fds attribute, set tags to None |
| None | True | fail if tags are not set, else set the fds attribute based on the tags |
| None | False | fail, because training data cannot be set to None; provide a valid training_data or use from_tags=True |

Setting to None forfeits consistency checks

Setting training_data to None does not trigger consistency checks, and may therefore leave the class in an inconsistent state.

Parameters:

Name Type Description Default
training_data Union[FederatedDataSet, dict, None]
  • If it is a FederatedDataSet object, use this value as training_data.
  • else if it is a dict, create and use a FederatedDataSet object from the dict and use this value as training_data. The dict should use node ids as keys, values being list of dicts (each dict representing a dataset on a node).
  • else if it is None (no training data provided)
      • if from_tags is True and tags is not None, set training_data by searching for datasets with a query to the nodes using tags and nodes
      • if from_tags is False or tags is None, set training_data to None (no training_data set yet, experiment is not fully initialized and cannot be launched)
required
from_tags bool

If True, query nodes for datasets when no training_data is provided. Not used when training_data is provided.

False

Returns:

Type Description
Union[FederatedDataSet, None]

FederatedDataSet metadata

Raises:

Type Description
FedbiomedTypeError

bad training_data or from_tags type.

FedbiomedValueError

Invalid value for the arguments training_data or from_tags.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def set_training_data(
        self,
        training_data: Union[FederatedDataSet, dict, None],
        from_tags: bool = False) -> \
        Union[FederatedDataSet, None]:
    """Sets training data for federated training + verification on arguments type


    The full expected behaviour when changing training data is given in the table below:

    | New value of `training_data` | `from_tags` | Outcome |
    | --- | --- | --- |
    | dict or FederatedDataset | True  | fail because user is attempting to set from tags but also providing a training_data argument|
    | dict or FederatedDataset | False | set fds attribute, set tags to None |
    | None | True | fail if tags are not set, else set fds attribute based tags |
    | None | False | fail, because training data cannot be set to None; provide a valid training_data or use from_tags=True |

    !!! warning "Setting to None forfeits consistency checks"
        Setting training_data to None does not trigger consistency checks, and may therefore leave the class in an
        inconsistent state.

    Args:
        training_data:
            * If it is a FederatedDataSet object, use this value as training_data.
            * else if it is a dict, create and use a FederatedDataSet object from the dict
              and use this value as training_data. The dict should use node ids as keys,
              values being list of dicts (each dict representing a dataset on a node).
            * else if it is None (no training data provided)
              - if `from_tags` is True and `tags` is not None, set training_data by
                searching for datasets with a query to the nodes using `tags` and `nodes`
              - if `from_tags` is False or `tags` is None, set training_data to None (no training_data set yet,
                experiment is not fully initialized and cannot be launched)
        from_tags: If True, query nodes for datasets when no `training_data` is provided.
            Not used when `training_data` is provided.

    Returns:
        FederatedDataSet metadata

    Raises:
        FedbiomedTypeError: bad training_data or from_tags type.
        FedbiomedValueError: Invalid value for the arguments  `training_data` or `from_tags`.
    """

    if not isinstance(from_tags, bool):
        msg = ErrorNumbers.FB410.value + \
            f' `from_tags` : got {type(from_tags)} but expected a boolean'
        logger.critical(msg)
        raise FedbiomedTypeError(msg)
    if from_tags and training_data is not None:
        msg = ErrorNumbers.FB410.value + \
            ' set_training_data: cannot specify a training_data argument if ' \
            'from_tags is True'
        logger.critical(msg)
        raise FedbiomedValueError(msg)

    # case where no training data are passed
    if training_data is None:
        if from_tags is True:
            if not self._tags:
                msg = f"{ErrorNumbers.FB410.value}: attempting to " \
                    "set training data from undefined tags. Please consider set tags before " \
                    "using set_tags method of the experiment."
                logger.critical(msg)
                raise FedbiomedValueError(msg)
            training_data = self._reqs.search(self._tags, self._nodes_filter)
        else:
            msg = f"{ErrorNumbers.FB410.value}: Can not set training data to `None`. " \
                "Please set from_tags=True or provide a valid training data"
            logger.critical(msg)
            raise FedbiomedValueError(msg)

    if isinstance(training_data, FederatedDataSet):
        self._fds = training_data
    elif isinstance(training_data, dict):
        self._fds = FederatedDataSet(training_data)
    else:
        msg = ErrorNumbers.FB410.value + \
            f' `training_data` has incorrect type: {type(training_data)}'
        logger.critical(msg)
        raise FedbiomedTypeError(msg)

    # check and ensure consistency
    self._tags = self._tags if from_tags else None

    # return the new value
    return self._fds
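A sketch of the two supported call patterns, matching the behaviour table above (the node id and dataset fields are hypothetical):

    # 1) provide the data explicitly (tags are reset to None)
    exp.set_training_data({'node_1234': [{'dataset_id': 'dataset_abcd'}]})

    # 2) or query the nodes using the tags previously set with set_tags()
    exp.set_training_data(None, from_tags=True)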
tags
tags()

Retrieves the tags from the experiment object.

Please see set_tags to set tags.

Returns:

Type Description
Union[List[str], None]

List of tags that have been set. None if they aren't declared yet.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def tags(self) -> Union[List[str], None]:
    """Retrieves the tags from the experiment object.

    Please see [`set_tags`][fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_tags] to set tags.

    Returns:
        List of tags that have been set. `None` if they aren't declared yet.
    """
    return self._tags
training_data
training_data()

Retrieves the training data which is an instance of FederatedDataset

This represents the dataset metadata available for the full federation.

Please see set_training_data to set or update training data.

Returns:

Type Description
Union[FederatedDataSet, None]

Object that contains metadata for the datasets of each node. None if it isn't set yet.

Source code in fedbiomed/researcher/federated_workflows/_federated_workflow.py
@exp_exceptions
def training_data(self) -> Union[FederatedDataSet, None]:
    """Retrieves the training data which is an instance of
    [`FederatedDataset`][fedbiomed.researcher.datasets.FederatedDataSet]

    This represents the dataset metadata available for the full federation.

    Please see [`set_training_data`][fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_training_data]
    to set or update training data.

    Returns:
        Object that contains metadata for the datasets of each node. `None` if it isn't set yet.
    """
    return self._fds

TrainingPlanWorkflow

TrainingPlanWorkflow(*args, training_plan_class=None, training_args=None, model_args=None, **kwargs)

Bases: FederatedWorkflow, ABC

A TrainingPlanWorkflow is an abstract entry point to orchestrate an experiment which uses a training plan.

In addition to the functionalities provided by FederatedWorkflow, the TrainingPlanWorkflow also manages the life-cycle of the training plan.

Use set_training_plan_class to manage the training plan

Please only ever use the set_training_plan_class function to manage the training plan. Do not set the training plan or training plan class directly!

Parameters:

Name Type Description Default
training_plan_class Optional[TrainingPlanT]

training plan class to be used for training. For the experiment to be properly and fully defined, training_plan_class needs to be a TrainingPlanT. Defaults to None (no training plan class defined yet).

None
model_args Optional[Dict]

contains model arguments passed to the constructor of the training plan when instantiating it : output and input feature dimension, etc.

None
training_args Optional[Union[TrainingArgs, dict]]

contains training arguments passed to the training_routine of the training plan when launching it: lr, epochs, batch_size...

None
*args

Extra positional arguments from parent class FederatedWorkflow

()
**kwargs

Arguments of parent class FederatedWorkflow

{}
Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def __init__(
    self,
    *args,
    training_plan_class: Optional[TrainingPlanT] = None,
    training_args: Optional[Union[TrainingArgs, dict]] = None,
    model_args: Optional[Dict] = None,
    **kwargs,
) -> None:
    """Constructor of the class.

    Args:
        training_plan_class: training plan class to be used for training.
            For experiment to be properly and fully defined `training_plan_class`
            needs to be a `TrainingPlanT` Defaults to None (no training plan class
            defined yet.
        model_args: contains model arguments passed to the constructor
            of the training plan when instantiating it :
            output and input feature dimension, etc.
        training_args: contains training arguments passed to the `training_routine`
            of the training plan when launching it: lr, epochs, batch_size...
        *args: Extra positional arguments from parent class
            [`FederatedWorkflow`][fedbiomed.researcher.federated_workflows.FederatedWorkflow]
        **kwargs: Arguments of parent class
            [`FederatedWorkflow`][fedbiomed.researcher.federated_workflows.FederatedWorkflow]
    """
    # Check arguments
    if training_plan_class is not None and not inspect.isclass(training_plan_class):
        raise FedbiomedTypeError(
            f"{ErrorNumbers.FB410.value}: bad type for argument "
            f"`training_plan_class` {type(training_plan_class)}")

    if training_plan_class is not None and \
            not issubclass(training_plan_class, TRAINING_PLAN_TYPES):

        raise FedbiomedTypeError(
            f"{ErrorNumbers.FB410.value}: bad type for argument `training_plan_class`."
            f" It is not subclass of supported training plans {TRAINING_PLAN_TYPES}")

    # _training_plan_class determines the life-cycle of the training plan:
    # if training_plass_class changes, then the training plan must be reinitialized
    self._training_plan_class = None
    # model args is also tied to the life-cycle of training plan:
    # if model_args changes, the training plan must be reinitialized
    self._model_args = None
    # The _training_plan attribute represents the *actual instance*
    # of a _training_plan_class that is currently
    # being used in the workflow. The training plan cannot be modified by the user.
    self._training_plan = None
    self._training_args: Optional[TrainingArgs] = None  # FIXME: is it ok to have this here?
    # The _training_plan_file attribute represents the path of the file where the training plan is saved.
    # It cannot be modified by the user
    self._training_plan_file = None

    # initialize object
    super().__init__(*args, **kwargs)

    self.set_training_args(training_args)
    self.set_model_args(model_args)
    self.set_training_plan_class(training_plan_class)

Functions

breakpoint
breakpoint(state, bkpt_number)

Saves breakpoint with the state of the workflow.

The following attributes will be saved:

  • training_args
  • training_plan_class
  • model_args
Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def breakpoint(self,
               state,
               bkpt_number) -> None:
    """
    Saves breakpoint with the state of the workflow.

    The following attributes will be saved:

      - training_args
      - training_plan_class
      - model_args
    """
    # save training plan to file
    training_plan_module = 'model_' + str(uuid.uuid4())
    training_plan_file = os.path.join(self.experimentation_path(), training_plan_module + '.py')
    self.training_plan().save_code(training_plan_file)

    state.update({
        'model_args': self._model_args,
        'training_plan_class_name': self._training_plan_class.__name__,
        'training_args': self._training_args.get_state_breakpoint(),
    })

    breakpoint_path, breakpoint_file_name = \
        choose_bkpt_file(self._experimentation_folder, bkpt_number - 1)

    # rewrite paths in breakpoint : use the links in breakpoint directory
    state['training_plan_path'] = create_unique_link(
        breakpoint_path,
        # - Need a file with a restricted characters set in name to be able to import as module
        'model_' + str("{:04d}".format(bkpt_number - 1)), '.py',
        # - Prefer relative path, eg for using experiment result after
        # experiment in a different tree
        os.path.join('..', os.path.basename(training_plan_file))
    )
    params_path = os.path.join(breakpoint_path, f"model_params_{uuid.uuid4()}.mpk")
    Serializer.dump(self.training_plan().get_model_wrapper_class().get_weights(
        only_trainable = False, exclude_buffers = False), params_path)
    state['model_weights_path'] = params_path

    super().breakpoint(state, bkpt_number)
check_training_plan_status
check_training_plan_status()

Method for checking the training plan status, i.e. whether it is approved or not by the nodes.

Raises:

Type Description
FedbiomedExperimentError

if the training data is not defined.

Returns:

Type Description
Dict

Training plan status for answering nodes

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def check_training_plan_status(self) -> Dict:
    """ Method for checking training plan status, ie whether it is approved or not by the nodes

    Raises:
        FedbiomedExperimentError: if the training data is not defined.

    Returns:
        Training plan status for answering nodes
    """
    if self.training_data() is None:
        msg = f"{ErrorNumbers.FB410.value}. Cannot check training plan status: training data is not defined." \
              f"Please either use the `set_tags` or `set_training_data` method to fix this."
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    job = TrainingPlanCheckJob(
        nodes=self.training_data().node_ids(),
        keep_files_dir=self.experimentation_path(),
        experiment_id=self._experiment_id,
        training_plan=self.training_plan()
    )
    responses = job.execute()
    return responses
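A hedged sketch, assuming an Experiment (or other TrainingPlanWorkflow subclass) instance exp whose training data and training plan class are already set; the structure of each node's reply is not detailed here:

    responses = exp.check_training_plan_status()
    for node_id, reply in responses.items():
        print(node_id, reply)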
info
info(info=None, missing='')

Prints out the information about the current status of the experiment.

Lists all the parameters/arguments of the experiment and informs whether the experiment can be run.

Parameters:

Name Type Description Default
info Optional[Dict]

Dictionary of relevant attribute statuses from sub-classes, which will be completed with additional attribute statuses defined in this class. Defaults to None (no entries from sub-classes available or of importance).

None
missing str

Message listing the attributes that are not yet defined, printed as a warning if non-empty. Defaults to '' (object considered fully defined).

''

Returns:

Type Description
Tuple[Dict[str, List[str]], str]

A tuple containing (1) a dictionary with all pieces of information, with 2 entries: Arguments mapping a list of all arguments and Values mapping a list containing all the values, and (2) the missing message string.

Raises:

Type Description
KeyError

if Arguments or Values entry is missing in passing argument info

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def info(
    self,
    info: Optional[Dict] = None,
    missing: str = ''
) -> Tuple[Dict[str, List[str]], str]:
    """Prints out the information about the current status of the experiment.

    Lists all the parameters/arguments of the experiment and informs whether
    the experiment can be run.

    Args:
        info: Dictionary of relevant attribute statuses from sub-classes, to be
            completed with additional attribute statuses defined in this class.
            Defaults to None (no sub-class entries available or of importance).
        missing: string describing attributes that are not yet set but may be
            needed to fully run the object; forwarded to the parent class.
            Defaults to '' (nothing reported as missing).

    Returns:
        dictionary containing all pieces of information, with 2 entries:
            `Arguments` mapping a list of all arguments, and `Values` mapping
            a list containing all the values.

    Raises:
        KeyError: if the `Arguments` or `Values` entry is missing in the passed `info` argument
    """
    # at this point all attributes are initialized (in constructor)
    if info is None:
        info = self._create_default_info_structure()
    info['Arguments'].extend([
        'Training Plan Class',
        'Model Arguments',
        'Training Arguments'
    ])
    info['Values'].extend(['\n'.join(findall('.{1,60}',
                                     str(e))) for e in [
        self._training_plan_class,
        self._model_args,
        self._training_args
    ]])

    return super().info(info, missing)
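
A short sketch, assuming the same hypothetical exp instance; the Arguments and Values entries are the ones documented above:

arguments_and_values, missing = exp.info()  # also prints the summary
print(arguments_and_values['Arguments'])    # list of argument names
print(arguments_and_values['Values'])       # matching list of formatted values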
load_breakpoint classmethod
load_breakpoint(breakpoint_folder_path=None)

Loads breakpoint (provided a breakpoint has been saved) so the workflow can be resumed.

Parameters:

Name Type Description Default
breakpoint_folder_path Optional[str]

path of the breakpoint folder. Path can be absolute or relative eg: "var/experiments/Experiment_xxxx/breakpoints_xxxx". If None, loads the latest breakpoint of the latest workflow. Defaults to None.

None

Returns:

Type Description
Tuple[TrainingPlanWorkflowT, dict]

Reinitialized workflow object, along with the saved breakpoint state.

Raises:

Type Description
FedbiomedExperimentError

bad argument type, error when reading breakpoint or bad loaded breakpoint content (corrupted)

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@classmethod
@exp_exceptions
def load_breakpoint(cls,
                    breakpoint_folder_path: Optional[str] = None) -> Tuple[TrainingPlanWorkflowT, dict]:
    """
    Loads breakpoint (provided a breakpoint has been saved)
    so the workflow can be resumed.

    Args:
      breakpoint_folder_path: path of the breakpoint folder. Path can be absolute or relative eg:
        "var/experiments/Experiment_xxxx/breakpoints_xxxx". If None, loads the latest breakpoint of the latest
        workflow. Defaults to None.

    Returns:
        Reinitialized workflow object, along with the saved breakpoint state.

    Raises:
        FedbiomedExperimentError: bad argument type, error when reading breakpoint or bad loaded breakpoint
            content (corrupted)
    """
    loaded_exp, saved_state = super().load_breakpoint(breakpoint_folder_path)

    # Define type for pylint
    loaded_exp: TrainingPlanWorkflow

    # Import TP class
    _, tp_class = import_class_from_file(
        module_path=saved_state.get("training_plan_path"),
        class_name=saved_state.get("training_plan_class_name")
    )

    loaded_exp.set_model_args(saved_state["model_args"])
    loaded_exp.set_training_plan_class(tp_class)
    loaded_exp.set_training_args(
        TrainingArgs.load_state_breakpoint(
            saved_state.get('training_args')))
    training_plan = loaded_exp.training_plan()
    if training_plan is None:
        msg = ErrorNumbers.FB413.value + ' - load failed, ' + \
            'breakpoint file seems corrupted, `training_plan` is None'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)
    param_path = saved_state['model_weights_path']
    params = Serializer.load(param_path)
    loaded_exp.training_plan().get_model_wrapper_class().set_weights(params)

    return loaded_exp, saved_state
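
A hedged sketch of resuming a saved experiment. The import path is an assumption based on the module paths shown on this page, and the tuple unpacking follows the return type documented above for this workflow-level method; a concrete subclass may return only the reinitialized object, in which case drop the second target.

from fedbiomed.researcher.federated_workflows import Experiment  # assumed import path

# Resume the latest breakpoint of the latest workflow; a specific folder such as
# "var/experiments/Experiment_xxxx/breakpoints_xxxx" could be passed instead.
loaded_exp, saved_state = Experiment.load_breakpoint()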
model_args
model_args()

Retrieves model arguments.

Please see also set_model_args

Returns:

Type Description
dict

The arguments that are going to be passed to the init_model function of the training plan during initialization of the model instance

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def model_args(self) -> dict:
    """Retrieves model arguments.

    Please see also [`set_model_args`][fedbiomed.researcher.federated_workflows.TrainingPlanWorkflow.set_model_args]

    Returns:
        The arguments that are going to be passed to the `init_model` function of the training plan during
        initialization of the model instance
    """
    return self._model_args
set_model_args
set_model_args(model_args, keep_weights=True)

Sets model_args + verification on arguments type

Resets the training plan

This function has an important (and intended!) side-effect: it resets the training_plan attribute. By default, it tries to keep the same weights as the current training plan, if available.

Parameters:

Name Type Description Default
model_args dict

contains model arguments passed to the constructor of the training plan when instantiating it: input and output feature dimensions, etc.

required
keep_weights bool

try to keep the same weights as the current training plan

True

Returns:

Type Description
dict

Model arguments that have been set.

Raises:

Type Description
FedbiomedExperimentError

bad model_args type

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def set_model_args(self,
                   model_args: dict,
                   keep_weights: bool = True) -> dict:
    """Sets `model_args` + verification on arguments type

    !!! warning "Resets the training plan"
        This function has an important (and intended!) side-effect: it resets the `training_plan` attribute.
        By default, it tries to keep the same weights as the current training plan, if available.

    Args:
        model_args (dict): contains model arguments passed to the constructor
            of the training plan when instantiating it: input and output feature
            dimensions, etc.
        keep_weights: try to keep the same weights as the current training plan

    Returns:
        Model arguments that have been set.

    Raises:
        FedbiomedExperimentError : bad model_args type
    """
    if model_args is None or isinstance(model_args, dict):
        self._model_args = model_args
    else:
        # bad type
        msg = ErrorNumbers.FB410.value + f' `model_args` : {type(model_args)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)
    # self._model_args always exist at this point

    self._update_training_plan(keep_weights)  # resets the training plan attribute

    return self._model_args
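
A minimal sketch, assuming a hypothetical exp instance; the key names below are illustrative, since the accepted keys are simply whatever the training plan's init_model expects:

# Hypothetical model arguments; keys must match what the plan's init_model reads
exp.set_model_args({"in_features": 15, "out_features": 1})
# keep_weights=False rebuilds the training plan with freshly initialized weights
exp.set_model_args({"in_features": 15, "out_features": 1}, keep_weights=False)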
set_training_args
set_training_args(training_args)

Sets training_args + verification on arguments type

Parameters:

Name Type Description Default
training_args Union[dict, TrainingArgs, None]

contains training arguments passed to the training plan's training_routine such as lr, epochs, batch_size...

required

Returns:

Type Description
Union[dict, None]

Training arguments

Raises:

Type Description
FedbiomedExperimentError

bad training_args type

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def set_training_args(
    self,
    training_args: Union[dict, TrainingArgs, None]
) -> Union[dict, None]:
    """ Sets `training_args` + verification on arguments type

    Args:
        training_args: contains training arguments passed to the
            training plan's `training_routine` such as lr, epochs, batch_size...

    Returns:
        Training arguments

    Raises:
        FedbiomedExperimentError : bad training_args type
    """

    if isinstance(training_args, TrainingArgs):
        self._training_args = deepcopy(training_args)
    elif isinstance(training_args, dict) or training_args is None:
        self._training_args = TrainingArgs(training_args, only_required=False)
    else:
        msg = f"{ErrorNumbers.FB410.value} in function `set_training_args`. " \
              "Expected type TrainingArgs, dict, or " \
              f"None, got {type(training_args)} instead."
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    return self._training_args.dict()
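
A minimal sketch with a plain dict, which is wrapped into a TrainingArgs instance internally; the key names below are illustrative, and TrainingArgs checks names and types against its own scheme:

# Illustrative keys only; TrainingArgs validates the accepted names and types
exp.set_training_args({"epochs": 1, "loader_args": {"batch_size": 32}})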
set_training_plan_class
set_training_plan_class(training_plan_class, keep_weights=True)

Sets the training plan type + verification on arguments type

Resets the training plan

This function has an important (and intended!) side-effect: it resets the training_plan attribute. By default, it tries to keep the same weights as the current training plan, if available.

Parameters:

Name Type Description Default
training_plan_class Union[TrainingPlanT, None]

training plan class to be used for training. For the experiment to be properly and fully defined, training_plan_class needs to be a TrainingPlanT. Defaults to None (no training plan class defined yet)

required
keep_weights bool

try to keep the same weights as the current training plan

True

Returns:

Type Description
Union[TrainingPlanT, None]

training_plan_class that is set for experiment

Raises:

Type Description
FedbiomedExperimentError

bad training_plan_class type

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def set_training_plan_class(self,
                            training_plan_class: Union[TrainingPlanT, None],
                            keep_weights: bool = True
                            ) -> Union[TrainingPlanT, None]:
    """Sets  the training plan type + verification on arguments type

    !!! warning "Resets the training plan"
        This function has an important (and intended!) side-effect: it resets the `training_plan` attribute.
        By default, it tries to keep the same weights as the current training plan, if available.

    Args:
        training_plan_class: training plan class to be used for training.
            For the experiment to be properly and fully defined, `training_plan_class`
            needs to be a `TrainingPlanT`. Defaults to None (no training plan class defined yet)
        keep_weights: try to keep the same weights as the current training plan

    Returns:
        `training_plan_class` that is set for experiment

    Raises:
        FedbiomedExperimentError : bad training_plan_class type
    """
    if training_plan_class is None:
        self._training_plan_class = None
    elif inspect.isclass(training_plan_class):
        # training_plan_class must be a subclass of a valid training plan
        if issubclass(training_plan_class, TRAINING_PLAN_TYPES):
            # valid class
            self._training_plan_class = training_plan_class
        else:
            # bad class
            msg = ErrorNumbers.FB410.value + f' `training_plan_class` : {training_plan_class} class'
            logger.critical(msg)
            raise FedbiomedExperimentError(msg)
    else:
        # bad type
        msg = ErrorNumbers.FB410.value + f' `training_plan_class` of type: {type(training_plan_class)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    self._update_training_plan(keep_weights)  # resets the training plan attribute

    return self._training_plan_class
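
A minimal sketch, assuming a user-defined plan derived from one of the supported training plan types (TorchTrainingPlan is referenced elsewhere on this page); the class body is elided:

from fedbiomed.common.training_plans import TorchTrainingPlan

class MyTrainingPlan(TorchTrainingPlan):  # hypothetical user-defined plan
    ...  # define init_model and the other methods required by the plan API

# Resets the training plan instance, keeping the current weights when possible
exp.set_training_plan_class(MyTrainingPlan)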
training_args
training_args()

Retrieves training arguments.

Please see also set_training_args.

Returns:

Type Description
dict

The arguments that are going to be passed to the training plan's training_routine to perform training on the node side. An example training routine: TorchTrainingPlan.training_routine

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def training_args(self) -> dict:
    """Retrieves training arguments.

    Please see also [`set_training_args`]
    [fedbiomed.researcher.federated_workflows.FederatedWorkflow.set_training_args]

    Returns:
        The arguments that are going to be passed to the training plan's
            `training_routine` to perform training on the node side. An example
            training routine: [`TorchTrainingPlan.training_routine`]
            [fedbiomed.common.training_plans.TorchTrainingPlan.training_routine]
    """

    return self._training_args.dict()
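
These getters pair naturally with the setters above; a short sketch assuming the same hypothetical exp:

current_model_args = exp.model_args()        # dict passed to the plan's init_model
current_training_args = exp.training_args()  # plain dict view of the TrainingArgs instance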
training_plan
training_plan()

Retrieves the training plan instance currently being used in the federated workflow.

Returns:

Type Description
Optional[TrainingPlan]

training plan: the training plan instance

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def training_plan(self) -> Optional[TrainingPlan]:
    """Retrieves the training plan instance currently being used in the federated workflow.

    Returns:
        training plan: the training plan instance
    """
    return self._training_plan
training_plan_approve
training_plan_approve(description='no description provided')

Sends a training plan and an ApprovalRequest message to node(s).

The request is handled by a TrainingPlanApproveJob, which sends the training plan to all nodes that are part of the experiment's training data.

Parameters:

Name Type Description Default
description str

Description for training plan approve request

'no description provided'

Returns:

Type Description
dict

a dictionary of pairs (node_id: status), where status indicates to the researcher that the training plan has been correctly downloaded on the node side. Warning: status does not mean that the training plan is approved, only that it has been added to the "approval queue" on the node side.

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def training_plan_approve(self,
                          description: str = "no description provided") -> dict:
    """Send a training plan and a ApprovalRequest message to node(s).

    The request is handled by a `TrainingPlanApproveJob`, which sends the
    training plan to all nodes that are part of the experiment's training data.

    Args:
        description: Description for training plan approve request

    Returns:
        a dictionary of pairs (node_id: status), where status indicates to the researcher
        that the training plan has been correctly downloaded on the node side.
        Warning: status does not mean that the training plan is approved, only that it has been added
        to the "approval queue" on the node side.
    """
    job = TrainingPlanApproveJob(
        nodes=self.training_data().node_ids(),
        keep_files_dir=self.experimentation_path(),
        training_plan=self.training_plan(),
        description=description,
    )
    responses = job.execute()
    return responses
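
A minimal sketch, assuming a hypothetical exp with training data defined; keep in mind that the returned statuses only reflect delivery to the nodes' approval queues:

replies = exp.training_plan_approve(description="hypothetical description of the plan")
for node_id, status in replies.items():
    print(node_id, status)  # delivery status, not an approval decision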
training_plan_class
training_plan_class()

Retrieves the type of the training plan that is created for training.

Please see also set_training_plan_class.

Returns:

Name Type Description
training_plan_class Optional[TrainingPlanT]

the class type of the training plan.

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def training_plan_class(self) -> Optional[TrainingPlanT]:
    """Retrieves the type of the training plan that is created for training.

    Please see also
    [`set_training_plan_class`][fedbiomed.researcher.federated_workflows.TrainingPlanWorkflow.set_training_plan_class].

    Returns:
        training_plan_class: the class type of the training plan.
    """

    return self._training_plan_class
training_plan_file
training_plan_file(display=True)

Retrieves the path of the file where the training plan is saved, and optionally displays it.

Parameters:

Name Type Description Default
display bool

If True, prints the content of the training plan file. Default is True

True

Returns:

Type Description
str

Path to the training plan file

Raises:

Type Description
FedbiomedExperimentError

bad argument type, or cannot read training plan file content

Source code in fedbiomed/researcher/federated_workflows/_training_plan_workflow.py
@exp_exceptions
def training_plan_file(self, display: bool = True) -> str:
    """Retrieves the path of the file where the training plan is saved, and optionally displays it.

    Args:
        display: If `True`, prints the content of the training plan file. Default is `True`

    Returns:
        Path to the training plan file

    Raises:
        FedbiomedExperimentError: bad argument type, or cannot read training plan file content
    """
    if not isinstance(display, bool):
        # bad type
        msg = ErrorNumbers.FB410.value + \
            f', in method `training_plan_file` param `display` : type {type(display)}'
        logger.critical(msg)
        raise FedbiomedExperimentError(msg)

    if display and self._training_plan_file is not None:
        try:
            with open(self._training_plan_file) as file:
                content = file.read()
            print(content)
        except OSError as e:
            # cannot read training plan file content
            msg = ErrorNumbers.FB412.value + \
                f', in method `training_plan_file` : error when reading training plan file - {e}'
            logger.critical(msg)
            raise FedbiomedExperimentError(msg)

    return self._training_plan_file
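
A short sketch assuming the same hypothetical exp; pass display=False to retrieve the path without printing the file content:

plan_path = exp.training_plan_file(display=False)  # path only, no printout
print(plan_path)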