Fed-BioMed Component configuration
Fed-BioMed components need to be configured before using Fed-BioMed. The Fed-BioMed CLI simplifies this process by automating the setup of default components, which is especially convenient for testing. This article guides you through the basic configuration steps required for a minimal initialization of Fed-BioMed components.
For more detailed configuration instructions, please refer to the Configuring Nodes and Deployment user guides.
Fed-BioMed Components
Fed-BioMed components are instances provided by Fed-BioMed with distinct responsibilities in a federated learning infrastructure. While they can operate independently and do not directly depend on each other, they are necessary to complete a federated learning infrastructure. These components may also include optional sub-components to enhance usability (e.g., the Node GUI, a user-friendly web application for managing nodes).
This guide focuses on the Node and Researcher components, which are enough to create a basic infrastructure for following the examples and tutorials in the documentation. To find out more about the Fed-BioMed architecture and its components, please refer to Fed-BioMed architecture.
Initializing Components
The Fed-BioMed CLI is configured to initialize components if they do not already exist. Initializing a component means creating a dedicated folder in which the component keeps its assets. While Fed-BioMed uses a default folder name for each component type, it is also possible to use custom paths to address or initialize components with different folder names. For components that do not exist yet, the CLI asks for permission before creating them. While this feature allows components to be created quickly, the CLI also provides a dedicated option for initializing/creating components explicitly.
It is important to note that, unless specified, the component directory will be generated in the directory where the command is executed. This behavior applies to both the Researcher and Node components.
-y option to create components automatically
To skip the approval prompt for component creation, use the -y option, e.g. fedbiomed node -y [ACTION] or fedbiomed researcher -y [ACTION]. The -y option must be specified right after the component type.
Component directories
Component configurations and the other assets a component needs to function properly are located in the component's own directory. This directory is called the component directory, and its name (folder name) is the component name. While Fed-BioMed components come with default component names, it is also possible to use different folder names. This is especially useful if several components are hosted in the same parent directory of the same file system.
The first execution of the following command creates a component directory called fbm-node (the default component name) in the directory where the command is executed. If the directory does not exist yet, the CLI asks for permission to initialize it.
fedbiomed node dataset list
Once the default component is created, Fed-BioMed CLI commands executed in the same directory will use the default component named fbm-node. However, it is also possible to specify the component directory in order to execute Fed-BioMed commands from other directories.
fedbiomed node --path some/other/directory/fbm-node dataset list
The same applies to the Researcher component.
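The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Fed-BioMed code: resolve_component_dir is a hypothetical helper, and the default folder names are the ones given in this guide.

```python
from pathlib import Path
from typing import Optional

# Default folder names stated in this guide.
DEFAULT_NAMES = {"node": "fbm-node", "researcher": "fbm-researcher"}


def resolve_component_dir(component: str, path: Optional[str] = None,
                          cwd: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring the lookup rule described above."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    if path is None:
        # No --path given: default folder name inside the working directory.
        return base / DEFAULT_NAMES[component]
    candidate = Path(path)
    # An absolute --path is used as-is; a relative one is taken from base.
    return candidate if candidate.is_absolute() else base / candidate
```

For instance, calling resolve_component_dir("node", cwd="/work") yields /work/fbm-node, which is why the same command can address different components depending on where it is run.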
Creating components beforehand
Creating components beforehand using the CLI is recommended when creating multiple components before performing any actions on them. It is also good practice to create a separate directory that keeps all generated components in one place. For example, a directory called fbm-components can be created to hold all Fed-BioMed components, which makes these components easy to access.
To create a Node component:
fedbiomed component create --component node --path fbm-components/my-node
To create another Node component:
fedbiomed component create --component node --path fbm-components/my-second-node
The Researcher component is also essential for every Fed-BioMed setup:
fedbiomed component create --component researcher --path fbm-components/my-researcher
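When many components are needed, the three commands above can be generated from a small script. The sketch below only builds and prints the command lines using the options shown in this section; create_command and the layout list are illustrative names, and nothing is executed.

```python
import shlex
from pathlib import Path


def create_command(component: str, path: Path) -> list[str]:
    """Build the 'fedbiomed component create' call shown above."""
    return ["fedbiomed", "component", "create",
            "--component", component, "--path", str(path)]


# Hypothetical layout: one parent folder holding every component.
parent = Path("fbm-components")
layout = [("node", "my-node"), ("node", "my-second-node"),
          ("researcher", "my-researcher")]

commands = [create_command(kind, parent / name) for kind, name in layout]
for cmd in commands:
    # Printed rather than executed; pass cmd to subprocess.run to run it.
    print(shlex.join(cmd))
```

Keeping the layout in one list makes it easy to see which components exist and where they live.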
Managing Components
The Fed-BioMed CLI is designed to manage different types of components and their sub-components. It also enables managing multiple components by specifying the path where each component is initialized. This section explains how to address distinct components and minimize errors caused by incorrect component specifications.
Managing Node Component
All actions that are specific to the Node component should be declared after the node option of the fedbiomed command. For example, fedbiomed node dataset list lists all the datasets deployed on the node. You can list all possible options and actions by executing fedbiomed node --help.
When fedbiomed node is executed without the --path option, the CLI treats the directory where the command is executed as the working directory and looks for an fbm-node folder there to choose the default node. If this folder does not exist, it asks for permission to create one. Therefore, it is important to double-check the directory in which the fedbiomed command is going to be executed.
Multi-node setup
As mentioned before, the --path option allows choosing a distinct node. This option accepts relative or absolute paths. When managing multiple components, it is highly recommended to use the --path option to address the correct component.
The following is an example of a multi-node setup:
cd my-nodes/
fedbiomed component create --component node --path ./my-node
fedbiomed component create --component node --path ./my-second-node
# List datasets in component initialized in `./my-node`
fedbiomed node --path ./my-node dataset list
or,
fedbiomed node --path ./my-second-node training-plan list
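To reduce the risk of addressing the wrong node, the --path argument can be filled in programmatically. The following is a minimal sketch that only assembles the command lines; node_command is a hypothetical helper and the node folders are the ones from the example above.

```python
import shlex


def node_command(node_dir: str, *action: str) -> list[str]:
    """Build a 'fedbiomed node' call that targets one specific node."""
    # --path follows the component type, as in the examples above.
    return ["fedbiomed", "node", "--path", node_dir, *action]


# Node folders taken from the multi-node example above.
for node_dir in ("./my-node", "./my-second-node"):
    print(shlex.join(node_command(node_dir, "dataset", "list")))
```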
Single-node setup
When working with a single-node setup, it is recommended to use default component names to avoid the need for the --path declaration each time a fedbiomed command is executed.
Initializing a default node component
The following command initializes a default Node component:
fedbiomed component create -c node
After initializing a default component, the fedbiomed node command can be executed without specifying the --path option, unless it is run from a different directory than the one where fbm-node is located.
Example:
# List datasets deployed on the default node
fedbiomed node dataset list
Initializing a default node component in a specific directory
To initialize a default Node component in a specific directory, the component path has to be declared, and the folder should be named fbm-node:
fedbiomed component create -c node -p /path/to/fbm-node
If the fedbiomed command is executed from a directory other than the one containing fbm-node (here, /path/to), the --path option must be used to specify the correct Node component:
# Example with explicit path
fedbiomed node --path /path/to/fbm-node dataset list
More about configuring nodes
Please visit the Node user guide to find out more about configuring nodes and deploying datasets.
Managing researcher component
Unlike the Node component, the Researcher serves as the server in the Fed-BioMed federated learning infrastructure. There can only be one active Researcher that Nodes connect to at a time. The default component name for the Researcher is fbm-researcher.
While it is possible to create multiple Researcher components, this is only useful when different Researcher configurations are needed to support separate federated learning setups.
Creating and Managing Researcher Components
The process of creating and selecting a Researcher component is similar to that of a Node. You can specify a particular Researcher component using the --path option.
Example: Default component creation
fedbiomed component create -c researcher
Example: using a researcher component from a specific directory
fedbiomed researcher --path /path/to/fbm-researcher start
Choosing the Correct Component Initialization
Unlike Node components, the Researcher component runs within a Python session rather than as a standalone process. This session must be correctly configured to use the intended Researcher component. The start command for the Researcher is designed to ensure that the correct component is used.
Example: Default component creation and starting
# Default component creation
fedbiomed component create -c researcher
# Start a jupyter notebook for this component
fedbiomed researcher start
The commands above will create a default Researcher component (fbm-researcher) and start a Jupyter Notebook that is pre-configured to use the created Researcher.
The same behavior can be achieved by specifying the --path option when starting the Researcher from a different directory:
fedbiomed researcher --path /path/to/fbm-researcher start
Executing Plain Python Scripts Without fedbiomed researcher
The Researcher does not need to be started using a Jupyter Notebook. You can use a plain Python script to define and execute experiments. However, in this case, the correct Researcher initialization directory must be set at the beginning of the script.
Example: using environment variables in a script
import os
os.environ["FBM_RESEARCHER_COMPONENT_ROOT"] = '/path/to/fbm-researcher'
# Remaining code for the experiment
# ...
Example: setting environment variables on the command line
You can achieve the same behavior by exporting the FBM_RESEARCHER_COMPONENT_ROOT environment variable before running your script:
export FBM_RESEARCHER_COMPONENT_ROOT=/path/to/fbm-researcher
python my-experiment.py
Alternatively, you can set the environment variable inline:
FBM_RESEARCHER_COMPONENT_ROOT=/path/to/fbm-researcher python my-experiment.py
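A mistyped path is easy to miss when the variable is set by hand. As a minimal sketch, a script could check that the directory exists before setting FBM_RESEARCHER_COMPONENT_ROOT. The helper name use_researcher is hypothetical and not part of Fed-BioMed; it only sets the environment variable described in this guide.

```python
import os
from pathlib import Path


def use_researcher(component_root: str) -> Path:
    """Point this Python session at a Researcher component directory.

    Hypothetical helper: call it at the beginning of the script, before
    the experiment code, as this guide requires for the variable.
    """
    root = Path(component_root).expanduser().resolve()
    if not root.is_dir():
        # Fail early instead of silently using a wrong or missing folder.
        raise FileNotFoundError(f"Researcher component not found: {root}")
    os.environ["FBM_RESEARCHER_COMPONENT_ROOT"] = str(root)
    return root
```

A script would then start with a call such as use_researcher('/path/to/fbm-researcher'), followed by the remaining code for the experiment.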
Cleaning
A Fed-BioMed instance can handle successive operations such as adding and then removing nodes or datasets, or conducting sequential experiments. After testing, however, things may go wrong, and you may want to clean everything up. Afterwards, you will need to restart from scratch (add datasets to nodes, start nodes, etc.).
To clean your Fed-BioMed instance:
- stop the researcher: shut down the notebook kernel (Quit on the notebook interface or ctrl-C on the console)
- stop the nodes: interrupt (ctrl-C) on the nodes' console
- remove all configuration files, dataset sharing configuration, temporary files, and caches for all Fed-BioMed components with:
$ rm -rf COMPONENT_DIR
Where COMPONENT_DIR is:
* for a node, the parameter provided as fedbiomed node -p COMPONENT_DIR, or by default fbm-node if no parameter was given
* for a researcher, the parameter provided as fedbiomed researcher -p COMPONENT_DIR, or by default fbm-researcher if no parameter was given
When you restart a node after cleaning the Fed-BioMed instance, the node does not provide any dataset, because the dataset sharing configuration was reset in the cleaning process. Fed-BioMed did not delete any data; it just stopped sharing them.
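Because rm -rf is unforgiving, the removal step can also be wrapped in a small script with a safety check. The sketch below is one possible approach, not a Fed-BioMed tool: clean_component and its force flag are hypothetical, and by default it refuses to delete folders that do not use one of the default component names from this guide.

```python
import shutil
from pathlib import Path

# Default component folder names stated in this guide.
DEFAULT_DIR_NAMES = ("fbm-node", "fbm-researcher")


def clean_component(component_dir: str, force: bool = False) -> bool:
    """Delete a component directory; return True if something was removed.

    Hypothetical guard: unless force is set, only folders using a default
    component name are deleted, to avoid removing an unrelated directory.
    """
    target = Path(component_dir).resolve()
    if not target.is_dir():
        return False
    if not force and target.name not in DEFAULT_DIR_NAMES:
        raise ValueError(f"Refusing to delete non-default folder: {target}")
    shutil.rmtree(target)
    return True
```

Components created with custom folder names would need force=True. As with the manual procedure, the researcher and nodes should be stopped first.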