Fed-BioMed deployment on multiple machines with VPN/containers

Most real-life deployments require protecting node data. Deployment using VPN/containers contributes to this goal by providing isolation of the Fed-BioMed instance from third parties. All communications between the components of a Fed-BioMed instance occur inside WireGuard VPN tunnels with mutual authentication of the VPN endpoints. Using containers can also ease installation on multiple sites.

This tutorial details a deployment scenario where:

- Fed-BioMed vpnserver and researcher components run on the same machine ("the server") in the following docker containers:
  - vpnserver / fedbiomed/vpn-vpnserver: WireGuard server
  - researcher / fedbiomed/vpn-researcher: a researcher component with Jupyter notebooks
- several Fed-BioMed node components run, one node per machine, with the following containers:
  - node / fedbiomed/vpn-node: a node component
  - gui / fedbiomed/vpn-gui: a GUI for managing node component data (optional)
- all communications between the components are tunneled through a VPN

Requirements

Important

Fed-BioMed docker images are not distributed on any platform: they must be built from the source code. This article guides you through building the docker images and containers from the Fed-BioMed sources.

Supported operating systems and software requirements

Supported operating systems for containers/VPN deployment include Fedora 38 and Ubuntu 22.04 LTS. The deployment should also work on most recent Linux distributions, MacOS 12.6.6, 13 and 14, and Windows 11 with WSL2 using the Ubuntu-22.04 distribution. It also requires docker and docker compose >= 2.0.

Check here for guidelines on docker and docker-compose installation

Check here for detailed requirements.

Account privileges

Deploying the components requires an account that can use docker (typically a member of the docker group). Using a dedicated service account is good practice. No access to the administrative account is needed: following the principle of least privilege, using the root account to deploy components is discouraged.

Web proxy

On sites where web access goes through a proxy, you need to configure the web proxy for docker (see Proxy in the Troubleshooting section).

User or group ID for containers

By default, Fed-BioMed uses the current account's user and group ID for building and running containers.

Avoid using low user or group IDs (< 500 for MacOS, < 1000 for Linux) inside containers: they often conflict with pre-existing user or group accounts in the container images, which results in unhandled failures when setting up or starting the containers. Check your account IDs with id -a.

Use the CONTAINER_USER, CONTAINER_UID, CONTAINER_GROUP and CONTAINER_GID variables to set alternate values. For example, MacOS commonly uses group staff:20 for user accounts, which conflicts with Fed-BioMed VPN/containers mode, so a good configuration choice for MacOS can be:
```
export CONTAINER_GROUP=fedbiomed
export CONTAINER_GID=1111
```
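Before building, the IDs that the containers would use by default can be checked against these limits. The following Python sketch is not part of Fed-BioMed. It only encodes the thresholds quoted above (500 for MacOS, 1000 for Linux), and the function name low_ids is hypothetical.

```python
import os
import platform
from typing import List

# Limits quoted in this tutorial: IDs below them often clash with
# accounts that already exist in the container images.
LOW_ID_LIMITS = {"Darwin": 500, "Linux": 1000}


def low_ids(uid: int, gid: int, system: str) -> List[str]:
    """Return one warning per ID that is below the limit for `system`."""
    limit = LOW_ID_LIMITS.get(system, 1000)  # assumption: Linux limit elsewhere
    warnings = []
    if uid < limit:
        warnings.append(f"uid {uid} < {limit}: set CONTAINER_USER and CONTAINER_UID")
    if gid < limit:
        warnings.append(f"gid {gid} < {limit}: set CONTAINER_GROUP and CONTAINER_GID")
    return warnings


if __name__ == "__main__":
    # By default the containers use the current account's IDs
    for message in low_ids(os.getuid(), os.getgid(), platform.system()):
        print("warning:", message)
```

Run it with python3 on the machine that builds the containers. No output means the current account's IDs are above the limits.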

More options for containers/VPN deployment are not covered in this tutorial but can be found here, including:

- using a GPU in the node container
- building containers (eg: node and gui) on one machine, then using these pre-built containers on the nodes
- using a different identity (account) for building and launching a container
- deploying vpnserver and researcher on distinct machines

Notations

In this tutorial we use the following notations:

- [user@server $] means the command is launched on the server machine (outside containers)
- [user@node $] means the command is launched on a node machine (outside containers)
- for commands typed inside containers, [root@vpnserver-container #] means the command is launched inside the vpnserver container as root, and [user@node-container $] means the command is launched inside the node container with the same user account as outside the container

Deploy on the server side

This part of the tutorial is executed once on the server side, before deploying the nodes. It covers the initial server deployment, including build, configuration and launch of containers.

download Fed-BioMed software by doing a local clone of the git repository:
```
[user@server $] git clone -b master https://github.com/fedbiomed/fedbiomed.git
[user@server $] cd fedbiomed
[user@server $] export FEDBIOMED_DIR=$PWD # use setenv for *csh
[user@server $] cd envs/vpn/docker
```
For the rest of this tutorial ${FEDBIOMED_DIR} represents the base directory of the clone.

docker compose commands need to be launched from the ${FEDBIOMED_DIR}/envs/vpn/docker directory.

optionally choose a unique ID for this instance of Fed-BioMed. This is only useful when multiple instances of Fed-BioMed exist on the same machine. It adds another layer of security by using distinct docker networks for each instance running on the machine. It also uses distinct container names for each instance, but does not yet support running the same containers from different instances at the same time.
```
# example: choose ID manually
[user@server $] export FBM_CONTAINER_INSTANCE_ID=my-instance-tag   # replace with your own tag
# example: generate ID from docker file directory
[user@server $] export FBM_CONTAINER_INSTANCE_ID=$(realpath $(pwd)|cksum|cut -d ' ' -f1)
```
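The generated ID is the first field printed by cksum (a POSIX CRC-32 checksum) for the resolved directory path. It is therefore stable for a given clone location and usually differs between clones. As an illustration only, the following Python sketch reproduces that value. The helper names are hypothetical, and the sketch assumes os.path.realpath resolves the path the same way as the realpath command.

```python
import os

# CRC-32 generator polynomial used by POSIX cksum (processed MSB first)
POLY = 0x04C11DB7


def _crc_byte(crc: int, byte: int) -> int:
    """Feed one byte into the 32-bit CRC register, most significant bit first."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """Return the first field printed by `cksum` for `data`."""
    crc = 0
    for byte in data:
        crc = _crc_byte(crc, byte)
    # cksum also feeds the input length, least significant byte first,
    # using only as many bytes as needed (nothing for empty input)
    length = len(data)
    while length:
        crc = _crc_byte(crc, length & 0xFF)
        length >>= 8
    return ~crc & 0xFFFFFFFF


def instance_id(directory: str) -> str:
    """Mimic $(realpath $(pwd)|cksum|cut -d ' ' -f1) for `directory`."""
    # realpath prints the resolved path followed by a newline, and cksum reads it too
    line = os.path.realpath(directory) + "\n"
    return str(posix_cksum(os.fsencode(line)))


if __name__ == "__main__":
    print(instance_id(os.getcwd()))
```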

clean running containers, container files and temporary files
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean
```

optionally clean the container images to force building fresh images
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean image
```

build server-side containers
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn build vpnserver researcher
```

configure the VPN keys for containers running on the server side, after starting the vpnserver container
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn configure researcher
```

start other server side containers
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn start researcher
```

check all containers are running as expected on the server side
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn status vpnserver researcher
```

Server side containers should be up and able to ping the VPN server:
```
** Checking docker VPN images & VPN access: vpnserver researcher
- container vpnserver is running
- container researcher is running
- pinging VPN server from container vpnserver -> OK
- pinging VPN server from container researcher -> OK
```
Server side containers are now ready for node side deployment.

Deploy on the node side

This part of the tutorial is executed once on each node, after deploying the server. It covers the initial deployment, including build, configuration and launch of containers.

Some commands are executed on the node side, while some commands are executed on the server side (pay attention to the prompt).

For each node, choose a unique node tag (eg: NODETAG in this example) that represents this specific node instance for server side management commands.

download Fed-BioMed software by doing a local clone of the git repository:
```
[user@node $] git clone -b master https://github.com/fedbiomed/fedbiomed.git
[user@node $] cd fedbiomed
[user@node $] export FEDBIOMED_DIR=$PWD # use setenv for *csh
[user@node $] cd envs/vpn/docker
```
For the rest of this tutorial ${FEDBIOMED_DIR} represents the base directory of the clone.

docker compose commands need to be launched from the ${FEDBIOMED_DIR}/envs/vpn/docker directory.

optionally choose a unique ID for this instance of Fed-BioMed. This is only useful when multiple instances of Fed-BioMed exist on the same machine. It adds another layer of security by using distinct docker networks for each instance running on the machine. It also uses distinct container names for each instance, but does not yet support running the same containers from different instances at the same time.
```
# example: choose ID manually
[user@node $] export FBM_CONTAINER_INSTANCE_ID=my-instance-tag   # replace with your own tag
# example: generate ID from docker file directory
[user@node $] export FBM_CONTAINER_INSTANCE_ID=$(realpath $(pwd)|cksum|cut -d ' ' -f1)
```
clean running containers, container files and temporary files (skip this step if node and server run on the same machine)
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean
```

optionally clean the container images to force building fresh images
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean image
```

build node-side containers
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn build node gui
```

on the server side, generate a configuration for this node (known as NODETAG)

```
[user@server $] cd ${FEDBIOMED_DIR}/envs/vpn/docker
[user@server $] docker compose exec vpnserver bash -ci 'python ./vpn/bin/configure_peer.py genconf node NODETAG'
```

The configuration file is now available on the server side at path ${FEDBIOMED_DIR}/envs/vpn/docker/vpnserver/run_mounts/config/config_peers/node/NODETAG/config.env, or can be displayed with the command:
```
[user@server $] docker compose exec vpnserver cat /config/config_peers/node/NODETAG/config.env
```

copy the configuration file from the server side to the node side via a secure channel, to path /tmp/config.env on the node.

In most real life deployments, one should not have access to both the server side and the node side. A secure channel is an out-of-band secured exchange (outside of Fed-BioMed scope) between the server administrator and the node administrator that provides mutual authentication of the parties, and integrity and privacy of the exchanged file.

In a test deployment, one may be connected to both the server side and the node side. In this case, you can simply copy-paste or copy the file to the node.
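Whatever channel is used, the integrity of the copied file can also be checked. Compute a digest on both sides and compare the two values over a separate channel (for example by phone). The following Python sketch uses SHA-256, similar to sha256sum on Linux. It does not provide privacy, which still depends on the channel itself. sha256_of is a hypothetical helper name.

```python
import hashlib
import sys


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # e.g. python3 sha256_check.py /tmp/config.env
    for name in sys.argv[1:]:
        print(sha256_of(name), name)
```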

Use the node's copy of the configuration file:
```
[user@node $] cp /tmp/config.env ./node/run_mounts/config/config.env
```
optionally force the use of secure aggregation by the node (node will refuse to train without the use of secure aggregation):
```
[user@node $] export FBM_SECURITY_FORCE_SECURE_AGGREGATION=True
```
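optionally allow all training plans to execute without node side approval by disabling training plan approval (FBM_SECURITY_TRAINING_PLAN_APPROVAL=False; warning: be sure this is coherent with your site security requirements!), or allow the pre-defined default training plans to execute without approval (FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True):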
```
[user@node $] export FBM_SECURITY_ALLOW_DEFAULT_TRAINING_PLANS=True
[user@node $] export FBM_SECURITY_TRAINING_PLAN_APPROVAL=False
```

start node container
```
[user@node $] docker compose up -d node
```

retrieve the node's public key
```
[user@node $] docker compose exec node wg show wg0 public-key | tr -d '\r' >/tmp/publickey-nodeside
```

copy the public key from the node side to the server side via a secure channel (see above), to path /tmp/publickey-serverside on the server.
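A WireGuard public key is the base64 encoding of a 32-byte Curve25519 key, that is, 44 characters ending with '='. Before running the add command on the server side, the received file can be sanity-checked, for example to catch a truncated copy-paste or a stray carriage return. A minimal Python sketch (check_wireguard_key is a hypothetical helper):

```python
import base64
import binascii
import sys


def check_wireguard_key(text: str) -> str:
    """Return the cleaned key, or raise ValueError if it does not look like
    a WireGuard public key (base64 encoding of exactly 32 bytes)."""
    key = text.strip()  # drops a trailing newline or carriage return
    if len(key) != 44 or not key.endswith("="):
        raise ValueError("expected 44 base64 characters ending with '='")
    try:
        raw = base64.b64decode(key, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    if len(raw) != 32:
        raise ValueError(f"decoded to {len(raw)} bytes instead of 32")
    return key


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 check_key.py /tmp/publickey-serverside
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(check_wireguard_key(handle.read()))
```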

on the server side finalize configuration of the VPN keys for this node (known as NODETAG)

```
[user@server $] cd ${FEDBIOMED_DIR}/envs/vpn/docker
[user@server $] docker compose exec vpnserver bash -ci "python ./vpn/bin/configure_peer.py add node NODETAG $(cat /tmp/publickey-serverside)"
```

check containers running on the node side
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn status node
```

node container should be up and able to ping the VPN server:
```
** Checking docker VPN images & VPN access: node
- container node is running
- pinging VPN server from container node -> OK
```

node container is now ready to be used.

Optionally launch the node GUI:

optionally authorize connections to the node GUI from remote machines. By default, only connections from the local machine (localhost) are authorized.
```
[user@node $] export GUI_SERVER_IP=0.0.0.0
```
To authorize remote connections on only one of the node machine's IP addresses, use a command of the form export GUI_SERVER_IP=a.b.c.d where a.b.c.d is one of the IP addresses of the node machine.

For security reasons, when authorizing connections from remote machines, it is strongly recommended to use a custom SSL certificate signed by a well-known authority.
Custom SSL certificates for GUI

By default, the GUI serves on port 8443 with self-signed certificates. Browsers flag these certificates as risky, and users have to approve them. It is also possible to use custom trusted SSL certificates by adding crt and key files to the ${FEDBIOMED_DIR}/envs/vpn/docker/gui/run_mounts/certs directory before starting the GUI.

When adding these files, please ensure that:
- the certificate extension is .crt and the key file extension is .key
- there is no more than one file for each certificate and key
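These two rules can be checked before starting the GUI. The following Python sketch does that check; check_cert_dir is a hypothetical helper. As an extra safety check not stated in this tutorial, it also assumes that a certificate and a key are always provided together.

```python
from pathlib import Path
from typing import List


def check_cert_dir(cert_dir: str) -> List[str]:
    """Return the problems found in the GUI certificate directory (empty list if none)."""
    directory = Path(cert_dir)
    crts = sorted(p.name for p in directory.glob("*.crt"))
    keys = sorted(p.name for p in directory.glob("*.key"))
    problems = []
    if len(crts) > 1:
        problems.append(f"more than one .crt file: {crts}")
    if len(keys) > 1:
        problems.append(f"more than one .key file: {keys}")
    # Assumption: a custom certificate is only usable together with its key
    if len(crts) != len(keys):
        problems.append("provide exactly one .crt file together with one .key file")
    return problems
```

For example, call check_cert_dir on the expanded path ${FEDBIOMED_DIR}/envs/vpn/docker/gui/run_mounts/certs. An empty directory yields no problem, because the GUI then falls back to its self-signed certificates.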

optionally restrict the HTTP host names that can be used to connect to the node GUI. By default, all the host names (DNS CNAME) of the node machine can be used.

For example, if the node machine has two host names my.fqdn.com and other.alias.org, use a command like export GUI_SERVER_NAME=my.fqdn.com or export GUI_SERVER_NAME='*.fqdn.com' (don't forget the enclosing single quotes) so that only requests using the first name (eg: https://my.fqdn.com) reach the node GUI. The value uses the syntax of Nginx server_name.
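The single quotes ensure the shell passes * literally instead of possibly expanding it as a file name pattern. To reason about which host names a value admits, the following Python sketch models the two simplest forms of Nginx server_name: an exact name, and a leading *. wildcard that matches any subdomain but not the bare domain. It is a simplification for illustration, not the Nginx implementation. server_name_matches is a hypothetical helper.

```python
def server_name_matches(pattern: str, host: str) -> bool:
    """Simplified model of Nginx server_name matching for a single name.

    Handles exact names and a leading '*.' wildcard only; Nginx also supports
    trailing wildcards, the '.example.org' form and regular expressions.
    """
    # Host names are case-insensitive
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. '.fqdn.com'
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern
```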

start gui container
```
[user@node $] docker compose up -d gui
```

check containers running on the node side
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn status node gui
```

Node side containers should be up and able to ping the VPN server:
```
** Checking docker VPN images & VPN access: node gui
- container node is running
- container gui is running
- pinging VPN server from container node -> OK
```

node and gui containers are now ready to be used.

Optionally deploy a second node instance on the same node

This part of the tutorial is optionally executed on some nodes, after deploying the server. It deploys a second node instance on the same node machine, which is useful for testing purposes but is not a normal deployment scenario:

deploy second node on the same machine

Use the node

This part is executed at least once on each node, after deploying the node side containers.

Set up the node by sharing datasets. The commands below use the default Fed-BioMed node directory, located at /fbm-node inside the container.

if the node GUI is launched, it can be used to share datasets. On the node side machine, connect to https://localhost:8443 (or https://<host_name_and_domain>:8443 if connections from remote machines are authorized)

view the node component logs
```
[user@node $] docker compose logs node
```

optionally connect to the node container and launch commands instead of using the GUI, for example:

connect to the container
```
[user@node $] cd ${FEDBIOMED_DIR}/envs/vpn/docker
[user@node $] docker compose exec -u $(id -u) node bash
```

restart the Fed-BioMed node, for example in the background:
```
[user@node-container $] kill $(ps auxwwww | grep -E 'python.*fedbiomed node start' | grep -Ev grep | awk '{ print $2}')
[user@node-container $] nohup fedbiomed node start $(cat /fbm-node/FBM_NODE_START_OPTIONS) >/fbm-node/fedbiomed_node.out &
```

Note that in this case, the node component output no longer appears in docker compose logs node; it is written to /fbm-node/fedbiomed_node.out instead.

share one or more datasets, for example an MNIST dataset or an interactively defined dataset (this can also be done via the GUI):
```
[user@node-container $] fedbiomed node dataset add -m /fbm-node/data
[user@node-container $] fedbiomed node dataset add
```

Example of a few more possible commands:

optionally list shared datasets:
```
[user@node-container $] fedbiomed node dataset list
```

optionally register a new authorized training plan previously copied on the node side in ${FEDBIOMED_DIR}/envs/vpn/docker/node/run_mounts/fbm-node/data/my_training_plan.txt
```
[user@node-container $] fedbiomed node training-plan register
```
Indicate /fbm-node/data/my_training_plan.txt as path of the training plan file.

Optionally use a second node instance on the same node

This optional part is executed at least once on the nodes where a second node instance is deployed, after deploying the second node side containers:

use second node on the same machine

Use the server

This part is executed at least once on the server after setting up the nodes:

on the server side machine, connect to http://localhost:8888, then choose and run a Jupyter notebook
- make more notebooks available from the server side machine (eg: /tmp/my_notebook.ipynb) by copying them to the samples directory
```
[user@server $] cp /tmp/my_notebook.ipynb ${FEDBIOMED_DIR}/envs/vpn/docker/researcher/run_mounts/samples/
```
  The notebook is now available in the samples subdirectory of the Jupyter notebook interface.
if the notebook uses Tensorboard, it can be viewed
- either embedded inside the Jupyter notebook as explained in the Tensorboard documentation
- or by connecting to http://localhost:6006

Optionally use the researcher container's command line instead of the Jupyter notebooks:

connect to the researcher container

```
[user@server $] cd ${FEDBIOMED_DIR}/envs/vpn/docker
[user@server $] docker compose exec -u $(id -u) researcher bash
```

launch a script, for example a training:
```
[user@researcher-container $] jupyter nbconvert /fbm-researcher/notebooks/101_getting-started.ipynb --output=101_getting-started --to script
[user@researcher-container $] python /fbm-researcher/notebooks/101_getting-started.py
```

Note: the jupyter nbconvert command above creates the .py script from the notebook; the same command can convert any other notebook into a script.

Misc server management commands

Some possible management commands after initial deployment include:

check all containers running on the server side
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn status vpnserver researcher
```

check the VPN peers known from the VPN server

```
[user@server $] ( cd ${FEDBIOMED_DIR}/envs/vpn/docker ; docker compose exec vpnserver bash -ci "python ./vpn/bin/configure_peer.py list" )
type        id           prefix         peers
----------  -----------  -------------  ------------------------------------------------
researcher  researcher1  10.222.0.2/32  ['1exampleofdummykeyVo+lj/ZfT/wYv+I9ddWYzohC0=']
node        NODETAG      10.221.0.2/32  ['1exampleofdummykey/Z1SKEzjsMkSe1qztF0uXglnA=']
```
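To check from a script that a node (for example NODETAG) has been registered, the table above can be parsed, for instance after capturing it with subprocess.run. The following Python sketch assumes the column layout stays as printed above. parse_peer_list and is_registered are hypothetical helpers.

```python
from typing import Dict, List


def parse_peer_list(output: str) -> List[Dict[str, str]]:
    """Parse the table printed by 'configure_peer.py list'.

    Splits each row on whitespace, so it assumes type, id and prefix contain
    no spaces; the 'peers' column is kept as raw text.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split(maxsplit=3)
        # skip blank lines, the header row and the dashed separator row
        if len(fields) < 4 or fields[0] == "type" or set(fields[0]) == {"-"}:
            continue
        peer_type, peer_id, prefix, peers = fields
        rows.append({"type": peer_type, "id": peer_id, "prefix": prefix, "peers": peers})
    return rows


def is_registered(output: str, peer_type: str, peer_id: str) -> bool:
    """True if a peer with this type and id appears in the table."""
    return any(
        row["type"] == peer_type and row["id"] == peer_id
        for row in parse_peer_list(output)
    )
```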

VPN configurations and container files are kept unchanged when restarting containers.

clean running containers, container files and temporary files on the server side. Containers must be stopped first:
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn stop vpnserver researcher
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean
```
Warning: all VPN configurations, researcher configuration files, experiment files and results, etc. are deleted when cleaning.

To also clean the container images:
```
[user@server $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean image
```

Misc node management commands

Some possible management commands after initial deployment include:

check all containers running on the node side
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn status node gui
```

restart all containers running on the node side
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn stop node gui
[user@node $] ( cd ${FEDBIOMED_DIR}/envs/vpn/docker ; docker compose up -d node gui )
```

VPN configurations and container files are kept unchanged when restarting containers.

clean running containers, container files and temporary files on the node side. Containers must be stopped first:
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn stop node gui
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean
```
Warning: all VPN configurations, node configuration files, node dataset sharing, etc. are deleted when cleaning.

To also clean the container images:
```
[user@node $] ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean image
```

Misc use alternate container image version

When building or starting a component using fedbiomed_vpn or docker compose, container image version automatically matches the software version of the sources in the current clone.

To use a different container image version, either:

- set the FBM_CONTAINER_VERSION_TAG environment variable (temporary change), or
- set the FBM_CONTAINER_VERSION_TAG variable in the $FEDBIOMED_DIR/envs/vpn/docker/.env file (permanent change)
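The precedence between these settings can be modelled as follows. This Python sketch only illustrates the behaviour described here, plus the Docker Compose rule that a variable set in the shell overrides the same variable in .env. effective_version_tag and the tiny .env reader are hypothetical, and treating an empty value as unset is a simplification.

```python
from typing import Mapping, Optional

VAR = "FBM_CONTAINER_VERSION_TAG"


def read_dotenv_value(dotenv_text: str, name: str) -> Optional[str]:
    """Tiny .env reader: NAME=value lines, '#' comments skipped, last match wins."""
    value = None
    for raw_line in dotenv_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        if key.strip() == name:
            value = raw_value.strip().strip("'\"")
    return value


def effective_version_tag(environ: Mapping[str, str], dotenv_text: str, source_version: str) -> str:
    """Shell variable first, then the .env file, then the version of the sources.

    Simplification: an empty value is treated as unset.
    """
    return (
        environ.get(VAR)
        or read_dotenv_value(dotenv_text, VAR)
        or source_version
    )
```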

Important

Using an alternate container image version is an advanced feature. It may break the container configurations of your current clone. Use it only if you know what you are doing.

Examples:

clean all containers and images for version my_version_tag, plus containers files and temporary files in current clone of the sources
```
[user@server $] FBM_CONTAINER_VERSION_TAG=my_version_tag ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn clean image
```
build researcher container from the current clone (thus using the sources of current clone's version), then tag the image with version my_version_tag
```
[user@server $] FBM_CONTAINER_VERSION_TAG=my_version_tag ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn build researcher
```
start researcher container from the current clone (thus using a file tree fitting current clone's version), using an image with version my_version_tag
```
[user@server $] FBM_CONTAINER_VERSION_TAG=my_version_tag ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn start researcher
```

Troubleshooting

Proxy

On a site where access to an Internet web site requires using a proxy, configure web proxy for docker client in ~/.docker/config.json.

The prefix used by Fed-BioMed's communications inside the VPN (10.220.0.0/14) must not be proxied, so your proxy configuration may look like this (replace mysiteproxy.domain with your site proxy):
```
{
  "proxies":
  {
    "default":
    {
      "httpProxy": "http://mysiteproxy.domain:3128",
      "httpsProxy": "http://mysiteproxy.domain:3128",
      "noProxy": "10.220.0.0/14"
    }
  }
}
```
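The VPN addresses, for example 10.222.0.2 for the researcher and 10.221.0.2 for NODETAG in the peer list shown earlier, all fall inside 10.220.0.0/14, which spans 10.220.0.0 to 10.223.255.255. The following Python sketch uses only the standard ipaddress and json modules to check whether ~/.docker/config.json excludes this prefix from the proxy. It only inspects the "default" entry, and vpn_prefix_bypasses_proxy is a hypothetical helper.

```python
import ipaddress
import json
from pathlib import Path

VPN_PREFIX = ipaddress.ip_network("10.220.0.0/14")


def vpn_prefix_bypasses_proxy(config: dict) -> bool:
    """True if the 'default' proxy entry lists a noProxy network covering 10.220.0.0/14."""
    default = config.get("proxies", {}).get("default", {})
    for entry in default.get("noProxy", "").split(","):
        try:
            network = ipaddress.ip_network(entry.strip(), strict=False)
        except ValueError:
            continue  # host names or domain entries are not networks
        if network.version == 4 and VPN_PREFIX.subnet_of(network):
            return True
    return False


if __name__ == "__main__":
    path = Path.home() / ".docker" / "config.json"
    if path.exists():
        config = json.loads(path.read_text())
        print("VPN prefix excluded from proxy:", vpn_prefix_bypasses_proxy(config))
```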

Image build on Mac M1/M2/M3 ARM

Build may fail on Mac M1/M2/M3 processors for fedbiomed/vpn-base or fedbiomed/vpn-basenode due to arm64 system.

By default, docker uses images built for arm64/aarch64. However, the docker build files of the Fed-BioMed images, and the libraries installed in them, are designed for amd64 platforms. Therefore, you may get errors while the fedbiomed_vpn script builds images, for example during secure aggregation setup or while installing some of the pip dependencies of the Fed-BioMed environment. You can fix this problem by setting the environment variable that declares the default docker platform, forcing docker to use linux/amd64:
```
export DOCKER_DEFAULT_PLATFORM=linux/amd64
```
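The variable is only needed on ARM Macs. The following Python sketch detects this case (needs_amd64_override is hypothetical). A Python interpreter running under Rosetta may report x86_64, so the detection is best-effort.

```python
import os
import platform


def needs_amd64_override(system: str, machine: str) -> bool:
    """True on an Apple Silicon Mac, where macOS reports the machine as arm64."""
    return system == "Darwin" and machine.lower() in {"arm64", "aarch64"}


if __name__ == "__main__":
    if needs_amd64_override(platform.system(), platform.machine()):
        if os.environ.get("DOCKER_DEFAULT_PLATFORM") != "linux/amd64":
            print("hint: export DOCKER_DEFAULT_PLATFORM=linux/amd64 before building")
```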

After setting this variable, you can run the build command ${FEDBIOMED_DIR}/scripts/fedbiomed_vpn build

Not enough space on your machine due to recursive build operations

Repeated builds may fill up the disk space reserved for Docker. In that case, clear your Docker build cache to free up some space:
```
docker builder prune -a
```