
FA Tutorial 1 — Tabular Dataset

Federated Analytics (FA) lets researchers compute statistics across distributed datasets without any raw data leaving the nodes. Each node computes partial statistics locally; the researcher aggregates them globally.
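The split between local computation and global aggregation can be sketched in plain Python. This is a minimal illustration, not Fed-BioMed's implementation: the helpers node_partial_mean and aggregate_mean are hypothetical, and each node is assumed to share only a count and a sum for a column.

```python
from typing import Dict, List, Optional


def node_partial_mean(column: List[Optional[float]]) -> Dict[str, float]:
    """Hypothetical node-side step: summarise local values, skipping missing ones."""
    values = [v for v in column if v is not None]
    return {"count": len(values), "sum": float(sum(values))}


def aggregate_mean(partials: List[Dict[str, float]]) -> float:
    """Hypothetical researcher-side step: combine partial sums into a global mean."""
    total_count = sum(p["count"] for p in partials)
    total_sum = sum(p["sum"] for p in partials)
    return total_sum / total_count


# Raw values never leave the nodes; only these small summaries are exchanged.
node_a = node_partial_mean([2015.0, 2017.0, None])
node_b = node_partial_mean([2018.0, 2016.0, 2019.0])
print(aggregate_mean([node_a, node_b]))
```

Note that the global mean is weighted by each node's count, which is why nodes must report counts and not just local means.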

This notebook covers FA on tabular (CSV) datasets. Companion notebooks cover image and medical folder datasets.

What you will learn

  • How to run one-liner FA convenience methods (mean, variance, etc.)
  • How to request multiple statistics in a single round-trip
  • How to filter columns with dataset_schema
  • How to inspect and aggregate results using the FAResult API
  • How caching avoids redundant network round-trips

Before You Start

Configure nodes

# Node 1
fedbiomed node -p my-node-1 dataset add    # add a CSV dataset with tag 'tabular'
fedbiomed node -p my-node-1 start

# Node 2 (optional)
fedbiomed node -p my-node-2 dataset add    # also tag this CSV dataset 'tabular'
fedbiomed node -p my-node-2 start

Wait until each node terminal prints "Starting task manager" before continuing.

Start the researcher

fedbiomed researcher start

Note on FA permissions: FA is enabled by default. A node administrator can disable it by setting allow_federated_analytics = False in the node config under [security].
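If you prefer to change this flag from a script, the sketch below uses Python's standard configparser. It assumes, as the note above describes, that the node configuration is an INI-style file with a [security] section; the helper set_federated_analytics is hypothetical and works on the file's text, so you would read the real file, pass its contents in, and write the result back.

```python
import configparser
import io


def set_federated_analytics(config_text: str, allowed: bool) -> str:
    """Return config_text with [security] allow_federated_analytics set to `allowed`."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written (no lowercasing)
    parser.read_string(config_text)
    if not parser.has_section("security"):
        parser.add_section("security")
    parser.set("security", "allow_federated_analytics", str(allowed))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


sample = """[security]
allow_federated_analytics = True
"""
updated = set_federated_analytics(sample, allowed=False)
print(updated)
```

Be aware that configparser discards comments when it rewrites a file, so keep a copy of the original node config before overwriting it.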

Create an Experiment

Experiment discovers all nodes that have a dataset with the given tag and sets up the analytics property automatically.

from fedbiomed.researcher.federated_workflows import Experiment

tags = ['tabular']
exp = Experiment(tags=tags)
2026-03-11 11:37:16,262 fedbiomed DEBUG -  adding handler for: SECURITY_FILE
2026-03-11 11:37:16,349 fedbiomed INFO - Starting researcher service...
2026-03-11 11:37:16,350 fedbiomed INFO - Waiting 3s for nodes to connect...
2026-03-11 11:37:19,354 fedbiomed INFO - Updating training data. This action will update FederatedDataset, and the nodes that will participate to the experiment.
2026-03-11 11:37:19,369 fedbiomed INFO - Node selected for training -> Default Node Alias
Node ID is -> NODE_e1d11980-12dd-4fde-a356-6554d68c593d
2026-03-11 11:37:19,369 fedbiomed INFO - Node selected for training -> Default Node Alias
Node ID is -> NODE_d0e82145-2311-46cb-93f8-922f36f4b71d

Inspect the datasets discovered on the nodes. Each entry shows the metadata the node administrator registered for that dataset (name, tags, shape, column dtypes).

exp.training_data().data()
Out[ ]:
{'NODE_e1d11980-12dd-4fde-a356-6554d68c593d': {'name': 'csv',
  'data_type': 'csv',
  'tags': ['tabular'],
  'description': 'tabular dataset',
  'shape': {'csv': [10668, 7]},
  'dtypes': {'year': 'Int64',
   'price': 'Int64',
   'transmission': 'Int64',
   'mileage': 'Int64',
   'tax': 'Int64',
   'mpg': 'Float64',
   'engineSize': 'Float64'},
  'dataset_id': 'dataset_0ca1dd3b-6bb5-41a4-86ef-2f6b3a905ff3',
  'dataset_parameters': {}},
 'NODE_d0e82145-2311-46cb-93f8-922f36f4b71d': {'name': 'cars',
  'data_type': 'csv',
  'tags': ['tabular'],
  'description': 'toy-dataaset',
  'shape': {'csv': [17965, 7]},
  'dtypes': {'year': 'Int64',
   'price': 'Int64',
   'transmission': 'Int64',
   'mileage': 'Int64',
   'tax': 'Int64',
   'mpg': 'Float64',
   'engineSize': 'Float64'},
  'dataset_id': 'dataset_15f67e62-e534-49e0-a080-d42849878e3d',
  'dataset_parameters': {}}}
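Before running analytics it can help to confirm that every node exposes the same columns. The sketch below assumes the dictionary has the structure printed above (a 'dtypes' mapping and a 'shape' entry keyed by 'csv'); with a live experiment you would pass exp.training_data().data() instead of the hand-copied sample. The helper summarise_training_data and the node keys node_1/node_2 are placeholders.

```python
from typing import Any, Dict

COLUMNS = {"year": "Int64", "price": "Int64", "transmission": "Int64",
           "mileage": "Int64", "tax": "Int64", "mpg": "Float64", "engineSize": "Float64"}


def summarise_training_data(data: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Check that all nodes share the same columns and count rows per node."""
    column_sets = {node: set(meta["dtypes"]) for node, meta in data.items()}
    reference = next(iter(column_sets.values()))
    mismatched = [node for node, cols in column_sets.items() if cols != reference]
    rows = {node: meta["shape"]["csv"][0] for node, meta in data.items()}
    return {"columns": sorted(reference), "mismatched_nodes": mismatched,
            "rows_per_node": rows, "total_rows": sum(rows.values())}


# Shapes copied from the output above; node keys shortened for readability.
sample_data = {
    "node_1": {"dtypes": dict(COLUMNS), "shape": {"csv": [10668, 7]}},
    "node_2": {"dtypes": dict(COLUMNS), "shape": {"csv": [17965, 7]}},
}
print(summarise_training_data(sample_data))
```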

Convenience Methods

The analytics property exposes one-liner methods for the five most common statistics. Each method:

  • Sends the request to all nodes (only on the first call; subsequent calls use the cache).
  • Receives per-node partial results.
  • Returns the globally aggregated value directly.
# Compute the global mean across all columns and all nodes
exp.analytics.mean()
Out[ ]:
{'year': 2016.9620209933992,
 'price': 16235.200740670456,
 'transmission': 1.0212691821941424,
 'mileage': 23908.94412840134,
 'tax': 118.05766210348528,
 'mpg': 55.24739984969232,
 'engineSize': 1.5668631812010942}
# Other convenience methods
print('Variance:', exp.analytics.variance())
# results can come from cache (no new network round-trip)
print('Count:   ', exp.analytics.count())
Variance: {'year': 4.400819618015132, 'price': 91583707.107012, 'transmission': 0.305044955392466, 'mileage': 444227213.02946347, 'tax': 4131.056130965234, 'mpg': 138.7155247721354, 'engineSize': 0.33134486956551856}
Count:    {'year': 28632, 'price': 28633, 'transmission': 28633, 'mileage': 28633, 'tax': 28633, 'mpg': 28633, 'engineSize': 28633}
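The year count is one lower than the others. Summing the per-node counts from the node_stats() output shown further down reproduces this: one node reports 10667 year values for a 10668-row dataset, which suggests that count skips missing entries. A minimal sketch of that summation, with counts copied from that output; aggregate_counts and the node keys are hypothetical.

```python
from collections import Counter
from typing import Dict


def aggregate_counts(per_node: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum per-column counts across nodes."""
    total: Counter = Counter()
    for counts in per_node.values():
        total.update(counts)  # Counter.update adds the values key by key
    return dict(total)


per_node_counts = {
    "node_1": {"year": 10667, "price": 10668},
    "node_2": {"year": 17965, "price": 17965},
}
print(aggregate_counts(per_node_counts))
```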

Exploring the FAResult Object

fetch_stats gives finer-grained control and returns the raw FAResult object. This is useful when you want to inspect per-node data, check which statistics are available, or retrieve several statistics together.

result = exp.analytics.fetch_stats('mean')

# Which nodes replied?
print('Node IDs:', result.node_ids)
Node IDs: ['NODE_e1d11980-12dd-4fde-a356-6554d68c593d', 'NODE_d0e82145-2311-46cb-93f8-922f36f4b71d']
# Schema mirrors the structure of node outputs, with stat-leaf positions shown as {}
result.schema
Out[ ]:
{'year': {},
 'price': {},
 'transmission': {},
 'mileage': {},
 'tax': {},
 'mpg': {},
 'engineSize': {}}
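To make "stat-leaf positions" concrete, here is one way such a schema could be derived from a node's output: walk the nested dictionaries and replace every dictionary whose keys are all statistic names with {}. This is an illustration, not FAResult's actual code; the STAT_KEYS set and the schema_of function are assumptions.

```python
from typing import Any, Dict

# Assumed set of statistic names that mark a leaf.
STAT_KEYS = {"count", "mean", "variance", "sum", "std"}


def schema_of(node_output: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the nesting of node_output, replacing each stat leaf with {}."""
    schema: Dict[str, Any] = {}
    for key, value in node_output.items():
        if not isinstance(value, dict):
            continue  # non-dict values carry no structure in this sketch
        if value and set(value) <= STAT_KEYS:
            schema[key] = {}
        else:
            schema[key] = schema_of(value)
    return schema


print(schema_of({"year": {"mean": 2017.1, "count": 10667, "variance": 4.7},
                 "price": {"mean": 22896.7, "count": 10668, "variance": 1.4e8}}))
```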
# Which stat keys are stored at leaves (across all nodes)?
print('Available stats:', result.available_stats())

# Which can be aggregated globally?
print('Computable stats:', result.computable_stats())
Available stats: ['count', 'mean', 'variance']
Computable stats: ['count', 'mean', 'std', 'sum', 'variance']
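The two extra computable statistics are consistent with being derived from the stored ones: sum as mean * count, and std as the square root of variance. That is an inference from the output above, not a description of Fed-BioMed internals; derive_leaf_stats is a hypothetical helper applied to one leaf copied from node_stats().

```python
import math
from typing import Dict


def derive_leaf_stats(leaf: Dict[str, float]) -> Dict[str, float]:
    """Extend a leaf holding count/mean/variance with sum and std."""
    derived = dict(leaf)
    derived["sum"] = leaf["mean"] * leaf["count"]
    derived["std"] = math.sqrt(leaf["variance"])
    return derived


leaf = {"mean": 2017.100684353619, "count": 10667, "variance": 4.6984687}
print(sorted(derive_leaf_stats(leaf)))
```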
# Raw per-node outputs — useful for debugging or site-level comparisons
result.node_stats()
Out[ ]:
{'NODE_e1d11980-12dd-4fde-a356-6554d68c593d': {'year': {'mean': 2017.100684353619,
   'count': 10667,
   'variance': 4.6984687},
  'price': {'mean': 22896.685039370048,
   'count': 10668,
   'variance': 137237520.0},
  'transmission': {'mean': 1.0827709036370428,
   'count': 10668,
   'variance': 0.58366114},
  'mileage': {'mean': 24827.244000749928,
   'count': 10668,
   'variance': 552497100.0},
  'tax': {'mean': 126.0114360704911, 'count': 10668, 'variance': 4511.848},
  'mpg': {'mean': 50.770022497330956, 'count': 10668, 'variance': 167.69684},
  'engineSize': {'mean': 1.930708661415088,
   'count': 10668,
   'variance': 0.36355674}},
 'NODE_d0e82145-2311-46cb-93f8-922f36f4b71d': {'year': {'mean': 2016.8665738936904,
   'count': 17965,
   'variance': 4.2039175},
  'price': {'mean': 12279.756415251926,
   'count': 17965,
   'variance': 22480710.0},
  'transmission': {'mean': 0.9847481213470743,
   'count': 17965,
   'variance': 0.13603991},
  'mileage': {'mean': 23363.630503757344,
   'count': 17965,
   'variance': 379163260.0},
  'tax': {'mean': 113.33453938213202, 'count': 17965, 'variance': 3845.2944},
  'mpg': {'mean': 57.90699137215473, 'count': 17965, 'variance': 102.535416},
  'engineSize': {'mean': 1.3508266072919597,
   'count': 17965,
   'variance': 0.186945}}}
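For site-level comparisons, a nested {node: {column: {stat: value}}} dictionary like the one above is easier to scan once flattened into rows. The sketch below assumes exactly that nesting; flatten_node_stats, compare_stat and the shortened node keys are hypothetical, and the values are copied from the output above.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, float]


def flatten_node_stats(node_stats: Dict[str, Dict[str, Dict[str, float]]]) -> List[Row]:
    """Turn {node: {column: {stat: value}}} into (node, column, stat, value) rows."""
    rows: List[Row] = []
    for node_id, columns in node_stats.items():
        for column, stats in columns.items():
            for stat, value in stats.items():
                rows.append((node_id, column, stat, value))
    return rows


def compare_stat(rows: List[Row], column: str, stat: str) -> Dict[str, float]:
    """Collect one statistic for one column, keyed by node."""
    return {node: value for node, col, s, value in rows if col == column and s == stat}


sample = {
    "node_1": {"price": {"mean": 22896.685039370048, "count": 10668}},
    "node_2": {"price": {"mean": 12279.756415251926, "count": 17965}},
}
rows = flatten_node_stats(sample)
print(compare_stat(rows, "price", "mean"))
```

If pandas is available, the same rows can be loaded with pandas.DataFrame(rows, columns=["node", "column", "stat", "value"]) for filtering and pivoting.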
# Globally aggregated value for a single stat — same structure as node output
result.global_stats('mean')
Out[ ]:
{'year': 2016.9537929589344,
 'price': 16235.38085425909,
 'transmission': 1.0212691649495393,
 'mileage': 23908.93937065627,
 'tax': 118.05766074110295,
 'mpg': 55.2479202319801,
 'engineSize': 1.5668773792468906}
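The global values can be checked against the per-node count, mean and variance with the standard pooled formulas. Assuming each node reports a sample variance (ddof=1), combining the year figures from node_stats() gives a mean of about 2016.95379, matching global_stats('mean') above, and a variance of about 4.40082, matching the variance() output earlier. This is a consistency check on the printed numbers, not Fed-BioMed's own aggregation code; pool_mean_variance is a hypothetical helper.

```python
from typing import Iterable, Tuple


def pool_mean_variance(parts: Iterable[Tuple[int, float, float]]) -> Tuple[int, float, float]:
    """Combine (count, mean, sample variance) triples into global values.

    Total sum of squares = within-node part sum((n_i - 1) * v_i)
                         + between-node part sum(n_i * (m_i - M) ** 2).
    """
    items = list(parts)
    total = sum(n for n, _, _ in items)
    mean = sum(n * m for n, m, _ in items) / total
    within = sum((n - 1) * v for n, _, v in items)
    between = sum(n * (m - mean) ** 2 for n, m, _ in items)
    return total, mean, (within + between) / (total - 1)


# 'year' column, copied from the node_stats() output above
year_parts = [
    (10667, 2017.100684353619, 4.6984687),
    (17965, 2016.8665738936904, 4.2039175),
]
count, mean, variance = pool_mean_variance(year_parts)
print(count, round(mean, 5), round(variance, 5))
```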

Filtering Columns with dataset_schema

By default, FA runs over all columns. Pass a list of column names as dataset_schema to restrict the computation to those columns.

# Restrict the computation to 'year' and 'price' (use column names from your own dataset)
result = exp.analytics.fetch_stats(stats='mean', dataset_schema=['year', 'price'])
result.global_stats('mean')
Out[ ]:
{'year': 2016.9620209933992, 'price': 16235.200740670456}
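On each node, dataset_schema presumably narrows which columns are summarised before any statistic is computed. The sketch below shows that idea on a small in-memory table; filter_columns, column_means, and the choice to raise KeyError for unknown column names are assumptions, not documented Fed-BioMed behaviour.

```python
from typing import Dict, List, Optional, Sequence

Table = Dict[str, List[float]]


def filter_columns(table: Table, dataset_schema: Optional[Sequence[str]] = None) -> Table:
    """Keep only the requested columns; None means all columns (the default)."""
    if dataset_schema is None:
        return dict(table)
    unknown = [c for c in dataset_schema if c not in table]
    if unknown:
        raise KeyError(f"Unknown columns: {unknown}")
    return {c: table[c] for c in dataset_schema}


def column_means(table: Table) -> Dict[str, float]:
    """Local mean of each column in the (already filtered) table."""
    return {c: sum(v) / len(v) for c, v in table.items()}


local = {"year": [2016.0, 2018.0], "price": [10000.0, 14000.0], "tax": [125.0, 145.0]}
print(column_means(filter_columns(local, ["year", "price"])))
```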

Caching

Fed-BioMed caches the last FAResult per experiment. Calling fetch_stats again with the same arguments returns the cached result immediately, without a new network round-trip.

The cache is invalidated when a different statistic, dataset_schema, or stats_args is requested, or after a new training round.
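The rule above can be modelled as a single-entry cache keyed on the request arguments and cleared when training advances. LastResultCache, its methods, and the fetch callback are hypothetical; Fed-BioMed's real cache lives inside the experiment and may key or invalidate differently.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class LastResultCache:
    """Keep only the most recent result, keyed on the request arguments."""

    def __init__(self, fetch: Callable[..., Any]) -> None:
        self._fetch = fetch  # the function that actually contacts the nodes
        self._key: Optional[Tuple[Any, ...]] = None
        self._result: Any = None

    def get(self, stats: str, dataset_schema: Optional[Sequence[str]] = None,
            stats_args: Optional[dict] = None) -> Any:
        key = (stats,
               tuple(dataset_schema) if dataset_schema is not None else None,
               tuple(sorted((stats_args or {}).items())))
        if key != self._key:  # different request: refetch and replace the entry
            self._result = self._fetch(stats, dataset_schema, stats_args)
            self._key = key
        return self._result

    def invalidate(self) -> None:
        """Drop the cached entry, e.g. after a new training round."""
        self._key = None
        self._result = None


calls = []


def fake_fetch(stats, dataset_schema, stats_args):
    calls.append(stats)
    return {"stats": stats}


cache = LastResultCache(fake_fetch)
first = cache.get("mean")
second = cache.get("mean")
print(first is second, len(calls))
```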

# First call — contacts the nodes
result1 = exp.analytics.fetch_stats('mean')

# Second call — served from cache (instant)
result2 = exp.analytics.fetch_stats('mean')
print('Cached:', result1 is result2)   # True
Cached: True
Address:

2004 Rte des Lucioles, 06902 Sophia Antipolis

E-mail:

fedbiomed _at_ inria _dot_ fr

Fed-BioMed © 2022