How It Works

Federated Discovery

What is Federated Discovery?

Federated discovery lets users explore what data is available across multiple institutions without accessing or moving the raw data itself. Instead of centralizing sensitive data, each institution exposes structured metadata about its local datasets, which helps identify available cohorts, modalities, and resources. Discovery queries are dispatched across the federation while the underlying data stays protected at each site. Discovery is therefore the first step before any analytics or learning experiment: it lets users assess the feasibility of a study, identify participating institutions, and configure experiments, all without compromising patient privacy.


How Discovery Works

A typical discovery workflow in Fed-BioMed follows four steps, from dataset registration to institution selection.

Dataset Registration

A data administrator deploys a dataset and assigns it one or more tags. The dataset metadata is registered locally and becomes discoverable. Raw data files never leave the institution.

Exploratory Listing

A researcher requests a broad inventory of all datasets available across participating institutions. The inventory lists each dataset's name, type, shape, and tags; no raw data is ever included.

Tag-Based Search

When initializing an experiment, the researcher specifies one or more tags. Fed-BioMed queries each participating institution and returns metadata from matching datasets only.

Institution Selection

The researcher can restrict the search to specific institutions. Institutions that have no dataset matching the tags, or that are not in the allowed list, are silently excluded.
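The four steps above can be sketched as a small, self-contained Python model. This is an illustration only and does not use the Fed-BioMed API: the names Node, DatasetMetadata, and discover are hypothetical, and the matching rule (a dataset matches when it carries every requested tag) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetMetadata:
    """Metadata a node is willing to expose; no raw data is held here."""
    name: str
    data_type: str
    shape: tuple
    tags: frozenset


@dataclass
class Node:
    """Hypothetical stand-in for one institution in the federation."""
    node_id: str
    datasets: list = field(default_factory=list)

    def register(self, name, data_type, shape, tags):
        # Step 1: the administrator registers metadata locally.
        self.datasets.append(
            DatasetMetadata(name, data_type, tuple(shape), frozenset(tags))
        )

    def list_all(self):
        # Step 2: broad inventory of everything registered on this node.
        return list(self.datasets)

    def search(self, tags):
        # Step 3 (assumed rule): a dataset matches if it has every requested tag.
        wanted = frozenset(tags)
        return [d for d in self.datasets if wanted <= d.tags]


def discover(nodes, tags, allowed_node_ids=None):
    """Step 4: query only allowed nodes and drop those with no match."""
    results = {}
    for node in nodes:
        if allowed_node_ids is not None and node.node_id not in allowed_node_ids:
            continue  # not in the allowed list: silently excluded
        matches = node.search(tags)
        if matches:  # nodes without a matching dataset are silently excluded
            results[node.node_id] = matches
    return results


hospital_a = Node("hospital-a")
hospital_a.register("MNIST", "default", (60000, 1, 28, 28), ["#MNIST", "#dataset"])
hospital_b = Node("hospital-b")
hospital_b.register("brain-mri", "images", (120, 256, 256), ["#mri"])

found = discover([hospital_a, hospital_b], ["#MNIST"])
print(sorted(found))
```

In this toy setup only hospital-a holds a dataset tagged #MNIST, so it is the only institution returned. Passing allowed_node_ids narrows the query further before any tag matching takes place.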

Privacy in Discovery

The discovery process is designed so that no sensitive information ever leaves the institutional boundary.

Only metadata is exchanged

Responses contain the dataset name, type, shape, description, and tags. They never contain patient records, images, or raw values.

Queries are pull-based

Institutions respond only to explicit requests arriving through the authenticated federated channel. No information is ever broadcast unsolicited.

Participation is controlled

Data administrators explicitly decide which datasets to register and which tags to assign.

Discovery is decoupled from access

Knowing a dataset exists does not grant access to it. Analytics and training require separate authorization, managed independently.
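Two of these properties, metadata-only responses and the separation of discovery from access, can be illustrated with a minimal Python sketch. It is not Fed-BioMed code: the field names, build_discovery_response, may_train, and the training_grants set are all hypothetical and chosen only to mirror the description above.

```python
# Fields that may appear in a discovery response (mirrors the list above).
DISCOVERABLE_FIELDS = ("name", "data_type", "shape", "description", "tags")


def build_discovery_response(local_record):
    """Copy only whitelisted metadata fields out of a node's local record."""
    return {k: local_record[k] for k in DISCOVERABLE_FIELDS if k in local_record}


def may_train(researcher_id, dataset_name, training_grants):
    """Separate authorization check; discovery results play no part in it."""
    return (researcher_id, dataset_name) in training_grants


# Hypothetical local record kept by a node, including a local-only field.
local_record = {
    "name": "brain-mri",
    "data_type": "images",
    "shape": (120, 256, 256),
    "description": "T1-weighted scans",
    "tags": ["#mri"],
    "path": "/data/private/brain-mri",  # never leaves the institution
}

response = build_discovery_response(local_record)
print(sorted(response))

# The researcher has discovered the dataset but holds no training grant.
print(may_train("alice", "brain-mri", training_grants=set()))
```

Using an explicit allow-list of fields, rather than removing known sensitive keys, means that any field added to the local record later stays private by default.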