This module includes the classes that allow the researcher to interact with remote datasets (federated datasets).
Attributes
FederatedDataSet module-attribute
FederatedDataSet = FederatedDataset
Classes
FederatedDataset
FederatedDataset(data=None)
A class that allows the researcher to interact with remote datasets (federated datasets).
It contains details about the remote datasets, such as client IDs and data sizes, which can be useful for aggregation or sampling strategies on the researcher's side.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
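| data | Optional[Dict] | Dictionary of datasets. Each key is a `str` representing a node's ID. Each value is a `dict` (or a `list` containing exactly one `dict`) that describes the dataset associated with this node in the federated dataset. | None |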
Source code in fedbiomed/researcher/datasets.py
def __init__(self, data: Optional[Dict] = None):
    """Construct FederatedDataset object.

    Args:
        data: Dictionary of datasets. Each key is a `str` representing a node's ID. Each value is
            a `dict` (or a `list` containing exactly one `dict`). Each `dict` contains the description
            of the dataset associated to this node in the federated dataset.
    """
    # check structure of data
    if data is not None:
        self.set_federated_dataset(data)
    else:
        self._data = {}
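As an illustration of the `data` argument described above, the following sketch builds a dictionary in both accepted forms. The node IDs and the `name` field are hypothetical; only the `shape` key comes from the source, where `sample_sizes` reads `val["shape"][0]`.

```python
# Hypothetical node IDs and dataset descriptions, for illustration only.
# The "shape" key is the one read by sample_sizes(); "name" is an assumption.
data = {
    # Form 1: the value is the dataset description itself.
    "node-1": {"name": "dataset-a", "shape": [1000, 28, 28]},
    # Form 2: the value is a list holding exactly one description.
    "node-2": [{"name": "dataset-b", "shape": [500, 28, 28]}],
}

# With Fed-BioMed installed, this dictionary would be passed as:
#   from fedbiomed.researcher.datasets import FederatedDataset
#   fds = FederatedDataset(data)
```

Both forms describe one dataset per node; the list form is unwrapped to a plain `dict` when the dataset is set.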
Functions
data
data()
node_ids
node_ids()
sample_sizes
sample_sizes()
Retrieve list of sample sizes of node's dataset.
Returns:
| Type | Description |
|---|---|
| List[int] | List of sample sizes in the federated dataset, in the same order as node_ids |
Source code in fedbiomed/researcher/datasets.py
def sample_sizes(self) -> List[int]:
    """Retrieve list of sample sizes of node's dataset.

    Returns:
        List of sample sizes in federated datasets in the same order with
            [node_ids][fedbiomed.researcher.datasets.FederatedDataset.node_ids]
    """
    sample_sizes = []
    for _, val in self._data.items():
        sample_sizes.append(val["shape"][0])
    return sample_sizes
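The extraction above can be reproduced on a plain dictionary without the library. This is a minimal sketch: `sample_sizes_from` is a hypothetical standalone helper, not part of Fed-BioMed, and it assumes each description holds a `shape` sequence whose first element is the number of samples.

```python
from typing import Dict, List


def sample_sizes_from(data: Dict[str, dict]) -> List[int]:
    """Return the first dimension of each node's "shape", in dict order."""
    return [entry["shape"][0] for entry in data.values()]


# Hypothetical node IDs and shapes, for illustration only.
data = {
    "node-1": {"shape": [1000, 28, 28]},
    "node-2": {"shape": [500, 28, 28]},
}

sizes = sample_sizes_from(data)
print(sizes)  # [1000, 500]
```

Because Python dictionaries preserve insertion order, the sizes line up with the order of the keys.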
set_federated_dataset
set_federated_dataset(datasets)
Set federated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
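| datasets | Dict | Dictionary of datasets. Each key is a `str` representing a node's ID. Each value is a `dict` (or a `list` containing exactly one `dict`) that describes the dataset associated with this node in the federated dataset. | required |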
Raises:
| Type | Description |
|---|---|
| FedbiomedError | bad `data` format |
Source code in fedbiomed/researcher/datasets.py
def set_federated_dataset(self, datasets: Dict) -> None:
    """Set federated dataset.

    Args:
        datasets: Dictionary of datasets. Each key is a `str` representing a node's ID. Each value is
            a `dict` (or a `list` containing exactly one `dict`). Each `dict` contains the description
            of the dataset associated to this node in the federated dataset.

    Raises:
        FedbiomedError: bad `data` format
    """
    # check structure of data
    # DEPRECATED: to be removed in future versions
    if isinstance(datasets, FederatedDataset):
        logger.warning(
            "DEPRECATED: Passing a `FederatedDataset` instance"
            " to the `data` parameter of `FederatedDataset` is deprecated and "
            "will not be supported in future versions. Please pass a `dict` "
            "representing the federated dataset instead."
        )
        datasets = copy.deepcopy(datasets.data())
    if isinstance(datasets, dict) is False:
        raise FedbiomedError(
            f"{ErrorNumbers.FB416.value}: bad parameter `data` must be a `dict` of "
            f"(`list` of one) `dict`."
        )
    for node_id, node_data in datasets.items():
        if not (isinstance(node_data, dict) or isinstance(node_data, list)):
            raise FedbiomedError(
                f"{ErrorNumbers.FB416.value}: bad parameter `data` for node {node_id}. "
                f"Must be a `dict` or a `list` containing exactly one `dict`."
            )
        if isinstance(node_data, list):
            if len(node_data) != 1 or not isinstance(node_data[0], dict):
                raise FedbiomedError(
                    f"{ErrorNumbers.FB416.value}: bad parameter `data` for node {node_id}. "
                    f"Must be a `dict` or a `list` containing exactly one `dict`."
                )
            else:
                datasets[node_id] = node_data[0]
    self._data = datasets
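The validation rules in this method can be sketched as a standalone function. The sketch makes a few assumptions: `normalize_datasets` is a hypothetical name, `ValueError` stands in for `FedbiomedError`, and, unlike the method above, it returns a new dictionary instead of modifying the one passed in.

```python
from typing import Any, Dict


def normalize_datasets(datasets: Any) -> Dict[str, dict]:
    """Validate the per-node entries and unwrap single-element lists.

    Raises ValueError (a stand-in for FedbiomedError) on a bad format.
    """
    if not isinstance(datasets, dict):
        raise ValueError("`data` must be a dict of (list of one) dict")

    normalized: Dict[str, dict] = {}
    for node_id, node_data in datasets.items():
        # Unwrap the list form when it holds exactly one element.
        if isinstance(node_data, list) and len(node_data) == 1:
            node_data = node_data[0]
        # Anything that is not a dict at this point is rejected:
        # longer or empty lists, nested lists, and other types.
        if not isinstance(node_data, dict):
            raise ValueError(
                f"bad `data` for node {node_id}: must be a dict "
                "or a list containing exactly one dict"
            )
        normalized[node_id] = node_data
    return normalized


# Hypothetical input mixing both accepted forms.
result = normalize_datasets({"node-1": {"shape": [10]}, "node-2": [{"shape": [20]}]})
print(result)  # {'node-1': {'shape': [10]}, 'node-2': {'shape': [20]}}
```

Returning a new dictionary avoids the side effect on the caller's object. Whether that matters depends on how the input is reused afterwards.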
shapes
shapes()
Get shape of FederatedDatasets by node ids.
Returns:
| Type | Description |
|---|---|
| Dict[str, int] | Includes sample_sizes by node_ids |
Source code in fedbiomed/researcher/datasets.py
def shapes(self) -> Dict[str, int]:
    """Get shape of FederatedDatasets by node ids.

    Returns:
        Includes [`sample_sizes`][fedbiomed.researcher.datasets.FederatedDataset.sample_sizes] by node_ids.
    """
    shapes_dict = {}
    for node_id, node_data_size in zip(
        self.node_ids(), self.sample_sizes(), strict=False
    ):
        shapes_dict[node_id] = node_data_size
    return shapes_dict
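The class description mentions that data sizes can feed aggregation strategies on the researcher's side. One possible use of a `shapes()`-style mapping is to derive weights proportional to each node's sample count. This is a sketch only: `aggregation_weights` is a hypothetical helper, and it is not claimed to be how Fed-BioMed's aggregators consume these values.

```python
from typing import Dict


def aggregation_weights(shapes: Dict[str, int]) -> Dict[str, float]:
    """Map each node ID to its share of the total number of samples."""
    total = sum(shapes.values())
    if total == 0:
        raise ValueError("total sample count is zero; weights are undefined")
    return {node_id: n / total for node_id, n in shapes.items()}


# Hypothetical mapping in the form returned by shapes(): node ID -> sample count.
shapes = {"node-1": 1000, "node-2": 500, "node-3": 500}

weights = aggregation_weights(shapes)
print(weights)  # {'node-1': 0.5, 'node-2': 0.25, 'node-3': 0.25}
```

The weights sum to 1, so they can scale per-node contributions in a weighted average.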