This module includes the classes that allow the researcher to interact with remote datasets (federated datasets).
Classes
FederatedDataSet
FederatedDataSet(data)
A class that allows the researcher to interact with remote datasets (federated datasets).
It contains details about the remote datasets, such as node IDs and data sizes, which can be useful for aggregation or sampling strategies on the researcher's side.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
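data | Dict | Dictionary of datasets. Each key is a `str` representing a node's ID. Each value is a `dict` (or a `list` containing exactly one `dict`) that describes the dataset associated with this node in the federated dataset. | required |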
Raises:
Type | Description |
---|---|
FedbiomedFederatedDataSetError | bad `data` format |
Source code in fedbiomed/researcher/datasets.py
def __init__(self, data: Dict):
    """Construct FederatedDataSet object.

    Args:
        data: Dictionary of datasets. Each key is a `str` representing a node's ID. Each value is
            a `dict` (or a `list` containing exactly one `dict`). Each `dict` contains the description
            of the dataset associated to this node in the federated dataset.

    Raises:
        FedbiomedFederatedDataSetError: bad `data` format
    """
    # check structure of data
    self._v = Validator()
    self._v.register("list_or_dict", self._dataset_type, override=True)
    try:
        self._v.validate(data, dict)
        for node, ds in data.items():
            self._v.validate(node, str)
            self._v.validate(ds, "list_or_dict")
            if isinstance(ds, list):
                if len(ds) == 1:
                    self._v.validate(ds[0], dict)
                    # convert list of one dict to dict
                    data[node] = ds[0]
                else:
                    errmess = f'{ErrorNumbers.FB416.value}: {node} must have one unique dataset ' \
                              f'but has {len(ds)} datasets.'
                    logger.error(errmess)
                    raise FedbiomedFederatedDataSetError(errmess)
    except ValidatorError as e:
        errmess = f'{ErrorNumbers.FB416.value}: bad parameter `data` must be a `dict` of ' \
                  f'(`list` of one) `dict`: {e}'
        logger.error(errmess)
        raise FedbiomedFederatedDataSetError(errmess)
    self._data = data
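
The accepted input formats can be illustrated with a standalone sketch. It mirrors the checks in the constructor above but does not use Fed-BioMed itself: `normalize_datasets` is a hypothetical helper, plain `TypeError`/`ValueError` stand in for `FedbiomedFederatedDataSetError`, and the `"shape"` entries are made-up example values. Unlike the constructor, which rewrites `data` in place, this sketch returns a new dictionary.

```python
from typing import Any, Dict


def normalize_datasets(data: Dict[str, Any]) -> Dict[str, dict]:
    """Hypothetical helper: accept {node_id: dict} or {node_id: [dict]}."""
    if not isinstance(data, dict):
        raise TypeError("`data` must be a dict")
    normalized = {}
    for node, ds in data.items():
        if not isinstance(node, str):
            raise TypeError("node IDs must be strings")
        if isinstance(ds, list):
            # A list is only accepted when it wraps exactly one dataset
            if len(ds) != 1:
                raise ValueError(
                    f"{node} must have one unique dataset but has {len(ds)} datasets."
                )
            ds = ds[0]
        if not isinstance(ds, dict):
            raise TypeError(f"dataset of {node} must be a dict")
        normalized[node] = ds
    return normalized


# Example values are illustrative only
data = {
    "node-1": {"shape": [100, 5]},    # plain dict
    "node-2": [{"shape": [250, 5]}],  # list of one dict, unwrapped below
}
normalized = normalize_datasets(data)
print(normalized["node-2"])  # {'shape': [250, 5]}
```

A node that maps to a list with zero or several datasets is rejected, which corresponds to the "must have one unique dataset" error in the constructor.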
Functions
data
data()
node_ids
node_ids()
sample_sizes
sample_sizes()
Retrieve the list of sample sizes of the nodes' datasets.
Returns:
Type | Description |
---|---|
List[int] | List of sample sizes in federated datasets, in the same order as node_ids |
Source code in fedbiomed/researcher/datasets.py
def sample_sizes(self) -> List[int]:
    """Retrieve list of sample sizes of node's dataset.

    Returns:
        List of sample sizes in federated datasets in the same order with
            [node_ids][fedbiomed.researcher.datasets.FederatedDataSet.node_ids]
    """
    sample_sizes = []
    for (key, val) in self._data.items():
        sample_sizes.append(val["shape"][0])
    return sample_sizes
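
As a minimal sketch of what this method computes, the same extraction can be run on a plain dictionary. `sample_sizes_of` is a hypothetical standalone function and the `"shape"` values are invented; the only assumption taken from the source above is that each dataset description has a `"shape"` entry whose first element is the number of samples.

```python
from typing import Dict, List


def sample_sizes_of(data: Dict[str, dict]) -> List[int]:
    """Hypothetical helper: first element of each dataset's "shape"."""
    # Python dicts preserve insertion order, so the result follows key order
    return [entry["shape"][0] for entry in data.values()]


data = {
    "node-1": {"shape": [100, 5]},
    "node-2": {"shape": [250, 5]},
}
sizes = sample_sizes_of(data)
print(sizes)  # [100, 250]
```

A dataset description without a `"shape"` key raises a `KeyError` in this sketch, as the lookup in the source listing above would.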
shapes
shapes()
Get the shapes (sample sizes) of the federated datasets by node ID.
Returns:
Type | Description |
---|---|
Dict[str, int] | Includes sample_sizes by node_ids |
Source code in fedbiomed/researcher/datasets.py
def shapes(self) -> Dict[str, int]:
    """Get shape of FederatedDatasets by node ids.

    Returns:
        Includes [`sample_sizes`][fedbiomed.researcher.datasets.FederatedDataSet.sample_sizes] by node_ids.
    """
    shapes_dict = {}
    for node_id, node_data_size in zip(self.node_ids(),
                                       self.sample_sizes()):
        shapes_dict[node_id] = node_data_size
    return shapes_dict
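
One way such a mapping might be used on the researcher's side is to weight each node by its share of the total samples. The sketch below is an illustration under assumptions, not part of Fed-BioMed: `aggregation_weights` is a hypothetical function, and the node IDs and sizes are invented.

```python
from typing import Dict


def aggregation_weights(shapes: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical helper: weight of each node = its samples / total samples."""
    total = sum(shapes.values())
    if total == 0:
        raise ValueError("total sample size must be positive")
    return {node_id: size / total for node_id, size in shapes.items()}


# Pair node IDs with sample sizes, as the loop in `shapes` does
node_ids = ["node-1", "node-2"]
sizes = [100, 300]
shapes = dict(zip(node_ids, sizes))

weights = aggregation_weights(shapes)
print(weights)  # {'node-1': 0.25, 'node-2': 0.75}
```

The `dict(zip(...))` call is equivalent to the explicit loop in the source listing, provided both lists are in the same order.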