Classes that simplify imports from fedbiomed.common.dataloader
Attributes
PytorchDataLoaderItem module-attribute
PytorchDataLoaderItem = Optional[Union[Tensor, Dict[str, Tensor]]]
PytorchDataLoaderSample module-attribute
PytorchDataLoaderSample = Tuple[PytorchDataLoaderItem, PytorchDataLoaderItem]
SkLearnDataLoaderItemBatch module-attribute
SkLearnDataLoaderItemBatch = Optional[ndarray]
SkLearnDataLoaderSampleBatch module-attribute
SkLearnDataLoaderSampleBatch = Tuple[SkLearnDataLoaderItemBatch, SkLearnDataLoaderItemBatch]
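These aliases describe the values served by the two loader families: a PyTorch sample is a (data, target) pair of tensors (or dicts of named tensors, or None), while a scikit-learn batch is a (data, target) pair of NumPy arrays. A minimal sketch of values that satisfy the aliases (the sample data is invented for illustration):

```python
from typing import Dict, Optional, Tuple, Union

import numpy as np
import torch

# Mirrors the aliases above, redeclared here only to keep the sketch self-contained.
PytorchDataLoaderItem = Optional[Union[torch.Tensor, Dict[str, torch.Tensor]]]
PytorchDataLoaderSample = Tuple[PytorchDataLoaderItem, PytorchDataLoaderItem]

# A (data, target) sample with plain tensors...
sample: PytorchDataLoaderSample = (torch.randn(3), torch.tensor(1))
# ...or with named tensors on the data side and no target.
dict_sample: PytorchDataLoaderSample = ({"image": torch.randn(1, 28, 28)}, None)

# A scikit-learn (data, target) batch is a pair of NumPy arrays (either may be None).
batch = (np.random.rand(16, 4), np.random.randint(0, 2, size=16))
```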
Classes
DataLoader
DataLoader(dataset, *args, **kwargs)
Bases: ABC
Abstract base class for data loaders specific to a training plan's framework
Source code in fedbiomed/common/dataloader/_dataloader.py
@abstractmethod
def __init__(self, dataset: Any, *args, **kwargs) -> None:
"""Class constructor"""
PytorchDataLoader
PytorchDataLoader(dataset, *args, **kwargs)
Bases: DataLoader (the Fed-BioMed abstract base above), DataLoader (PyTorch)
Data loader class for PyTorch training plans
Source code in fedbiomed/common/dataloader/_dataloader.py
@abstractmethod
def __init__(self, dataset: Any, *args, **kwargs) -> None:
"""Class constructor"""
SkLearnDataLoader
SkLearnDataLoader(dataset, batch_size=1, shuffle=False, drop_last=False)
Bases: DataLoader
Data loader class for scikit-learn training plans
Assumes that fixing the seed for reproducibility is handled globally by a calling class
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | dataset object | required |
| batch_size | int | batch size for each iteration | 1 |
| shuffle | bool | True if shuffling before iteration | False |
| drop_last | bool | whether to drop the last batch in case it does not fill the whole batch size | False |
Raises:
| Type | Description |
|---|---|
| FedbiomedError | bad argument type or value |
Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def __init__(
self,
dataset: Dataset,
batch_size: int = 1,
shuffle: bool = False,
drop_last: bool = False,
):
"""Class constructor
Args:
dataset: dataset object
batch_size: batch size for each iteration
shuffle: True if shuffling before iteration
drop_last: whether to drop the last batch in case it does not fill the whole batch size
Raises:
FedbiomedError: bad argument type or value
"""
# Note: fixing seed for reproducibility is handled globally in SkLearnDataManager
# `dataset` type was already checked in SKLearnDataManager, but not kwargs
if not isinstance(batch_size, int):
raise FedbiomedError(
f"{ErrorNumbers.FB632.value}: Bad type for `batch_size` argument, "
f"expected `int` got {type(batch_size)}"
)
if batch_size <= 0:
raise FedbiomedError(
f"{ErrorNumbers.FB632.value}. Wrong value for `batch_size` argument, "
f"expected a non-zero positive integer, got value {batch_size}."
)
if not isinstance(shuffle, bool):
raise FedbiomedError(
f"{ErrorNumbers.FB632.value}: Bad type for `shuffle` argument, "
f"expected `bool` got {type(shuffle)}"
)
if not isinstance(drop_last, bool):
raise FedbiomedError(
f"{ErrorNumbers.FB632.value}: Bad type for `drop_last` argument, "
f"expected `bool` got {type(drop_last)}"
)
self._dataset = dataset
self._batch_size = batch_size
self._shuffle = shuffle
self._drop_last = drop_last
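Construction and argument validation follow the signature and checks above. In the sketch below the import paths and the placeholder dataset are assumptions; only `len()` is needed of the dataset here, since its type is validated earlier in `SkLearnDataManager`:

```python
import numpy as np

from fedbiomed.common.dataloader import SkLearnDataLoader
from fedbiomed.common.exceptions import FedbiomedError

# Placeholder stand-in for the Dataset normally supplied by SkLearnDataManager.
dataset = np.random.rand(103, 4)

loader = SkLearnDataLoader(dataset, batch_size=32, shuffle=True)
print(loader.batch_size(), loader.shuffle(), loader.drop_last())  # 32 True False

try:
    SkLearnDataLoader(dataset, batch_size=0)  # rejected: must be a positive integer
except FedbiomedError as err:
    print(err)
```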
Attributes
dataset property
dataset
Returns the encapsulated dataset
This needs to be a property to harmonize the API with torch.utils.data.DataLoader, enabling us to write generic code for DataLoaders.
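For example, loader-agnostic code can rely on `.dataset` whether it receives a torch DataLoader or an SkLearnDataLoader (a sketch, assuming the dataset supports `len()`):

```python
def describe_loader(loader) -> str:
    # Works for any loader exposing a `dataset` attribute or property.
    return f"{type(loader).__name__} over {len(loader.dataset)} samples"
```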
Functions
batch_size
batch_size()
Returns the batch size
Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def batch_size(self) -> int:
"""Returns the batch size"""
return self._batch_size
drop_last
drop_last()
Returns the boolean drop_last attribute
Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def drop_last(self) -> bool:
"""Returns the boolean drop_last attribute"""
return self._drop_last
n_remainder_samples
n_remainder_samples()
Returns the remainder of the division between dataset length and batch size.
Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def n_remainder_samples(self) -> int:
"""Returns the remainder of the division between dataset length and batch size."""
return len(self._dataset) % self._batch_size
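A worked example (values chosen for illustration); presumably these remaining samples form a smaller final batch unless `drop_last=True`:

```python
import numpy as np

from fedbiomed.common.dataloader import SkLearnDataLoader

loader = SkLearnDataLoader(np.random.rand(103, 4), batch_size=32)
print(loader.n_remainder_samples())  # 103 % 32 == 7
```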
shuffle
shuffle()
Returns the boolean shuffle attribute
Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def shuffle(self) -> bool:
"""Returns the boolean shuffle attribute"""
return self._shuffle