Dataloader

Classes that simplify imports from fedbiomed.common.dataloader

Attributes

PytorchDataLoaderItem module-attribute

PytorchDataLoaderItem = Optional[Union[Tensor, Dict[str, Tensor]]]

PytorchDataLoaderSample module-attribute

PytorchDataLoaderSample = Tuple[PytorchDataLoaderItem, PytorchDataLoaderItem]

SkLearnDataLoaderItemBatch module-attribute

SkLearnDataLoaderItemBatch = Optional[ndarray]

SkLearnDataLoaderSampleBatch module-attribute

SkLearnDataLoaderSampleBatch = Tuple[SkLearnDataLoaderItemBatch, SkLearnDataLoaderItemBatch]
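The scikit-learn aliases can be illustrated concretely. The sketch below re-declares them locally (in a deployment they are importable from `fedbiomed.common.dataloader`); the array shapes are hypothetical:

```python
import numpy as np
from typing import Optional, Tuple

# Local mirror of the aliases, for illustration only
SkLearnDataLoaderItemBatch = Optional[np.ndarray]
SkLearnDataLoaderSampleBatch = Tuple[SkLearnDataLoaderItemBatch, SkLearnDataLoaderItemBatch]

# A sample batch pairs a features batch with a targets batch
features = np.zeros((4, 3))   # batch of 4 samples, 3 features each
targets = np.zeros((4,))      # one target per sample
batch: SkLearnDataLoaderSampleBatch = (features, targets)

# The Optional in the item alias permits batches with no targets
unsupervised_batch: SkLearnDataLoaderSampleBatch = (features, None)
```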

Classes

DataLoader

DataLoader(dataset, *args, **kwargs)

Bases: ABC

Abstract base class for data loaders specific to a training plan's framework

Source code in fedbiomed/common/dataloader/_dataloader.py
@abstractmethod
def __init__(self, dataset: Any, *args, **kwargs) -> None:
    """Class constructor"""
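Because `__init__` is abstract, the base class cannot be instantiated directly; a framework-specific loader must subclass it. A minimal stand-alone sketch (re-declaring the ABC rather than importing it, and using a hypothetical subclass name):

```python
from abc import ABC, abstractmethod
from typing import Any

class DataLoader(ABC):
    """Stand-in for fedbiomed.common.dataloader.DataLoader."""
    @abstractmethod
    def __init__(self, dataset: Any, *args, **kwargs) -> None:
        """Class constructor"""

class MyFrameworkDataLoader(DataLoader):
    """Hypothetical concrete loader for some training-plan framework."""
    def __init__(self, dataset: Any, *args, **kwargs) -> None:
        self._dataset = dataset

loader = MyFrameworkDataLoader([1, 2, 3])
```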

PytorchDataLoader

PytorchDataLoader(dataset, *args, **kwargs)

Bases: DataLoader, torch.utils.data.DataLoader

Data loader class for PyTorch training plan

Source code in fedbiomed/common/dataloader/_dataloader.py
@abstractmethod
def __init__(self, dataset: Any, *args, **kwargs) -> None:
    """Class constructor"""

SkLearnDataLoader

SkLearnDataLoader(dataset, batch_size=1, shuffle=False, drop_last=False)

Bases: DataLoader

Data loader class for scikit-learn training plan

Assumes that fixing seed for reproducibility is handled globally in a calling class

Parameters:

Name        Type     Description                                                                    Default
dataset     Dataset  dataset object                                                                 required
batch_size  int      batch size for each iteration                                                  1
shuffle     bool     True if shuffling before iteration                                             False
drop_last   bool     whether to drop the last batch in case it does not fill the whole batch size   False

Raises:

Type            Description
FedbiomedError  bad argument type or value

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def __init__(
    self,
    dataset: Dataset,
    batch_size: int = 1,
    shuffle: bool = False,
    drop_last: bool = False,
):
    """Class constructor

    Args:
        dataset: dataset object
        batch_size: batch size for each iteration
        shuffle: True if shuffling before iteration
        drop_last: whether to drop the last batch in case it does not fill the whole batch size

    Raises:
        FedbiomedError: bad argument type or value
    """
    # Note: fixing seed for reproducibility is handled globally in SkLearnDataManager

    # `dataset` type was already checked in SKLearnDataManager, but not kwargs

    if not isinstance(batch_size, int):
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}: Bad type for `batch_size` argument, "
            f"expected `int` got {type(batch_size)}"
        )
    if batch_size <= 0:
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}. Wrong value for `batch_size` argument, "
            f"expected a non-zero positive integer, got value {batch_size}."
        )
    if not isinstance(shuffle, bool):
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}: Bad type for `shuffle` argument, "
            f"expected `bool` got {type(shuffle)}"
        )
    if not isinstance(drop_last, bool):
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}: Bad type for `drop_last` argument, "
            f"expected `bool` got {type(drop_last)}"
        )

    self._dataset = dataset
    self._batch_size = batch_size
    self._shuffle = shuffle
    self._drop_last = drop_last
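The constructor only validates and stores the settings; the settings themselves follow the usual batching semantics. A hypothetical sketch of how `batch_size`, `shuffle` and `drop_last` interact during iteration (an illustration, not the actual implementation):

```python
import numpy as np

def iter_batches(data: np.ndarray, batch_size: int = 1,
                 shuffle: bool = False, drop_last: bool = False):
    """Yield successive batches of `data`, mimicking the loader settings."""
    indices = np.arange(len(data))
    if shuffle:
        np.random.shuffle(indices)   # seed fixed globally by the caller
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        if drop_last and len(batch_idx) < batch_size:
            break                    # discard the incomplete final batch
        yield data[batch_idx]

data = np.arange(10)
batches = list(iter_batches(data, batch_size=3))                     # last batch holds 1 item
full_only = list(iter_batches(data, batch_size=3, drop_last=True))   # incomplete batch dropped
```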

Attributes

dataset property
dataset

Returns the encapsulated dataset

This needs to be a property to harmonize the API with torch.DataLoader, enabling us to write generic code for DataLoaders.
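Because every loader exposes `dataset` as a property, generic code can inspect any loader uniformly. A small illustration with a hypothetical loader class (`ExampleLoader` and `dataset_length` are not part of the library):

```python
class ExampleLoader:
    """Hypothetical loader exposing `dataset` as a read-only property,
    mirroring the API shared with torch.utils.data.DataLoader."""
    def __init__(self, dataset):
        self._dataset = dataset

    @property
    def dataset(self):
        return self._dataset

def dataset_length(loader) -> int:
    """Generic helper: works with any loader exposing a `dataset` property."""
    return len(loader.dataset)

length = dataset_length(ExampleLoader(list(range(5))))
```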

Functions

batch_size
batch_size()

Returns the batch size

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def batch_size(self) -> int:
    """Returns the batch size"""
    return self._batch_size
drop_last
drop_last()

Returns the boolean drop_last attribute

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def drop_last(self) -> bool:
    """Returns the boolean drop_last attribute"""
    return self._drop_last
n_remainder_samples
n_remainder_samples()

Returns the remainder of the division between dataset length and batch size.

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def n_remainder_samples(self) -> int:
    """Returns the remainder of the division between dataset length and batch size."""
    return len(self._dataset) % self._batch_size
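A quick arithmetic illustration (hypothetical numbers, not from the source): when `drop_last` is True, the remainder is exactly the number of samples discarded each epoch.

```python
# Hypothetical example: 10 samples split into batches of 3
n_samples, batch_size = 10, 3

n_remainder = n_samples % batch_size    # samples left over in the incomplete last batch
n_full_batches = n_samples // batch_size
```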
shuffle
shuffle()

Returns the boolean shuffle attribute

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def shuffle(self) -> bool:
    """Returns the boolean shuffle attribute"""
    return self._shuffle