Dataloader

Classes that simplify imports from fedbiomed.common.dataloader

Attributes

PytorchDataLoaderItem module-attribute

PytorchDataLoaderItem = Optional[Union[Tensor, Dict[str, Tensor]]]

PytorchDataLoaderSample module-attribute

PytorchDataLoaderSample = Tuple[PytorchDataLoaderItem, PytorchDataLoaderItem]

SkLearnDataLoaderItemBatch module-attribute

SkLearnDataLoaderItemBatch = Optional[ndarray]

SkLearnDataLoaderSampleBatch module-attribute

SkLearnDataLoaderSampleBatch = Tuple[SkLearnDataLoaderItemBatch, SkLearnDataLoaderItemBatch]
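The scikit-learn aliases can be illustrated concretely. The sketch below re-declares them locally (in a deployment they are importable from `fedbiomed.common.dataloader`); the array shapes are hypothetical:

```python
import numpy as np
from typing import Optional, Tuple

# Local mirror of the aliases, for illustration only
SkLearnDataLoaderItemBatch = Optional[np.ndarray]
SkLearnDataLoaderSampleBatch = Tuple[SkLearnDataLoaderItemBatch, SkLearnDataLoaderItemBatch]

# A sample batch pairs a features batch with a targets batch
features = np.zeros((4, 3))   # batch of 4 samples, 3 features each
targets = np.zeros((4,))      # one target per sample
batch: SkLearnDataLoaderSampleBatch = (features, targets)

# The Optional in the item alias permits batches with no targets
unsupervised_batch: SkLearnDataLoaderSampleBatch = (features, None)
```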

Classes

DataLoader

DataLoader(dataset, *args, **kwargs)

Bases: ABC

Abstract base class for data loaders specific to a training plan's framework

Source code in fedbiomed/common/dataloader/_dataloader.py
@abstractmethod
def __init__(self, dataset: Any, *args, **kwargs) -> None:
    """Class constructor"""
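Because `__init__` is abstract, the base class cannot be instantiated directly; a framework-specific loader must subclass it. A minimal stand-alone sketch (re-declaring the ABC rather than importing it, and using a hypothetical subclass name):

```python
from abc import ABC, abstractmethod
from typing import Any

class DataLoader(ABC):
    """Stand-in for fedbiomed.common.dataloader.DataLoader."""
    @abstractmethod
    def __init__(self, dataset: Any, *args, **kwargs) -> None:
        """Class constructor"""

class MyFrameworkDataLoader(DataLoader):
    """Hypothetical concrete loader for some training-plan framework."""
    def __init__(self, dataset: Any, *args, **kwargs) -> None:
        self._dataset = dataset

loader = MyFrameworkDataLoader([1, 2, 3])
```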

PytorchDataLoader

PytorchDataLoader(dataset, *args, **kwargs)

Bases: DataLoader, torch.utils.data.DataLoader

Data loader class for PyTorch training plan

Source code in fedbiomed/common/dataloader/_dataloader.py
@abstractmethod
def __init__(self, dataset: Any, *args, **kwargs) -> None:
    """Class constructor"""

SkLearnDataLoader

SkLearnDataLoader(dataset, batch_size=1, shuffle=False, drop_last=False)

Bases: DataLoader

Data loader class for scikit-learn training plan

Assumes that fixing seed for reproducibility is handled globally in a calling class

Parameters:

Name        Type     Description                                                                    Default
dataset     Dataset  dataset object                                                                 required
batch_size  int      batch size for each iteration                                                  1
shuffle     bool     True if shuffling before iteration                                             False
drop_last   bool     whether to drop the last batch in case it does not fill the whole batch size   False

Raises:

Type            Description
FedbiomedError  bad argument type or value

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def __init__(
    self,
    dataset: Dataset,
    batch_size: int = 1,
    shuffle: bool = False,
    drop_last: bool = False,
):
    """Class constructor

    Args:
        dataset: dataset object
        batch_size: batch size for each iteration
        shuffle: True if shuffling before iteration
        drop_last: whether to drop the last batch in case it does not fill the whole batch size

    Raises:
        FedbiomedError: bad argument type or value
    """
    # Note: fixing seed for reproducibility is handled globally in SkLearnDataManager

    # `dataset` type was already checked in SKLearnDataManager, but not kwargs

    if not isinstance(batch_size, int):
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}: Bad type for `batch_size` argument, "
            f"expected `int` got {type(batch_size)}"
        )
    if batch_size <= 0:
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}. Wrong value for `batch_size` argument, "
            f"expected a non-zero positive integer, got value {batch_size}."
        )
    if not isinstance(shuffle, bool):
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}: Bad type for `shuffle` argument, "
            f"expected `bool` got {type(shuffle)}"
        )
    if not isinstance(drop_last, bool):
        raise FedbiomedError(
            f"{ErrorNumbers.FB632.value}: Bad type for `drop_last` argument, "
            f"expected `bool` got {type(drop_last)}"
        )

    self._dataset = dataset
    self._batch_size = batch_size
    self._shuffle = shuffle
    self._drop_last = drop_last
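The constructor only validates and stores the settings; the settings themselves follow the usual batching semantics. A hypothetical sketch of how `batch_size`, `shuffle` and `drop_last` interact during iteration (an illustration, not the actual implementation):

```python
import numpy as np

def iter_batches(data: np.ndarray, batch_size: int = 1,
                 shuffle: bool = False, drop_last: bool = False):
    """Yield successive batches of `data`, mimicking the loader settings."""
    indices = np.arange(len(data))
    if shuffle:
        np.random.shuffle(indices)   # seed fixed globally by the caller
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        if drop_last and len(batch_idx) < batch_size:
            break                    # discard the incomplete final batch
        yield data[batch_idx]

data = np.arange(10)
batches = list(iter_batches(data, batch_size=3))                     # last batch holds 1 item
full_only = list(iter_batches(data, batch_size=3, drop_last=True))   # incomplete batch dropped
```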

Attributes

dataset property
dataset

Returns the encapsulated dataset

This needs to be a property to harmonize the API with torch.DataLoader, enabling us to write generic code for DataLoaders.
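Because every loader exposes `dataset` as a property, generic code can inspect any loader uniformly. A small illustration with a hypothetical loader class (`ExampleLoader` and `dataset_length` are not part of the library):

```python
class ExampleLoader:
    """Hypothetical loader exposing `dataset` as a read-only property,
    mirroring the API shared with torch.utils.data.DataLoader."""
    def __init__(self, dataset):
        self._dataset = dataset

    @property
    def dataset(self):
        return self._dataset

def dataset_length(loader) -> int:
    """Generic helper: works with any loader exposing a `dataset` property."""
    return len(loader.dataset)

length = dataset_length(ExampleLoader(list(range(5))))
```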

Functions

batch_size
batch_size()

Returns the batch size

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def batch_size(self) -> int:
    """Returns the batch size"""
    return self._batch_size
drop_last
drop_last()

Returns the boolean drop_last attribute

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def drop_last(self) -> bool:
    """Returns the boolean drop_last attribute"""
    return self._drop_last
n_remainder_samples
n_remainder_samples()

Returns the remainder of the division between dataset length and batch size.

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def n_remainder_samples(self) -> int:
    """Returns the remainder of the division between dataset length and batch size."""
    return len(self._dataset) % self._batch_size
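A quick arithmetic illustration (hypothetical numbers, not from the source): when `drop_last` is True, the remainder is exactly the number of samples discarded each epoch.

```python
# Hypothetical example: 10 samples split into batches of 3
n_samples, batch_size = 10, 3

n_remainder = n_samples % batch_size    # samples left over in the incomplete last batch
n_full_batches = n_samples // batch_size
```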
shuffle
shuffle()

Returns the boolean shuffle attribute

Source code in fedbiomed/common/dataloader/_sklearn_dataloader.py
def shuffle(self) -> bool:
    """Returns the boolean shuffle attribute"""
    return self._shuffle