Image Datasets
Introduction
ImageFolderDataset handles image classification tasks where images are organized in class-based folders. It provides seamless integration with standard computer vision workflows and automatic format conversion for federated learning.
Key Features
- Automatic class discovery from folder names
- Supports all common image formats (JPEG, PNG, BMP, TIFF)
- Auto-conversion to PyTorch tensors or NumPy arrays
- Built-in torchvision transformation support
Data Preparation
Proper data organization is essential for Fed-BioMed to recognize and load your image dataset. This section covers the required folder structure and steps to prepare your images for federated learning.
Folder Structure
Images must be organized with each class in its own subfolder:
root/
├── class_1/
│ ├── image1.jpg
│ └── ...
├── class_2/
│ ├── imageA.jpg
│ └── ...
└── class_3/
└── ...
No Images in Root Directory
All images must be placed inside class subdirectories. The root directory should contain only class folders, no image files. Placing images directly in the root will cause dataset loading to fail.
Supported formats: JPEG, PNG, BMP, TIFF, and other PIL-supported formats.
Examples
-
Animal Classification:
animals/ ├── cats/ ├── dogs/ └── birds/ -
Medical Images:
medical_images/ ├── normal/ ├── abnormal/ └── uncertain/
Preparation Steps
Follow these steps to prepare your image dataset for Fed-BioMed:
Create root directory
- This directory will contain only class subdirectories (no images)
Create class subdirectories
- Use descriptive, consistent names (e.g.,
normal,abnormal,cats,dogs) - Folder names become class labels and are case-sensitive
- Avoid spaces and special characters in folder names
- Classes are sorted alphabetically during loading
Place images in class folders
- All images must be inside their respective class folders
- Do NOT place any images directly in the root directory
- Ensure no nested subdirectories within class folders
Verify data quality
- Check for corrupted images
- Ensure consistent file formats within your dataset
- Verify image files have correct extensions
- Remove or relocate mislabeled images
Deployment
Node-Side
Register using the Fed-BioMed node CLI, see Deploying Datasets:
fedbiomed node dataset add
# 1. Select "images"
# 2. Path: /path/to/your/images/
# 3. Unique tags and description
Researcher-Side
Access image datasets through experiment configuration:
from fedbiomed.researcher.experiment import Experiment
# Select nodes with image classification datasets
experiment = Experiment(
tags=['#images', '#classification', '#medical'],
model=my_cnn_model,
training_plan_class=ImageClassificationPlan,
training_args=training_args
)
Transformations
Image datasets in Fed-BioMed support the full range of torchvision transformations for preprocessing and augmentation.
# Example: conservative augmentation for medical images
medical_transform = transforms.Compose([
transforms.Resize((512, 512)), # High resolution for medical detail
transforms.CenterCrop(512), # Preserve central anatomy
transforms.RandomRotation(5), # Minimal rotation
transforms.Normalize(mean=[0.5], std=[0.5])
])
Framework-Specific Considerations
For PyTorch Models:
- Use
transforms.Normalize()with appropriate mean/std values - Consider model input size requirements
For Scikit-learn Models:
- Images are automatically converted to NumPy arrays
- Consider flattening transforms for traditional ML algorithms
- Normalize pixel values appropriately (typically 0-1 or -1 to 1)
For comprehensive transformation documentation, see Applying Transformations.
Best Practices
Data Organization:
- Consistent structure: Maintain identical folder organization across all federated nodes
- Class naming: Use descriptive, consistent class names across nodes
- Quality control: Remove corrupted or mislabeled images before deployment
- Data quality standards: Establish and maintain consistent data quality criteria
Performance Optimization:
- Appropriate transforms: Choose transforms based on your specific model requirements
- Batch size tuning: Balance memory usage with training efficiency
- Input size considerations: Match input dimensions to your model architecture
Troubleshooting
Common Issues include:
Empty Dataset Error
- Verify folder structure has class subdirectories
- Check that image files are inside class folders, not in root
- Ensure image files have supported extensions
Class Mismatch
- Verify consistent class folder names across federated nodes
- Check for case sensitivity issues in folder names
Memory Issues
- Reduce batch size in training configuration
- Consider smaller input image dimensions
- Optimize transforms to reduce memory usage
Transform Errors
- Verify transforms are compatible with your training framework
- Ensure normalize values match your model requirements