This repository consists of:
- vision.datasets: Data loaders for popular vision datasets
- vision.transforms: Common image transformations such as random crop, rotations etc.
- [WIP] vision.models: Model definitions and pre-trained weights for popular architectures such as AlexNet, VGG, ResNet, etc.
Binaries:
conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
From Source:
pip install -r requirements.txt
pip install .
The following dataset loaders are available:
- COCO (Captioning and Detection)
- LSUN
- ImageFolder
- Imagenet-12
Datasets have the API:
- __getitem__
- __len__
They all subclass torch.utils.data.Dataset. Hence, they can all be loaded in parallel by multiple worker processes (Python multiprocessing) using the standard torch.utils.data.DataLoader.
For example:
torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
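To illustrate why __getitem__ and __len__ are all a loader needs, here is a minimal pure-Python sketch of index-based batching. It is not how torch.utils.data.DataLoader is implemented: it ignores worker processes and tensor collation, and the helper name iter_batches is hypothetical.

```python
import random


def iter_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples using only len(dataset) and dataset[i]."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Any object with __getitem__ and __len__ works; a list stands in for a dataset.
samples = ["a", "b", "c", "d", "e"]
batches = list(iter_batches(samples, batch_size=2))
shuffled = list(iter_batches(samples, batch_size=2, shuffle=True, seed=0))
```

The last batch is simply shorter when the dataset size is not a multiple of batch_size.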
In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:
- transform - a function that takes in an image and returns a transformed version
  - common stuff like ToTensor, RandomCrop, etc. These can be composed together with transforms.Compose (see transforms section below)
- target_transform - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
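As a rough sketch of how the two keyword args fit into the dataset API, the toy class below applies transform to the image and target_transform to the target inside __getitem__. The class ToyCaptionDataset, the vocab dictionary, and the nested-list "images" are hypothetical stand-ins; a real dataset would subclass torch.utils.data.Dataset and return tensors rather than lists.

```python
class ToyCaptionDataset:
    """Hypothetical in-memory dataset following the __getitem__/__len__ API."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = samples  # list of (image, caption) pairs
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        img, target = self.samples[index]
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return img, target

    def __len__(self):
        return len(self.samples)


# Hypothetical vocabulary used to turn a caption into word indices.
vocab = {"a": 0, "plane": 1, "mountain": 2}


def caption_to_indices(caption):
    """Map each lower-cased word of the caption to its vocabulary index."""
    return [vocab[word] for word in caption.lower().split()]


dset = ToyCaptionDataset(
    samples=[([[0, 255]], "A plane"), ([[255, 0]], "A mountain")],
    # Rescale pixel values from 0..255 to 0..1.
    transform=lambda img: [[p / 255.0 for p in row] for row in img],
    target_transform=caption_to_indices,
)
img, target = dset[0]
```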
CocoCaptions and CocoDetection require the COCO API to be installed.
dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])
Example:
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are', 
                        annFile = 'json annotation file', 
                        transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample
print("Image Size: ", img.size())
print(target)
Output:
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.', 
u'A plane darts across a bright blue sky behind a mountain covered in snow', 
u'A plane leaves a contrail above the snowy mountain top.', 
u'A mountain that has a plane flying overheard in the distance.', 
u'A mountain view with a plume of smoke in the background']
dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])
dset.LSUN(db_path, classes='train', [transform, target_transform])
- db_path = root directory for the database files
- classes =
  - 'train' - all categories, training set
  - 'val' - all categories, validation set
  - 'test' - all categories, test set
  - ['bedroom_train', 'church_train', ...] : a list of categories to load
ImageFolder is a generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dset.ImageFolder(root="root folder path", [transform, target_transform])
It has the members:
- self.classes - The class names as a list
- self.class_to_idx - A dict mapping each class name to its class index
- self.imgs - The list of (image path, class-index) tuples
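The three members can be pictured as the result of a single directory scan. The sketch below is an illustration only, not torchvision's implementation: the function name scan_image_folder is hypothetical, it assumes class names are ordered alphabetically, and it does not filter files by image extension.

```python
import os
import tempfile


def scan_image_folder(root):
    """Return (classes, class_to_idx, imgs) for a root/<class>/<file> layout."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    imgs = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            imgs.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return classes, class_to_idx, imgs


# Build a throwaway tree with empty placeholder files, then scan it.
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("dog", "xxx.png"), ("dog", "xxy.png"), ("cat", "123.png")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, fname), "w").close()
    classes, class_to_idx, imgs = scan_image_folder(root)
    # Keep only file names so the result does not depend on the temp path.
    labels = [(os.path.basename(path), idx) for path, idx in imgs]
```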
The Imagenet-12 dataset is simply implemented with an ImageFolder dataset.
The data is preprocessed as described here
Transforms are common image transforms.
They can be chained together using transforms.Compose
- ToTensor() - converts a PIL Image to a Tensor
- Normalize(mean, std) - normalizes the image given per-channel mean and std (for example: mean = [0.3, 1.2, 2.1])
- Scale(size, interpolation=Image.BILINEAR) - scales the smaller edge of the image to the given size, keeping the aspect ratio. Interpolation modes are options from PIL
- CenterCrop(size) - center-crops the image to the given size
- RandomCrop(size) - randomly crops the image to the given size
- RandomHorizontalFlip() - horizontally flips the image with probability 0.5
- RandomSizedCrop(size, interpolation=Image.BILINEAR) - randomly crops a region covering 0.08 to 1 of the image area with aspect ratio between 3/4 and 4/3, then resizes it to the given size (Inception-style)
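The arithmetic behind Scale and Normalize can be sketched in plain Python. Both helpers below are hypothetical illustrations working on numbers and nested lists, not the library's code; in particular, truncating the scaled edge with int() is an assumption about rounding.

```python
def scaled_size(width, height, size):
    """Return (new_width, new_height) with the smaller edge equal to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def normalize(channels, mean, std):
    """Apply (value - mean) / std independently to each channel."""
    return [
        [(value - m) / s for value in channel]
        for channel, m, s in zip(channels, mean, std)
    ]


# A 640x427 image scaled so that its smaller edge (the height) becomes 224.
new_size = scaled_size(640, 427, 224)

# Two channels with two values each, normalized with per-channel statistics.
normalized = normalize([[0.5, 1.0], [0.25, 0.75]], mean=[0.5, 0.25], std=[0.5, 0.25])
```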
One can compose several transforms together. For example:
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                          std = [ 0.229, 0.224, 0.225 ]),
])
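Conceptually, composing transforms means calling each one on the output of the previous one, in list order. A minimal sketch of that idea follows; ComposeSketch is a hypothetical stand-in, not the transforms.Compose source, and plain functions on integers stand in for image transforms.

```python
class ComposeSketch:
    """Hypothetical stand-in that chains callables in the order given."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            img = transform(img)
        return img


# Order matters: add one first, then double.
pipeline = ComposeSketch([lambda x: x + 1, lambda x: x * 2])
result = pipeline(3)  # (3 + 1) * 2

# Reversing the list gives a different result: (3 * 2) + 1.
reversed_result = ComposeSketch([lambda x: x * 2, lambda x: x + 1])(3)
```

This is why ToTensor appears before Normalize in the example above: Normalize operates on the tensor that ToTensor produces.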