Questions tagged [pytorch-dataloader]

70 questions
5
votes
0 answers

Optimize pytorch data loader for reading small patches in full HD images

I'm training my neural network using the PyTorch framework. The data consists of full HD images (1920x1080), but in each iteration I only need to crop a random 256x256 patch from these images. My network is relatively small (5 conv layers), and hence the…
Nagabhushan S N
  • 4,063
  • 5
  • 26
  • 54
4
votes
3 answers

Is it advisable to use the same torch Dataset class for training and predicting?

I have recently started using PyTorch and I like it for its object-oriented style. However, I wonder what the best, recommended workflow is when predicting with the model. I wanted to use a custom Dataset class I wrote and which I use for training and…
jakes
  • 1,636
  • 9
  • 33
3
votes
1 answer

pytorch Dataloader - if input data returns multiple training instances

Problem: I have the following problem: I want to use PyTorch's DataLoader (in a similar way to here), but my setup varies a bit. In my data folder I have images (let's call them image_total) of different street situations and I want to use cropped…
nckstr15
  • 70
  • 1
  • 6
3
votes
0 answers

How to train a Masked Language Model with a big text corpus (200GB) using PyTorch?

I have recently been training a masked language model with a big text corpus (200GB) using transformers. The training data is too big to fit into a computer equipped with 512GB of memory and 8 V100 (32GB) GPUs. Is it possible to find an elegant way to train the model with…
3
votes
2 answers

How does PyTorch DataLoader interact with a PyTorch dataset to transform batches?

I'm creating a custom dataset for NLP-related tasks. In the PyTorch custom dataset tutorial, we see that the __getitem__() method leaves room for a transform before it returns a sample: def __getitem__(self, idx): if torch.is_tensor(idx): …
rocksNwaves
  • 2,276
  • 14
  • 38
2
votes
0 answers

Displaying images loaded with pytorch dataloader

I am working with some lidar data images that I cannot post here due to a reputation restriction on posting images. However, when loading the same images using PyTorch's ImageFolder and DataLoader with the only transform being converting the images to…
CDWatson
  • 21
  • 1
2
votes
1 answer

Strange Cuda out of Memory behavior in Pytorch

Edit: SOLVED. The problem was caused by the number of workers; lowering it fixed the issue. I am using a 24GB Titan RTX for image segmentation with a U-Net in PyTorch. It always throws CUDA out of memory at different batch sizes, plus I…
2
votes
0 answers

How to gather prediction result on TPU (Pytorch)?

I'm trying to fine-tune my BERT-based QA model (PyTorch) with the TPU v3-8 provided by Kaggle. In the validation process I used a ParallelLoader to make predictions on 8 cores at the same time. But after that I don't know what I should do to gather all…
2
votes
1 answer

PyTorch DataLoader using Mongo DB

I would like to know if using a DataLoader connected to a MongoDB is a sensible thing to do and how this could be implemented. Background: I have about 20 million documents sitting in a (local) MongoDB, far more documents than fit in memory. I would…
pascal
  • 179
  • 1
  • 1
  • 13
2
votes
2 answers

How do I load the CelebA dataset on Google Colab, using torchvision, without running out of memory?

I am following a tutorial on DCGAN. Whenever I try to load the CelebA dataset, torchvision uses up all my runtime's memory (12GB) and the runtime crashes. I am looking for ways to load and apply transformations to the dataset without hogging…
Kinyugo
  • 169
  • 8
2
votes
2 answers

PyTorch Dataset / Dataloader batching

I'm a little confused regarding the 'best practise' to implement a PyTorch data pipeline on time series data. I have a HD5 file which I read using a custom DataLoader. It seems that I should return the data samples as a (features,targets) tuple with…
David Waterworth
  • 1,453
  • 15
  • 31
1
vote
1 answer

PyTorch - Import dataset with images as labels

I have a dataset containing images as inputs and labels/targets as images as well. The structure in the folder is as follows:
> DATASET/
> ---TRAIN/
> ------image_xx.png
> ------label_xx.png
> ---TEST/
> ------image_xx.png
> ------label_xx.png
I've…
1
vote
2 answers

What am I missing here, using ImageFolder to get the full folder name as labels for MNIST-double dataset images?

I would like to use dataset.ImageFolder to create an image dataset. My current image directory structure looks like this: 1: In the train images, I have subfolders which are my labels, named 00, 01, and so on. In each folder, images contain double…
1
vote
1 answer

sampler argument in DataLoader of Pytorch

While using PyTorch's DataLoader utility, what is the purpose of RandomIdentitySampler as the sampler? RandomIdentitySampler also has an argument called instances. Does instances depend on the number of workers? If there are 4 workers, then should…
user14
  • 27
  • 7
1
vote
1 answer

Get the length of every sentence before padding in torchtext BucketIterator

Is it possible to get the length of every sentence before padding in torchtext's BucketIterator: train_loader = torchtext.legacy.data.BucketIterator(train_data, batch_size = 64, repeat=True, shuffle=True, sort_key = lambda x: len(x.text), sort=False,…
testaja
  • 33
  • 3